Universitat Politècnica de Catalunya

UPCommons. Global access to UPC knowledge

Scanflow-K8s: agent-based framework for autonomic management and supervision of ML workflows in Kubernetes clusters

File: Scanflow-K8s Agent-based Framework for Autonomic Management and Supervision of ML Workflows in Kubernetes Clusters(cameraready).pdf (1.109 MB)
Cite as: hdl:2117/371292

Liu, Peini
Bravo Rocca, Gusseppe
Guitart Fernández, Jordi
Dholakia, Ajay
Ellison, David
Hodak, Miroslav
Document type: Conference report
Defense date: 2022
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Rights access: Open Access
All rights reserved. This work is protected by the corresponding intellectual and industrial property rights. Without prejudice to any existing legal exemptions, reproduction, distribution, public communication, or transformation of this work is prohibited without permission of the copyright holder.
Project: UPC-COMPUTACION DE ALTAS PRESTACIONES VIII (AEI-PID2019-107255GB-C22)
Abstract
Machine Learning (ML) projects currently rely heavily on workflows composed of reproducible steps and executed as containerized pipelines, which build and deploy ML models efficiently thanks to the flexibility, portability, and fast delivery that such pipelines bring to the ML life cycle. However, deployed models must be constantly monitored, managed, supervised, and debugged to guarantee their availability, validity, and robustness in unexpected situations. Containerized ML workflows would therefore benefit from flexible and diverse autonomic capabilities. This work presents an architecture for autonomic ML workflows with multi-layered control, based on an agent-based approach that enables autonomic management and supervision of ML workflows at the application layer and the infrastructure layer (by collaborating with the orchestrator). We redesign the Scanflow ML framework to support this multi-agent approach by using triggers, primitives, and strategies. We also implement a practical platform, Scanflow-K8s, that enables autonomic ML workflows on Kubernetes clusters based on the Scanflow agents. MNIST image classification and MLPerf ImageNet classification benchmarks serve as case studies to show the capabilities of Scanflow-K8s under different scenarios. The experimental results demonstrate the feasibility and effectiveness of the proposed agent approach and the Scanflow-K8s platform for the autonomic management of ML workflows in Kubernetes clusters at multiple layers.
Citation: Liu, P. [et al.]. Scanflow-K8s: agent-based framework for autonomic management and supervision of ML workflows in Kubernetes clusters. In: IEEE/ACM International Symposium on Cluster Computing and the Grid. "22nd IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing: CCGrid 2022: proceedings: 16-19 May 2022, Taormina (Messina), Italy". Institute of Electrical and Electronics Engineers (IEEE), 2022, p. 376-385. ISBN 978-1-6654-9956-9. DOI 10.1109/CCGrid54584.2022.00047.
URI: http://hdl.handle.net/2117/371292
DOI: 10.1109/CCGrid54584.2022.00047
ISBN: 978-1-6654-9956-9
Publisher version: https://ieeexplore.ieee.org/abstract/document/9826110
Collections
  • Doctorat en Arquitectura de Computadors - Ponències/Comunicacions de congressos [196]
  • Computer Sciences - Ponències/Comunicacions de congressos [459]
  • CAP - Grup de Computació d'Altes Prestacions - Ponències/Comunicacions de congressos [762]
  • Departament d'Arquitectura de Computadors - Ponències/Comunicacions de congressos [1.773]