Universitat Politècnica de Catalunya

UPCommons. Global access to UPC knowledge

A programming model for hybrid workflows: combining task-based workflows and dataflows all-in-one

Ramón Cortés, Cristian
Lordan Gomis, Francesc
Ejarque Artigas, Jorge
Badia Sala, Rosa Maria
Document type: Article
Defense date: 2020-12
Publisher: Elsevier
Rights access: Open Access
Except where otherwise noted, content on this work is licensed under a Creative Commons license: Attribution-NonCommercial-NoDerivs 4.0 International
Project: mF2C - Towards an Open, Secure, Decentralized and Coordinated Fog-to-Cloud Management Ecosystem (EC-H2020-730929)
Project: COMPUTACION DE ALTAS PRESTACIONES VII (MINECO-TIN2015-65316-P)
Abstract
In recent years, e-Science applications have evolved from large-scale simulations executed in a single cluster to more complex workflows in which these simulations are combined with High-Performance Data Analytics (HPDA). To implement these workflows, developers currently use different patterns, mainly task-based and dataflow. However, since these patterns are usually managed by separate frameworks, implementing these applications requires combining them, which considerably increases the effort of learning, deploying, and integrating applications in the different frameworks. This paper aims to reduce this effort by proposing a way to extend task-based management systems to support continuous input and output data, enabling the combination of task-based workflows and dataflows (Hybrid Workflows from now on) in a single programming model. Hence, developers can build complex Data Science workflows with different approaches depending on the requirements. To illustrate the capabilities of Hybrid Workflows, we have built a Distributed Stream Library and a fully functional prototype extending COMPSs, a mature, general-purpose, task-based, parallel programming model. The library can be easily integrated with existing task-based frameworks to provide support for dataflows. It also provides a homogeneous, generic, and simple representation of object and file streams in both Java and Python, enabling complex workflows to handle any data type without dealing directly with the streaming back-end. In the evaluation, we introduce four use cases to illustrate the new capabilities of Hybrid Workflows, measuring the performance benefits when processing data continuously as it is generated, when removing synchronisation points, when processing external real-time data, and when combining task-based workflows and dataflows at different levels.
Users who identify these patterns in their workflows may use the presented use cases (and their performance improvements) as a reference to update their code and benefit from the capabilities of Hybrid Workflows. Furthermore, we analyse the scalability in terms of the number of writers and readers and measure the task analysis, task scheduling, and task execution times when using objects or streams.
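The idea of a hybrid workflow described in the abstract can be sketched in a few lines of Python. This is only an illustration using the standard library: it does not use COMPSs or the Distributed Stream Library, and the names `ObjectStream`, `simulate`, `analyse`, and `run_hybrid_workflow` are hypothetical. Two tasks are submitted like ordinary task-based work, but they exchange data through a stream, so the consumer processes values as they are generated instead of waiting for the producer to finish.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor


class ObjectStream:
    """Hypothetical in-memory stream standing in for a distributed stream."""

    def __init__(self):
        self._items = queue.Queue()
        self._closed = threading.Event()

    def publish(self, item):
        self._items.put(item)

    def close(self):
        # The single producer calls this after its last publish().
        self._closed.set()

    def poll(self, timeout=0.05):
        """Return the items currently available (possibly an empty list)."""
        batch = []
        try:
            batch.append(self._items.get(timeout=timeout))
            while True:
                batch.append(self._items.get_nowait())
        except queue.Empty:
            pass
        return batch

    def is_closed(self):
        # Closed only once the producer is done and every item was consumed.
        return self._closed.is_set() and self._items.empty()


def simulate(stream, steps):
    """Producer task: emits one value per simulation step."""
    for step in range(steps):
        stream.publish(step * step)
    stream.close()
    return steps


def analyse(stream):
    """Consumer task: processes values while the producer is still running."""
    total = 0
    while not stream.is_closed():
        for value in stream.poll():
            total += value
    return total


def run_hybrid_workflow(steps=5):
    stream = ObjectStream()
    with ThreadPoolExecutor(max_workers=2) as pool:
        # Both tasks start immediately; the stream removes the
        # synchronisation point between producer and consumer.
        producer = pool.submit(simulate, stream, steps)
        consumer = pool.submit(analyse, stream)
        return producer.result(), consumer.result()


if __name__ == "__main__":
    print(run_hybrid_workflow())
```

In this sketch the stream has a single producer and a single consumer; a real streaming back-end would also have to handle multiple writers and readers, which is the scalability dimension the abstract mentions.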
Citation: Ramón-Cortés, C. [et al.]. A programming model for hybrid workflows: combining task-based workflows and dataflows all-in-one. "Future generation computer systems", December 2020, vol. 113, p. 281-297.
URI: http://hdl.handle.net/2117/328850
DOI: 10.1016/j.future.2020.07.007
ISSN: 0167-739X
Publisher version: https://doi.org/10.1016/j.future.2020.07.007
Other identifiers: https://arxiv.org/abs/2007.04939
Collections
  • Computer Sciences - Articles de revista [277]
  • Departament d'Arquitectura de Computadors - Articles de revista [967]
  • CAP - Grup de Computació d'Altes Prestacions - Articles de revista [380]
  • Doctorat en Arquitectura de Computadors - Articles de revista [140]

Files
2007.04939.pdf (Size: 1,584Mb; Format: PDF)
