dc.contributor.authorRamón Cortés, Cristian
dc.contributor.authorLordan Gomis, Francesc
dc.contributor.authorEjarque Artigas, Jorge
dc.contributor.authorBadia Sala, Rosa Maria
dc.contributor.otherUniversitat Politècnica de Catalunya. Doctorat en Arquitectura de Computadors
dc.contributor.otherUniversitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors
dc.contributor.otherBarcelona Supercomputing Center
dc.date.accessioned2020-09-17T10:29:37Z
dc.date.available2022-07-09T00:28:46Z
dc.date.issued2020-12
dc.identifier.citationRamón-Cortés, C. [et al.]. A programming model for hybrid workflows: combining task-based workflows and dataflows all-in-one. "Future generation computer systems", December 2020, vol. 113, p. 281-297.
dc.identifier.issn0167-739X
dc.identifier.otherhttps://arxiv.org/abs/2007.04939
dc.identifier.urihttp://hdl.handle.net/2117/328850
dc.description.abstractIn recent years, e-Science applications have evolved from large-scale simulations executed in a single cluster to more complex workflows where these simulations are combined with High-Performance Data Analytics (HPDA). To implement these workflows, developers currently use different patterns, mainly task-based and dataflow. However, since these patterns are usually managed by separate frameworks, implementing these applications requires combining them, which considerably increases the effort of learning, deploying, and integrating applications in the different frameworks. This paper aims to reduce this effort by proposing a way to extend task-based management systems to support continuous input and output data, enabling the combination of task-based workflows and dataflows (Hybrid Workflows from now on) in a single programming model. Hence, developers can build complex Data Science workflows with different approaches depending on the requirements. To illustrate the capabilities of Hybrid Workflows, we have built a Distributed Stream Library and a fully functional prototype extending COMPSs, a mature, general-purpose, task-based, parallel programming model. The library can be easily integrated with existing task-based frameworks to provide support for dataflows. It also provides a homogeneous, generic, and simple representation of object and file streams in both Java and Python, enabling complex workflows to handle any data type without dealing directly with the streaming back-end. In the evaluation, we introduce four use cases to illustrate the new capabilities of Hybrid Workflows, measuring the performance benefits when processing data continuously as it is generated, when removing synchronisation points, when processing external real-time data, and when combining task-based workflows and dataflows at different levels. Users who identify these patterns in their workflows may use the presented use cases (and their performance improvements) as a reference to update their code and benefit from the capabilities of Hybrid Workflows. Furthermore, we analyse the scalability in terms of the number of writers and readers and measure the task analysis, task scheduling, and task execution times when using objects or streams.
dc.description.sponsorshipThis work has been supported by the Spanish Government (contracts SEV2015-0493 and TIN2015-65316-P), by Generalitat de Catalunya (contract 2014-SGR-1051), and by the European Commission through the Horizon 2020 Research and Innovation program under contract 730929 (MF2C project). Cristian Ramon-Cortes's predoctoral contract is financed by the Spanish Government under contract BES-2016-076791.
dc.format.extent17 p.
dc.language.isoeng
dc.publisherElsevier
dc.rightsAttribution-NonCommercial-NoDerivatives 4.0 International
dc.rights©2020 Elsevier
dc.rights.urihttps://creativecommons.org/licenses/by-nc-nd/4.0/
dc.subjectÀrees temàtiques de la UPC::Informàtica::Arquitectura de computadors
dc.subject.lcshParallel programming (Computer science)
dc.subject.lcshBig data
dc.subject.lcshElectronic data processing -- Distributed processing
dc.subject.otherTask-based workflows
dc.subject.otherDataflows
dc.subject.otherStreaming
dc.subject.otherConvergence HPC-Big Data
dc.subject.otherDistributed computing
dc.subject.otherProgramming models
dc.titleA programming model for hybrid workflows: combining task-based workflows and dataflows all-in-one
dc.typeArticle
dc.subject.lemacProgramació en paral·lel (Informàtica)
dc.subject.lemacMacrodades
dc.subject.lemacProcessament distribuït de dades
dc.contributor.groupUniversitat Politècnica de Catalunya. CAP - Grup de Computació d'Altes Prestacions
dc.identifier.doi10.1016/j.future.2020.07.007
dc.description.peerreviewedPeer Reviewed
dc.relation.publisherversionhttps://doi.org/10.1016/j.future.2020.07.007
dc.rights.accessOpen Access
local.identifier.drac29009979
dc.description.versionPostprint (author's final draft)
dc.relation.projectidinfo:eu-repo/grantAgreement/EC/H2020/730929/EU/Towards an Open, Secure, Decentralized and Coordinated Fog-to-Cloud Management Ecosystem/mF2C
dc.relation.projectidinfo:eu-repo/grantAgreement/MINECO//TIN2015-65316-P/ES/COMPUTACION DE ALTAS PRESTACIONES VII/
dc.relation.projectidinfo:eu-repo/grantAgreement/AGAUR/V PRI/2014 SGR 1051
dc.relation.projectidinfo:eu-repo/grantAgreement/MINECO/1PE/BES-2016-076791
local.citation.authorRamón-Cortés, C.; Lordan, F.; Ejarque, J.; Badia, R.M.
local.citation.publicationNameFuture generation computer systems
local.citation.volume113
local.citation.startingPage281
local.citation.endingPage297

