
dc.contributor.author: Mammadli, Nihad
dc.contributor.author: Ejarque Artigas, Jorge
dc.contributor.author: Álvarez Cid-Fuentes, Javier
dc.contributor.author: Badia Sala, Rosa Maria
dc.contributor.other: Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors
dc.contributor.other: Barcelona Supercomputing Center
dc.date.accessioned: 2023-01-10T10:18:32Z
dc.date.available: 2023-01-10T10:18:32Z
dc.date.issued: 2022-05-25
dc.identifier.citation: Mammadli, N. [et al.]. DDS: integrating data analytics transformations in task-based workflows [version 1; peer review: 1 approved, 2 approved with reservations]. "Open Research Europe", 25 May 2022, vol. 2, article 66, p. 1-16.
dc.identifier.issn: 2732-5121
dc.identifier.uri: http://hdl.handle.net/2117/379643
dc.description.abstract: High-performance data analytics (HPDA) is a current trend in e-science research that aims to integrate traditional HPC with recent data analytic frameworks. Most of the work done in this field has focused on improving data analytic frameworks by implementing their engines on top of HPC technologies such as Message Passing Interface. However, there is a lack of integration from an application development perspective. HPC workflows have their own parallel programming models, while data analytic (DA) algorithms are mainly implemented using data transformations and executed with frameworks like Spark. Task-based programming models (TBPMs) are a very efficient approach for implementing HPC workflows. Data analytic transformations can also be decomposed as a set of tasks and implemented with a task-based programming model. In this paper, we present a methodology to develop HPDA applications on top of TBPMs that allow developers to combine HPC workflows and data analytic transformations seamlessly. A prototype of this approach has been implemented on top of the PyCOMPSs task-based programming model to validate two aspects: HPDA applications can be seamlessly developed and have better performance than Spark. We compare our results using different programs. Finally, we conclude with the idea of integrating DA into HPC applications and evaluation of our method against Spark.
dc.description.sponsorship: This research was financially supported by the European Union’s Horizon 2020 research and innovation programme under the grant agreement No 780622; and the Spanish Government (PID2019-107255GB), Generalitat de Catalunya (2014-SGR-1051).
dc.format.extent: 16 p.
dc.language.iso: eng
dc.rights: Attribution 4.0 International
dc.rights.uri: http://creativecommons.org/licenses/by/4.0/
dc.subject: Àrees temàtiques de la UPC::Informàtica::Arquitectura de computadors
dc.subject: Àrees temàtiques de la UPC::Informàtica::Programació
dc.subject.lcsh: Big data
dc.subject.lcsh: High performance computing
dc.subject.lcsh: Parallel programming (Computer science)
dc.subject.other: Big data high performance
dc.subject.other: Data analytics
dc.subject.other: Parallel computing
dc.subject.other: Task based programming models
dc.title: DDS: integrating data analytics transformations in task-based workflows [version 1; peer review: 1 approved, 2 approved with reservations]
dc.type: Article
dc.subject.lemac: Dades massives
dc.subject.lemac: Càlcul intensiu (Informàtica)
dc.subject.lemac: Programació en paral·lel (Informàtica)
dc.identifier.doi: 10.12688/openreseurope.14569.1
dc.description.peerreviewed: Peer Reviewed
dc.relation.publisherversion: https://open-research-europe.ec.europa.eu/articles/2-66/v1#referee-response-29377
dc.rights.access: Open Access
local.identifier.drac: 35033561
dc.description.version: Postprint (published version)
dc.relation.projectid: info:eu-repo/grantAgreement/EC/H2020/780622/EU/Edge and CLoud Computation: A Highly Distributed Software Architecture for Big Data AnalyticS/CLASS
dc.relation.projectid: info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/PID2019-107255GB-C22/ES/UPC-COMPUTACION DE ALTAS PRESTACIONES VIII/
dc.relation.projectid: info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/PID2019-107255GB-C21/ES/BSC - COMPUTACION DE ALTAS PRESTACIONES VIII/
local.citation.author: Mammadli, N.; Ejarque, J.; Álvarez, J.; Badia, R.M.
local.citation.publicationName: Open Research Europe
local.citation.volume: 2
local.citation.number: article 66
local.citation.startingPage: 1
local.citation.endingPage: 16

