Universitat Politècnica de Catalunya


UPCommons. Global access to UPC knowledge


Efficient development of high performance data analytics in Python

Cite as: hdl:2117/184867
Álvarez Cid-Fuentes, Javier
Alvarez, Pol
Amela Milian, Ramon
Ishii, Kuninori
Morizawa, Rafael K.
Badia Sala, Rosa Maria
Document type: Article
Defense date: 2020-10
Publisher: Elsevier
Rights access: Open Access
License: Except where otherwise noted, content on this work is licensed under a Creative Commons license: Attribution-NonCommercial-NoDerivs 4.0 International
Projects:
  • COMPUTACION DE ALTAS PRESTACIONES VII (MINECO-TIN2015-65316-P)
  • STARS - SupercompuTing And Related applicationS Fellows Program (EC-H2020-754433)
  • BARCELONA SUPERCOMPUTING CENTER - CENTRO NACIONAL DE SUPERCOMPUTACION (MINECO-SEV-2015-0493)
Abstract
Our society is generating an increasing amount of data at an unprecedented scale, variety, and speed. This also applies to numerous research areas, such as genomics, high energy physics, and astronomy, for which large-scale data processing has become crucial. However, there is still a gap between the traditional scientific computing ecosystem and big data analytics tools and frameworks. On the one hand, high performance computing (HPC) programming models lack productivity, and do not provide means for processing large amounts of data in a simple manner. On the other hand, existing big data processing tools have performance issues in HPC environments, and are not general-purpose. In this paper, we propose and evaluate PyCOMPSs, a task-based programming model for Python, as an excellent solution for distributed big data processing in HPC infrastructures. Among other useful features, PyCOMPSs offers a highly productive general-purpose programming model, is infrastructure-agnostic, and provides transparent data management with support for distributed storage systems. We show how two machine learning algorithms (Cascade SVM and K-means) can be developed with PyCOMPSs, and evaluate PyCOMPSs’ productivity based on these algorithms. Additionally, we evaluate PyCOMPSs’ performance on an HPC cluster using up to 1,536 cores and 320 million input vectors. Our results show that PyCOMPSs achieves similar performance and scalability to MPI in HPC infrastructures, while providing a much more productive interface that allows the easy development of data analytics algorithms.
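To make the idea of a task-based K-means more concrete, here is a minimal sketch of how the algorithm can be split into independent tasks over data blocks. It is not the paper's implementation and does not use the PyCOMPSs API: Python's standard `concurrent.futures` stands in for the task runtime, and the function names (`partial_sum`, `merge`, `kmeans`), the block layout, and the parameters are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_sum(block, centers):
    """Task: assign each point of one block to its nearest center and
    return per-center coordinate sums and point counts (hypothetical helper)."""
    dim = len(centers[0])
    sums = [[0.0] * dim for _ in centers]
    counts = [0] * len(centers)
    for point in block:
        # Index of the center with the smallest squared Euclidean distance.
        nearest = min(
            range(len(centers)),
            key=lambda k: sum((p - c) ** 2 for p, c in zip(point, centers[k])),
        )
        counts[nearest] += 1
        for d in range(dim):
            sums[nearest][d] += point[d]
    return sums, counts


def merge(partials, old_centers):
    """Combine the partial results of all blocks into new centers.
    A center with no assigned points keeps its previous position."""
    dim = len(old_centers[0])
    new_centers = []
    for k, old in enumerate(old_centers):
        total = sum(counts[k] for _, counts in partials)
        if total == 0:
            new_centers.append(list(old))
            continue
        new_centers.append(
            [sum(sums[k][d] for sums, _ in partials) / total for d in range(dim)]
        )
    return new_centers


def kmeans(blocks, centers, iterations=10, workers=4):
    """Run a fixed number of K-means iterations, one task per data block.
    The thread pool is only a stand-in for a distributed task runtime."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(iterations):
            futures = [pool.submit(partial_sum, block, centers) for block in blocks]
            partials = [future.result() for future in futures]  # synchronization point
            centers = merge(partials, centers)
    return centers


if __name__ == "__main__":
    blocks = [[(0.0, 0.0), (0.0, 2.0)], [(10.0, 10.0), (10.0, 12.0)]]
    print(kmeans(blocks, [[0.0, 0.0], [10.0, 10.0]], iterations=3))
```

The design point this illustrates is that each block is processed independently and only the small per-center sums are merged, so the per-block work can in principle be scheduled on different nodes by a task runtime. How PyCOMPSs actually expresses and schedules these tasks is described in the paper, not here.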
Citation: Álvarez, J. [et al.]. Efficient development of high performance data analytics in Python. "Future generation computer systems", October 2020, vol. 111, p. 570-581.
URI: http://hdl.handle.net/2117/184867
DOI: 10.1016/j.future.2019.09.051
ISSN: 0167-739X
Publisher version: https://www.sciencedirect.com/science/article/pii/S0167739X18321393?via%3Dihub
Collections
  • Computer Sciences - Articles de revista [259]
  • Departament d'Arquitectura de Computadors - Articles de revista [910]
  • Doctorat en Bioinformàtica - Articles de revista [8]
  • CAP - Grup de Computació d'Altes Prestacions - Articles de revista [370]