You only run once: Spark auto-tuning from a single run
dc.contributor.author | Buchaca Prats, David |
dc.contributor.author | Albuquerque Portella, Felipe |
dc.contributor.author | Costa, Carlos H. A. |
dc.contributor.author | Berral García, Josep Lluís |
dc.contributor.other | Universitat Politècnica de Catalunya. Doctorat en Arquitectura de Computadors |
dc.contributor.other | Barcelona Supercomputing Center |
dc.date.accessioned | 2021-02-22T10:25:29Z |
dc.date.available | 2021-02-22T10:25:29Z |
dc.date.issued | 2020-12 |
dc.identifier.citation | Buchaca, D. [et al.]. You only run once: Spark auto-tuning from a single run. "IEEE transactions on network and service management", December 2020, vol. 17, no. 4, p. 2039-2051. |
dc.identifier.issn | 1932-4537 |
dc.identifier.uri | http://hdl.handle.net/2117/340271 |
dc.description.abstract | Tuning configurations of Spark jobs is not a trivial task. State-of-the-art auto-tuning systems are based on iteratively running workloads with different configurations. During the optimization process, the relevant features are explored to find good solutions. Many optimizers enhance the time-to-solution using black-box optimization algorithms that do not take into account any information from the Spark workloads. In this article, we present a new method for tuning configurations that uses information from one run of a Spark workload. To achieve good performance, we mine the SparkEventLog that is generated by the Spark engine. This log file contains a large amount of information from the executed application. We use this information to enhance a performance model with low-level features from the workload to be optimized. These features include Spark Actions, Transformations, and Task metrics. This process allows us to obtain application-specific workload information. With this information our system can predict sensible Spark configurations for unseen jobs, given that it has been trained with reasonable coverage of Spark applications. Experiments show that the presented system correctly produces good configurations, while achieving up to 80% speedup with respect to the default Spark configuration, and up to 12x speedup of the time-to-solution with respect to a standard Bayesian Optimization procedure. |
dc.description.sponsorship | This work is supported by the European Research Council under the European Union’s Horizon 2020 research and innovation programme (grant agreement No 639595); the Spanish Ministry of Economy, under contract TIN2015-65316-P; the Generalitat de Catalunya, under contract 2014SGR1051; the ICREA Academia program; the BSC-CNS Severo Ochoa program (SEV-2015-0493); and Petroleo Brasileiro S. A. (PETROBRAS). |
dc.format.extent | 13 p. |
dc.language.iso | eng |
dc.subject | UPC subject areas::Computer science::Computer architecture |
dc.subject.lcsh | Bayesian statistical decision theory |
dc.subject.lcsh | Machine learning |
dc.subject.lcsh | Parallel processing (Electronic computers) |
dc.subject.other | Decision making for workload auto-tuning |
dc.subject.other | Spark auto-tuning |
dc.subject.other | Workload modeling |
dc.subject.other | Workload placement |
dc.title | You only run once: Spark auto-tuning from a single run |
dc.type | Article |
dc.subject.lemac | Bayesian statistics |
dc.subject.lemac | Machine learning |
dc.subject.lemac | Parallel processing (Computers) |
dc.contributor.group | Universitat Politècnica de Catalunya. CAP - Grup de Computació d'Altes Prestacions |
dc.identifier.doi | 10.1109/TNSM.2020.3034824 |
dc.description.peerreviewed | Peer Reviewed |
dc.relation.publisherversion | https://ieeexplore.ieee.org/document/9244226 |
dc.rights.access | Open Access |
local.identifier.drac | 30567433 |
dc.description.version | Postprint (author's final draft) |
dc.relation.projectid | info:eu-repo/grantAgreement/MINECO//TIN2015-65316-P/ES/COMPUTACION DE ALTAS PRESTACIONES VII/ |
dc.relation.projectid | info:eu-repo/grantAgreement/AGAUR/V PRI/2014 SGR 1051 |
dc.relation.projectid | info:eu-repo/grantAgreement/EC/H2020/639595/EU/Holistic Integration of Emerging Supercomputing Technologies/Hi-EST |
local.citation.author | Buchaca, D.; Albuquerque, F.; Costa, C.; Berral, J. |
local.citation.publicationName | IEEE transactions on network and service management |
local.citation.volume | 17 |
local.citation.number | 4 |
local.citation.startingPage | 2039 |
local.citation.endingPage | 2051 |
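
The abstract describes mining the event log of a single Spark run for low-level workload features. The sketch below illustrates that general idea only; it is not the authors' implementation. It assumes the log is stored as one JSON object per line, that each object carries an "Event" field, and that task-end events carry a "Task Metrics" object with keys such as "Executor Run Time" and "JVM GC Time". These key names, and the helper `summarize_event_log`, are assumptions made for illustration and may not match every Spark version.

```python
import json
from collections import Counter

# Assumed metric names; real event logs may differ across Spark versions.
TASK_METRICS = ("Executor Run Time", "JVM GC Time")


def summarize_event_log(lines):
    """Count events by type and total a few task metrics from JSON lines.

    `lines` is any iterable of strings, for example an open file handle.
    This helper is hypothetical and only illustrates feature extraction.
    """
    event_counts = Counter()
    metric_totals = {name: 0 for name in TASK_METRICS}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        event = json.loads(line)
        kind = event.get("Event", "unknown")
        event_counts[kind] += 1
        if kind == "SparkListenerTaskEnd":
            metrics = event.get("Task Metrics") or {}
            for name in TASK_METRICS:
                metric_totals[name] += metrics.get(name, 0)
    return {"event_counts": dict(event_counts), "metric_totals": metric_totals}


# Synthetic sample standing in for a real log; the values are made up.
sample = [
    json.dumps({"Event": "SparkListenerJobStart", "Job ID": 0}),
    json.dumps({"Event": "SparkListenerTaskEnd",
                "Task Metrics": {"Executor Run Time": 120, "JVM GC Time": 5}}),
    json.dumps({"Event": "SparkListenerTaskEnd",
                "Task Metrics": {"Executor Run Time": 80, "JVM GC Time": 3}}),
]
summary = summarize_event_log(sample)
```

With a real log, the same function could be given an open file handle instead of the synthetic list. The resulting counts and totals would be only a small part of the feature set the abstract mentions (Actions, Transformations, and Task metrics).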