You only run once: Spark auto-tuning from a single run
| dc.contributor.author | Buchaca Prats, David |
| dc.contributor.author | Albuquerque Portella, Felipe |
| dc.contributor.author | Costa, Carlos H. A. |
| dc.contributor.author | Berral García, Josep Lluís |
| dc.contributor.group | Universitat Politècnica de Catalunya. CAP - Grup de Computació d'Altes Prestacions |
| dc.contributor.other | Universitat Politècnica de Catalunya. Doctorat en Arquitectura de Computadors |
| dc.contributor.other | Barcelona Supercomputing Center |
| dc.date.accessioned | 2021-02-22T10:25:29Z |
| dc.date.available | 2021-02-22T10:25:29Z |
| dc.date.issued | 2020-12 |
| dc.description.abstract | Tuning configurations of Spark jobs is not a trivial task. State-of-the-art auto-tuning systems are based on iteratively running workloads with different configurations. During the optimization process, the relevant features are explored to find good solutions. Many optimizers enhance the time-to-solution using black-box optimization algorithms that do not take into account any information from the Spark workloads. In this article, we present a new method for tuning configurations that uses information from one run of a Spark workload. To achieve good performance, we mine the SparkEventLog that is generated by the Spark engine. This log file contains a large amount of information from the executed application. We use this information to enhance a performance model with low-level features from the workload to be optimized. These features include Spark Actions, Transformations, and Task metrics. This process allows us to obtain application-specific workload information. With this information our system can predict sensible Spark configurations for unseen jobs, given that it has been trained with reasonable coverage of Spark applications. Experiments show that the presented system correctly produces good configurations, while achieving up to 80% speedup with respect to the default Spark configuration, and up to 12x speedup of the time-to-solution with respect to a standard Bayesian Optimization procedure. |
| dc.description.peerreviewed | Peer Reviewed |
| dc.description.sponsorship | This work is supported by the European Research Council under the European Union’s Horizon 2020 research and innovation programme (grant agreement No. 639595); the Spanish Ministry of Economy, under contract TIN2015-65316-P, and the Generalitat de Catalunya under contract 2014SGR1051; the ICREA Academia program; the BSC-CNS Severo Ochoa program (SEV-2015-0493); and by Petroleo Brasileiro S. A. (PETROBRAS). |
| dc.description.version | Postprint (author's final draft) |
| dc.format.extent | 13 p. |
| dc.identifier.citation | Buchaca, D. [et al.]. You only run once: Spark auto-tuning from a single run. "IEEE transactions on network and service management", December 2020, vol. 17, no. 4, p. 2039-2051. |
| dc.identifier.doi | 10.1109/TNSM.2020.3034824 |
| dc.identifier.issn | 1932-4537 |
| dc.identifier.uri | https://hdl.handle.net/2117/340271 |
| dc.language.iso | eng |
| dc.relation.projectid | info:eu-repo/grantAgreement/MINECO//TIN2015-65316-P/ES/COMPUTACION DE ALTAS PRESTACIONES VII/ |
| dc.relation.projectid | info:eu-repo/grantAgreement/AGAUR/V PRI/2014 SGR 1051 |
| dc.relation.projectid | info:eu-repo/grantAgreement/EC/H2020/639595/EU/Holistic Integration of Emerging Supercomputing Technologies/Hi-EST |
| dc.relation.publisherversion | https://ieeexplore.ieee.org/document/9244226 |
| dc.rights.access | Open Access |
| dc.subject | UPC subject areas::Computer science::Computer architecture |
| dc.subject.lcsh | Bayesian statistical decision theory |
| dc.subject.lcsh | Machine learning |
| dc.subject.lcsh | Parallel processing (Electronic computers) |
| dc.subject.lemac | Estadística bayesiana |
| dc.subject.lemac | Aprenentatge automàtic |
| dc.subject.lemac | Processament en paral·lel (Ordinadors) |
| dc.subject.other | Decision making for workload auto-tuning |
| dc.subject.other | Spark auto-tuning |
| dc.subject.other | Workload modeling |
| dc.subject.other | Workload placement |
| dc.title | You only run once: Spark auto-tuning from a single run |
| dc.type | Article |
| dspace.entity.type | Publication |
| local.citation.author | Buchaca, D.; Albuquerque, F.; Costa, C.; Berral, J. |
| local.citation.endingPage | 2051 |
| local.citation.number | 4 |
| local.citation.publicationName | IEEE transactions on network and service management |
| local.citation.startingPage | 2039 |
| local.citation.volume | 17 |
| local.identifier.drac | 30567433 |
Files
Original bundle
- Name: YORO_IEEE_TNSM_final_version_UPCversion-3.pdf
- Size: 1.22 MB
- Format: Adobe Portable Document Format



