The state of SQL-on-Hadoop in the cloud

Poggi, Nicolas; Berral García, Josep Lluís; Fenech, Thomas; Carrera Pérez, David; Blakeley, Jose; Minhas, Umar F.; Vujic, Nikola

doi:10.1109/BigData.2016.7840751

dc.contributor.author	Poggi, Nicolas
dc.contributor.author	Berral García, Josep Lluís
dc.contributor.author	Fenech, Thomas
dc.contributor.author	Carrera Pérez, David
dc.contributor.author	Blakeley, Jose
dc.contributor.author	Minhas, Umar F.
dc.contributor.author	Vujic, Nikola
dc.contributor.other	Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors
dc.contributor.other	Barcelona Supercomputing Center
dc.date.accessioned	2017-05-15T08:48:05Z
dc.date.available	2017-05-15T08:48:05Z
dc.date.issued	2016
dc.identifier.citation	Poggi, N., Berral, J., Fenech, T., Carrera, D., Blakeley, J., Minhas, U., Vujic, N. The state of SQL-on-Hadoop in the cloud. In: IEEE International Conference on Big Data. "2016 IEEE International Conference on Big Data: Dec 05-Dec 08, 2016, Washington D.C., USA: proceedings". Institute of Electrical and Electronics Engineers (IEEE), 2016, p. 1432-1443.
dc.identifier.isbn	978-1-4673-9004-0
dc.identifier.uri	http://hdl.handle.net/2117/104402
dc.description.abstract	Managed Hadoop in the cloud, especially SQL-on-Hadoop, has been gaining attention recently. On Platform-as-a-Service (PaaS), analytical services like Hive and Spark come preconfigured for general-purpose use and ready to run, giving companies a quick entry point and on-demand deployment of ready-made SQL-like solutions for their big data needs. This study evaluates cloud services from an end-user perspective, comparing providers including Microsoft Azure, Amazon Web Services, Google Cloud, and Rackspace. The study focuses on the performance, readiness, scalability, and cost-effectiveness of the different solutions at entry/test-level cluster sizes. Results are based on over 15,000 Hive queries derived from the industry-standard TPC-H benchmark. The study is framed within the ALOJA research project, which features an open source benchmarking and analysis platform that has recently been extended to support SQL-on-Hadoop engines. The ALOJA Project aims to lower the total cost of ownership (TCO) of big data deployments and study their performance characteristics for optimization. The study benchmarks cloud providers across a diverse range of instance types and uses input data scales from 1GB to 1TB in order to survey the popular entry-level PaaS SQL-on-Hadoop solutions, thereby establishing a common results base upon which subsequent research can be carried out by the project. Initial results already show the main performance trends with respect to hardware and software configuration and pricing, as well as the similarities and architectural differences of the evaluated PaaS solutions. Whereas some providers focus on decoupling storage and computing resources while offering network-based elastic storage, others choose to keep Hadoop's local processing model for high performance, at the cost of reduced flexibility. Results also show the importance of application-level tuning and how keeping hardware and software stacks up to date can influence performance even more than replicating the on-premises model in the cloud.
dc.description.sponsorship	This work is partially supported by the Microsoft Azure for Research program, the European Research Council (ERC) under the EU's Horizon 2020 programme (GA 639595), the Spanish Ministry of Education (TIN2015-65316-P), and the Generalitat de Catalunya (2014-SGR-1051).
dc.format.extent	12 p.
dc.language.iso	eng
dc.publisher	Institute of Electrical and Electronics Engineers (IEEE)
dc.subject	UPC subject areas::Computer science::Computer architecture
dc.subject.lcsh	Big data
dc.subject.lcsh	Cloud computing
dc.subject.other	Managed Hadoop
dc.subject.other	SQL-on-Hadoop
dc.subject.other	Platform-as-a-Service (PaaS)
dc.subject.other	ALOJA
dc.title	The state of SQL-on-Hadoop in the cloud
dc.type	Conference report
dc.subject.lemac	Macrodades (Big data)
dc.subject.lemac	Computació en núvol (Cloud computing)
dc.contributor.group	Universitat Politècnica de Catalunya. CAP - Grup de Computació d'Altes Prestacions
dc.identifier.doi	10.1109/BigData.2016.7840751
dc.description.peerreviewed	Peer Reviewed
dc.relation.publisherversion	http://ieeexplore.ieee.org/document/7840751/
dc.rights.access	Open Access
local.identifier.drac	19550342
dc.description.version	Postprint (author's final draft)
dc.relation.projectid	info:eu-repo/grantAgreement/EC/H2020/639595/EU/Holistic Integration of Emerging Supercomputing Technologies/Hi-EST
dc.relation.projectid	info:eu-repo/grantAgreement/MINECO//TIN2015-65316-P/ES/COMPUTACION DE ALTAS PRESTACIONES VII/
local.citation.author	Poggi, N.; Berral, J.; Fenech, T.; Carrera, D.; Blakeley, J.; Minhas, U.; Vujic, N.
local.citation.contributor	IEEE International Conference on Big Data
local.citation.publicationName	2016 IEEE International Conference on Big Data: Dec 05-Dec 08, 2016, Washington D.C., USA: proceedings
local.citation.startingPage	1432
local.citation.endingPage	1443

Files in this item

Name: the+state+of+SQL-on-Hadoop+in+ ...
Size: 1.326Mb
Format: PDF

This item appears in the following collections

Conference papers/presentations [574]
Conference papers/presentations [784]
Conference papers/presentations [1,954]

UPCommons. UPC's open knowledge portal
