dc.contributor.author: Poggi, Nicolas
dc.contributor.author: Berral García, Josep Lluís
dc.contributor.author: Fenech, Thomas
dc.contributor.author: Carrera Pérez, David
dc.contributor.author: Blakeley, Jose
dc.contributor.author: Minhas, Umar F.
dc.contributor.author: Vujic, Nikola
dc.contributor.other: Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors
dc.contributor.other: Barcelona Supercomputing Center
dc.date.accessioned: 2017-05-15T08:48:05Z
dc.date.available: 2017-05-15T08:48:05Z
dc.date.issued: 2016
dc.identifier.citation: Poggi, N., Berral, J., Fenech, T., Carrera, D., Blakeley, J., Minhas, U., Vujic, N. The state of SQL-on-Hadoop in the cloud. In: IEEE International Conference on Big Data. "2016 IEEE International Conference on Big Data: Dec 05-Dec 08, 2015, Washington D.C., USA: proceedings". Institute of Electrical and Electronics Engineers (IEEE), 2016, p. 1432-1443.
dc.identifier.isbn: 978-1-4673-9004-0
dc.identifier.uri: http://hdl.handle.net/2117/104402
dc.description.abstract: Managed Hadoop in the cloud, especially SQL-on-Hadoop, has been gaining attention recently. On Platform-as-a-Service (PaaS), analytical services like Hive and Spark come preconfigured for general-purpose use and ready to run, giving companies quick entry to, and on-demand deployment of, ready-made SQL-like solutions for their big data needs. This study evaluates cloud services from an end-user perspective, comparing providers including Microsoft Azure, Amazon Web Services, Google Cloud, and Rackspace. The study focuses on the performance, readiness, scalability, and cost-effectiveness of the different solutions at entry/test-level cluster sizes. Results are based on over 15,000 Hive queries derived from the industry-standard TPC-H benchmark. The study is framed within the ALOJA research project, which features an open source benchmarking and analysis platform that has recently been extended to support SQL-on-Hadoop engines. The ALOJA Project aims to lower the total cost of ownership (TCO) of big data deployments and to study their performance characteristics for optimization. The study benchmarks cloud providers across a diverse range of instance types and uses input data scales from 1 GB to 1 TB in order to survey the popular entry-level PaaS SQL-on-Hadoop solutions, thereby establishing a common results base upon which the project can carry out subsequent research. Initial results already show the main performance trends with respect to hardware and software configuration and pricing, as well as the similarities and architectural differences of the evaluated PaaS solutions. Whereas some providers focus on decoupling storage and computing resources while offering network-based elastic storage, others choose to keep Hadoop's local processing model for high performance, at the cost of reduced flexibility. Results also show the importance of application-level tuning and how keeping hardware and software stacks up to date can influence performance even more than replicating the on-premises model in the cloud.
dc.description.sponsorship: This work is partially supported by the Microsoft Azure for Research program, the European Research Council (ERC) under the EU's Horizon 2020 programme (GA 639595), the Spanish Ministry of Education (TIN2015-65316-P), and the Generalitat de Catalunya (2014-SGR-1051).
dc.format.extent: 12 p.
dc.language.iso: eng
dc.publisher: Institute of Electrical and Electronics Engineers (IEEE)
dc.subject: UPC subject areas::Computer science::Computer architecture
dc.subject.lcsh: Big data
dc.subject.lcsh: Cloud computing
dc.subject.other: Managed Hadoop
dc.subject.other: SQL-on-Hadoop
dc.subject.other: Platform-as-a-Service (PaaS)
dc.subject.other: ALOJA
dc.title: The state of SQL-on-Hadoop in the cloud
dc.type: Conference report
dc.subject.lemac: Macrodades (big data)
dc.subject.lemac: Computació en núvol (cloud computing)
dc.contributor.group: Universitat Politècnica de Catalunya. CAP - Grup de Computació d'Altes Prestacions
dc.identifier.doi: 10.1109/BigData.2016.7840751
dc.description.peerreviewed: Peer Reviewed
dc.relation.publisherversion: http://ieeexplore.ieee.org/document/7840751/
dc.rights.access: Open Access
local.identifier.drac: 19550342
dc.description.version: Postprint (author's final draft)
dc.relation.projectid: info:eu-repo/grantAgreement/EC/H2020/639595/EU/Holistic Integration of Emerging Supercomputing Technologies/Hi-EST
dc.relation.projectid: info:eu-repo/grantAgreement/MINECO//TIN2015-65316-P/ES/COMPUTACION DE ALTAS PRESTACIONES VII/
local.citation.author: Poggi, N.; Berral, J.; Fenech, T.; Carrera, D.; Blakeley, J.; Minhas, U.; Vujic, N.
local.citation.contributor: IEEE International Conference on Big Data
local.citation.publicationName: 2016 IEEE International Conference on Big Data: Dec 05-Dec 08, 2015, Washington D.C., USA: proceedings
local.citation.startingPage: 1432
local.citation.endingPage: 1443

