
dc.contributor.author: Glushkova, Daria
dc.contributor.author: Jovanovic, Petar
dc.contributor.author: Abelló Gamazo, Alberto
dc.contributor.other: Universitat Politècnica de Catalunya. Departament d'Enginyeria de Serveis i Sistemes d'Informació
dc.date.accessioned: 2018-11-15T11:40:49Z
dc.date.issued: 2019-01
dc.identifier.citation: Glushkova, D., Jovanovic, P., Abelló, A. MapReduce performance model for Hadoop 2.x. "Information systems", January 2019, vol. 79, p. 32-43.
dc.identifier.issn: 0306-4379
dc.identifier.uri: http://hdl.handle.net/2117/124328
dc.description.abstract: MapReduce is a popular programming model for distributed processing of large data sets. Apache Hadoop is one of the most common open-source implementations of this paradigm. Performance analysis of concurrent job executions has been recognized as a challenging problem, yet it may provide reasonably accurate job response time estimates at significantly lower cost than experimental evaluation of real setups. In this paper, we tackle the challenge of defining a MapReduce performance model for Hadoop 2.x. While there are several efficient approaches for modeling the performance of MapReduce workloads in Hadoop 1.x, they cannot be applied to Hadoop 2.x because of its fundamental architectural changes and dynamic resource allocation. The proposed solution is therefore based on an existing performance model for Hadoop 1.x, but takes the architectural changes into account and captures the execution flow of a MapReduce job using a queuing network model. This way, the cost model reflects the intra-job synchronization constraints that occur due to contention at shared resources. The accuracy of our solution is validated by comparing our model estimates against measurements in a real Hadoop 2.x setup.
dc.format.extent: 12 p.
dc.language.iso: eng
dc.publisher: Elsevier
dc.rights: Attribution-NonCommercial-NoDerivs 3.0 Spain
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/3.0/es/
dc.subject: Àrees temàtiques de la UPC::Informàtica::Arquitectura de computadors::Arquitectures distribuïdes
dc.subject.lcsh: Electronic data processing -- Distributed processing
dc.subject.lcsh: Cost effectiveness
dc.subject.other: Hadoop 2.x
dc.subject.other: MapReduce performance model
dc.title: MapReduce performance model for Hadoop 2.x
dc.type: Article
dc.subject.lemac: Processament distribuït de dades
dc.subject.lemac: Cost-eficàcia
dc.contributor.group: Universitat Politècnica de Catalunya. GESSI - Grup d'Enginyeria del Software i dels Serveis
dc.identifier.doi: 10.1016/j.is.2017.11.006
dc.description.peerreviewed: Peer Reviewed
dc.relation.publisherversion: https://www.sciencedirect.com/science/article/pii/S0306437917304659
dc.rights.access: Restricted access - publisher's policy
drac.iddocument: 22523986
dc.description.version: Postprint (author's final draft)
dc.relation.projectid: info:eu-repo/grantAgreement/MINECO/1PE/TIN2016-79269-R
dc.date.lift: 2019-12-02
upcommons.citation.author: Glushkova, D., Jovanovic, P., Abelló, A.
upcommons.citation.published: true
upcommons.citation.publicationName: Information systems
upcommons.citation.volume: 79
upcommons.citation.startingPage: 32
upcommons.citation.endingPage: 43



Except where otherwise noted, content on this work is licensed under a Creative Commons license: Attribution-NonCommercial-NoDerivs 3.0 Spain