
dc.contributor.author: Servat, Harald
dc.contributor.author: Llort, German
dc.contributor.author: Huck, Kevin A.
dc.contributor.author: Giménez Lucas, Judit
dc.contributor.author: Labarta Mancho, Jesús José
dc.contributor.other: Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors
dc.date.accessioned: 2013-07-26T15:41:56Z
dc.date.created: 2013-08
dc.date.issued: 2013-08
dc.identifier.citation: Servat, H. [et al.]. Framework for a productive performance optimization. "Parallel computing", August 2013, vol. 39, no. 8, p. 336-353.
dc.identifier.issn: 0167-8191
dc.identifier.uri: http://hdl.handle.net/2117/20012
dc.description.abstract: Modern supercomputers deliver large computational power, but it is difficult for an application to exploit such power. One factor that limits application performance is single-node performance. While many performance tools use the microprocessor performance counters to provide insights into serial node performance issues, the complex semantics of these counters pose an obstacle to an inexperienced developer. We present a framework that allows easy identification and qualification of serial node performance bottlenecks in parallel applications. The output of the framework is precise, and it can correlate performance inefficiencies with small regions of code within the application. The framework not only points to regions of code but also simplifies the semantics of the performance counters into metrics that refer to processor functional units. With this information, the developer can focus on the identified code and improve it, knowing which processor execution unit is degrading performance. To demonstrate the usefulness of the framework, we apply it to three already optimized applications using realistic inputs and, according to the results, modify their source code. With modifications that require little effort, we increase the applications’ performance by 10% to 30%, which shortens the time required to reach the solution and/or allows larger problem sizes to be tackled.
dc.format.extent: 18 p.
dc.language.iso: eng
dc.rights: Attribution-NonCommercial-NoDerivs 3.0 Spain
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/3.0/es/
dc.subject: Àrees temàtiques de la UPC::Informàtica::Programació
dc.subject.lcsh: Supercomputers
dc.subject.lcsh: Parallel programming (Computer science)
dc.subject.other: Application tuning
dc.subject.other: Instrumentation
dc.subject.other: Performance analysis
dc.subject.other: Performance models
dc.subject.other: Performance tools
dc.subject.other: Sampling
dc.title: Framework for a productive performance optimization
dc.type: Article
dc.subject.lemac: Supercomputadors
dc.subject.lemac: Programació en paral·lel (Informàtica)
dc.contributor.group: Universitat Politècnica de Catalunya. CAP - Grup de Computació d'Altes Prestacions
dc.identifier.doi: 10.1016/j.parco.2013.05.004
dc.description.peerreviewed: Peer Reviewed
dc.relation.publisherversion: http://www.sciencedirect.com/science/article/pii/S0167819113000707
dc.rights.access: Restricted access - publisher's policy
local.identifier.drac: 12674931
dc.description.version: Preprint
dc.relation.projectid: info:eu-repo/grantAgreement/EC/FP7/283493/EU/PRACE - Second Implementation Phase Project/PRACE-2IP
dc.relation.projectid: info:eu-repo/grantAgreement/EC/FP7/287759/EU/High Performance and Embedded Architecture and Compilation/HIPEAC
dc.date.lift: 10000-01-01
local.citation.author: Servat, H.; Llort, G.; Huck, K.; Gimenez, J.; Labarta, J.
local.citation.publicationName: Parallel computing
local.citation.volume: 39
local.citation.number: 8
local.citation.startingPage: 336
local.citation.endingPage: 353



Except where otherwise noted, content on this work is licensed under a Creative Commons license: Attribution-NonCommercial-NoDerivs 3.0 Spain