
dc.contributor.author: Servat, Harald
dc.contributor.author: Labarta Mancho, Jesús José
dc.contributor.author: Hoppe, Hans-Christian
dc.contributor.author: Giménez, Judit
dc.contributor.author: Peña, Antonio J.
dc.contributor.other: Barcelona Supercomputing Center
dc.date.accessioned: 2018-07-24T10:16:13Z
dc.date.available: 2020-07-09T00:26:18Z
dc.date.issued: 2018-10
dc.identifier.citation: Servat, H. [et al.]. Understanding memory access patterns using the BSC performance tools. "Parallel Computing", October 2018, vol. 78, p. 1-14.
dc.identifier.issn: 0167-8191
dc.identifier.other: https://arxiv.org/abs/2005.05872
dc.identifier.uri: http://hdl.handle.net/2117/119839
dc.description.abstract: The growing gap between processor and memory speeds has led to complex memory hierarchies as processors evolve to mitigate this divergence by exploiting locality of reference. In this direction, the BSC performance analysis tools have recently been extended to provide insight into application memory accesses by depicting their temporal and spatial characteristics and correlating them simultaneously with the source code and the achieved performance. These extensions rely on the Precise Event-Based Sampling (PEBS) mechanism available in recent Intel processors to capture information about the application memory accesses. The sampled information is later combined with the Folding technique to represent a detailed temporal evolution of the memory accesses, in conjunction with the achieved performance and the corresponding source code. The reports generated by the latter tool help not only application developers but also processor architects to better understand how the application behaves and how the system performs. In this paper, we describe a tighter integration of the sampling mechanism into the monitoring package. We also demonstrate the value of the complete workflow by exploring already optimized state-of-the-art benchmarks, providing detailed insight into their memory access behavior. We have taken advantage of this insight to apply small modifications that improve the applications’ performance.
dc.description.sponsorship: This work has been performed in the Intel-BSC Exascale Lab. We would like to thank Forschungszentrum Jülich for the compute time on the Jureca system. This project has received funding from the European Union’s Horizon 2020 research and innovation program under Marie Sklodowska-Curie grant agreement no. 749516.
dc.format.extent: 14 p.
dc.language.iso: eng
dc.publisher: Elsevier
dc.subject: UPC subject areas::Computer science
dc.subject.lcsh: High performance computing
dc.subject.other: Performance analysis
dc.subject.other: Memory references
dc.subject.other: Sampling
dc.subject.other: Instrumentation
dc.title: Understanding memory access patterns using the BSC performance tools
dc.type: Article
dc.subject.lemac: Supercomputers
dc.identifier.doi: 10.1016/j.parco.2018.06.007
dc.description.peerreviewed: Peer Reviewed
dc.relation.publisherversion: https://www.sciencedirect.com/science/article/pii/S0167819118301911
dc.rights.access: Open Access
local.identifier.drac: 23358637
dc.description.version: Postprint (author's final draft)
dc.relation.projectid: info:eu-repo/grantAgreement/EC/H2020/749516/EU/Advanced Ecosystem for Broad Heterogeneous Memory Usage/ECO-H-MEM
local.citation.publicationName: Parallel Computing
local.citation.volume: 78
local.citation.startingPage: 1
local.citation.endingPage: 14

