
dc.contributor.author: Nikolopoulos, Dimitrios
dc.contributor.author: Ayguadé Parra, Eduard
dc.contributor.author: Polychronopoulos, C D
dc.contributor.other: Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors
dc.date.accessioned: 2018-05-16T14:36:17Z
dc.date.issued: 2002-08
dc.identifier.citation: Nikolopoulos, D., Ayguade, E., Polychronopoulos, C. Runtime vs. manual data distribution for architecture-agnostic shared-memory programming models. "Journal of parallel and distributed computing", August 2002, vol. 30, no. 4, p. 225-254.
dc.identifier.issn: 0743-7315
dc.identifier.uri: http://hdl.handle.net/2117/117287
dc.description.abstract: This paper compares data distribution methodologies for scaling the performance of OpenMP on NUMA architectures. We investigate the performance of automatic page placement algorithms implemented in the operating system, runtime algorithms based on dynamic page migration, runtime algorithms based on loop scheduling transformations and manual data distribution. These techniques present the programmer with trade-offs between performance and programming effort. Automatic page placement algorithms are transparent to the programmer, but may compromise memory access locality. Dynamic page migration algorithms are also transparent, but require careful engineering and tuned implementations to be effective. Manual data distribution requires substantial programming effort and architecture-specific extensions to the API, but may localize memory accesses in a nearly optimal manner. Loop scheduling transformations may or may not require intervention from the programmer, but conform better to an architecture-agnostic programming paradigm like OpenMP. We identify the conditions under which runtime data distribution algorithms can optimize memory access locality in OpenMP. We also present two novel runtime data distribution techniques, one based on memory access traces and another based on affinity scheduling of parallel loops. These techniques can be used to effectively replace manual data distribution in regular applications. The results provide a proof of concept that it is possible to scale a portable shared-memory programming model up to more than 100 processors, without modifying the API and without exposing architectural details to the programmer.
dc.format.extent: 30 p.
dc.language.iso: eng
dc.subject: Àrees temàtiques de la UPC::Informàtica::Arquitectura de computadors::Arquitectures paral·leles
dc.subject.lcsh: Parallel processing (Electronic computers)
dc.subject.other: Data distribution
dc.subject.other: Operating systems
dc.subject.other: Runtime systems
dc.subject.other: Performance evaluation
dc.subject.other: OpenMP
dc.title: Runtime vs. manual data distribution for architecture-agnostic shared-memory programming models
dc.type: Article
dc.subject.lemac: Processament en paral·lel (Ordinadors)
dc.contributor.group: Universitat Politècnica de Catalunya. CAP - Grup de Computació d'Altes Prestacions
dc.identifier.doi: 10.1023/A:1019899812171
dc.description.peerreviewed: Peer Reviewed
dc.relation.publisherversion: http://link.springer.com/article/10.1023%2FA%3A1019899812171
dc.rights.access: Restricted access - publisher's policy
local.identifier.drac: 1642820
dc.description.version: Postprint (published version)
dc.date.lift: 10000-01-01
local.citation.author: Nikolopoulos, D.; Ayguade, E.; Polychronopoulos, C.
local.citation.publicationName: Journal of parallel and distributed computing
local.citation.volume: 30
local.citation.number: 4
local.citation.startingPage: 225
local.citation.endingPage: 254

All rights reserved. This work is protected by the corresponding intellectual and industrial property rights. Without prejudice to any existing legal exemptions, reproduction, distribution, public communication or transformation of this work are prohibited without permission of the copyright holder.