Using shared-data localization to reduce the cost of inspector-execution in unified-parallel-C programs
DOI: 10.1016/j.parco.2016.03.002
Cite as: hdl:2117/99192
Document type: Article
Publication date: 2016-05-01
Access conditions: Open access
Unless otherwise indicated, the contents of this work are subject to the Creative Commons license: Attribution-NonCommercial-NoDerivs 3.0 Spain
Abstract
Programs written in the Unified Parallel C (UPC) language can access any location of the entire local and remote address space via read/write operations. However, UPC programs that contain fine-grained shared accesses can exhibit performance degradation. One solution is to use the inspector-executor technique to coalesce fine-grained shared accesses into larger remote access operations. A straightforward implementation of the inspector-executor transformation results in excessive instrumentation that hinders performance.

This paper addresses this issue and introduces several techniques that aim to reduce the generated instrumentation code: a shared-data localization transformation based on Constant-Stride Linear Memory Descriptors (CSLMADs) [S. Aarseth, Gravitational N-Body Simulations: Tools and Algorithms, Cambridge Monographs on Mathematical Physics, Cambridge University Press, 2003.], the inlining of data locality checks, and the use of an index vector to aggregate the data. Finally, the paper introduces a lightweight loop code motion transformation to privatize shared scalars that were propagated through the loop body.

A performance evaluation, using up to 2048 cores of a POWER 775, explores the impact of each optimization and characterizes the overheads of UPC programs. It also shows that the presented optimizations improve the performance of UPC programs by up to 1.8x over their hand-optimized UPC counterparts for applications with regular accesses and by up to 6.3x for applications with irregular accesses.
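The inspector-executor idea mentioned in the abstract can be illustrated with a minimal sketch in plain C. This is not the paper's compiler transformation: the shared array is simulated with an ordinary array, and the names `shared_data`, `owner_of`, `remote_messages`, `sum_naive`, and `sum_inspector_executor` are hypothetical, as are the block distribution and the constants. The sketch assumes that all remote elements can be fetched in one aggregated operation, whereas a real runtime would typically issue one operation per owning thread.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_THREADS 4                 /* hypothetical thread count */
#define BLOCK 4                       /* hypothetical elements per thread */
#define N (NUM_THREADS * BLOCK)
#define MAX_ACCESSES 64               /* capacity of the index vector */

/* Simulated shared array: element i is "owned" by thread i / BLOCK. */
double shared_data[N];
/* Counts simulated remote communication operations. */
size_t remote_messages;

/* Locality check: which simulated thread owns element i. */
int owner_of(size_t i) { return (int)(i / BLOCK); }

/* Naive version: every remote fine-grained access costs one message. */
double sum_naive(int me, const size_t *idx, size_t n) {
    double sum = 0.0;
    for (size_t k = 0; k < n; k++) {
        if (owner_of(idx[k]) != me)
            remote_messages++;        /* one message per remote element */
        sum += shared_data[idx[k]];
    }
    return sum;
}

/* Inspector-executor version: collect remote indices, fetch them in one
 * aggregated operation, then compute from local data only. */
double sum_inspector_executor(int me, const size_t *idx, size_t n) {
    size_t remote_idx[MAX_ACCESSES];  /* index vector built by the inspector */
    double buf[MAX_ACCESSES];         /* local buffer for fetched elements */
    size_t nremote = 0;

    assert(n <= MAX_ACCESSES);

    /* Inspector: record which accesses are remote (inlined locality check). */
    for (size_t k = 0; k < n; k++)
        if (owner_of(idx[k]) != me)
            remote_idx[nremote++] = idx[k];

    /* Communication: a single simulated gather (simplifying assumption). */
    if (nremote > 0) {
        remote_messages++;
        for (size_t r = 0; r < nremote; r++)
            buf[r] = shared_data[remote_idx[r]];
    }

    /* Executor: same loop as before, remote values now come from buf. */
    double sum = 0.0;
    size_t r = 0;
    for (size_t k = 0; k < n; k++)
        sum += (owner_of(idx[k]) == me) ? shared_data[idx[k]] : buf[r++];
    return sum;
}
```

Both functions compute the same sum; the difference is only in how many simulated remote operations are counted. The inspector loop is the instrumentation whose cost the paper aims to reduce.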
Citation: Alvanos, M., Tiotto, E., Amaral, J.N., Farreras, M., Martorell, X. Using shared-data localization to reduce the cost of inspector-execution in unified-parallel-C programs. "Parallel computing", 1 May 2016, vol. 54, p. 2-14.
ISSN: 0167-8191
Publisher version: http://www.sciencedirect.com/science/article/pii/S0167819116300096
Files | Description | Size | Format | View |
---|---|---|---|---|
parco-2016.pdf | | 566.9 KB | PDF | View/Open |