Rights accessRestricted access - author's decision
Current high performance computing architectures are composed of large shared memory NUMA nodes, among other components. Such nodes are becoming increasingly complex as they have several NUMA domains with different access latencies depending on the core where the access is issued. In this work, we propose techniques to efficiently mitigate the negative impact of NUMA effects on parallel applications performance. We leverage runtime system metadata expressed in terms of a task dependency graph, where nodes are sequential pieces of code and edges are control or data dependencies between them, to efficiently reduce data transfers using graph partitioning techniques. With our proposals, we are able to improve the execution time of OpenMP parallel codes a factor of $2.02\times$ on average when run on architectures with strong NUMA effects.
All rights reserved. This work is protected by the corresponding intellectual and industrial property rights. Without prejudice to any existing legal exemptions, reproduction, distribution, public communication or transformation of this work are prohibited without permission of the copyright holder. If you wish to make any use of the work not provided for in the law, please contact: email@example.com