Rights access: Restricted access - publisher's policy
Clusters of GPUs are emerging as a new computational
scenario. Programming them requires the use of
hybrid models that increase the complexity of the applications,
reducing the productivity of programmers.
We present the implementation of OmpSs for clusters of
GPUs, which supports asynchrony and heterogeneity for task
parallelism. It is based on annotating a serial application with
directives that are translated by the compiler. With it, the same
program that runs sequentially on a node with a single GPU
can run in parallel on multiple GPUs, either local (single node)
or remote (cluster of GPUs). Besides performing a task-based
parallelization, the runtime system moves the data as needed
between the different nodes and GPUs, minimizing the impact
of communication through affinity scheduling, caching, and
overlapping communication with computation.
We show several applications programmed with OmpSs
and their performance with multiple GPUs in a local node
and in remote nodes. The results show a good tradeoff between
performance and programmer effort.
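
The directive-based approach described above can be sketched with a small example. This is an illustration only, not code from the paper: the function names `saxpy_block` and `saxpy_blocked` are hypothetical, and the clause syntax follows the OmpSs style as best understood here, so it should be checked against the OmpSs specification. OmpSs additionally uses a target directive to select a device such as a GPU; that part is omitted so the sketch stays plain C. A compiler without OmpSs support ignores the pragmas and the program simply runs serially.

```c
#include <assert.h>
#include <stddef.h>

/* OmpSs-style annotation (assumed syntax): each call becomes a task that
 * reads x[0..n) and reads and writes y[0..n). A runtime could use these
 * declarations to order tasks and to move data where the task runs.
 * A plain C compiler ignores the pragma and calls the function directly. */
#pragma omp task in([n]x) inout([n]y)
void saxpy_block(size_t n, float a, const float *x, float *y)
{
    for (size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

/* Hypothetical driver: splits the vectors into blocks and issues one
 * task per block. The blocks do not overlap, so the tasks are independent. */
void saxpy_blocked(size_t n, size_t block, float a, const float *x, float *y)
{
    assert(block > 0 && x != NULL && y != NULL);

    for (size_t start = 0; start < n; start += block) {
        size_t len = (n - start < block) ? (n - start) : block;
        saxpy_block(len, a, x + start, y + start);
    }

    /* Wait for all outstanding tasks before the caller reads y.
     * Ignored in a serial build, where the work is already done. */
    #pragma omp taskwait
}
```

Calling `saxpy_blocked(n, block, a, x, y)` from a regular `main` gives the same result whether or not the annotations are honoured, which is the property the abstract highlights: the annotated source remains a valid serial program.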
Citation: Bueno, J. [et al.]. Productive programming of GPU clusters with OmpSs. In: IEEE International Parallel and Distributed Processing Symposium. "2012 IEEE 26th International Parallel & Distributed Processing Symposium (IPDPS) 21-25 May 2012: Shanghai, China". Shanghai: 2012, p. 557-568.
All rights reserved. This work is protected by the corresponding intellectual and industrial property rights. Without prejudice to any existing legal exemptions, reproduction, distribution, public communication or transformation of this work are prohibited without permission of the copyright holder. If you wish to make any use of the work not provided for in the law, please contact: firstname.lastname@example.org