Predicate-based filtering for multi-GPU utilization in directive-based programming
Document type: Text in conference proceedings
Publication date: 2021-05
Publisher: Barcelona Supercomputing Center
Access conditions: Open access
All rights reserved. This work is protected by intellectual and industrial property rights. Without prejudice to existing legal exemptions, its reproduction, distribution, public communication or transformation is prohibited without the authorization of the rights holder.
Designing and building supercomputers is a complex task in the field of high-performance computing (HPC). The hardware, middleware and algorithms need to collaborate effectively to achieve ideal results for massive, practical problems. To make supercomputers easier to use, compiler technologies have been developed with highly automated program optimizations that draw on domain-specific knowledge and an understanding of the target architectures. Directive-based programming has been employed to enable accelerator use, replacing vendor-specific coding with directive insertion. Because they preserve software portability with minimal engineering effort on top of sequential code, OpenACC and OpenMP are now widely used for accelerator programming. However, pursuing ideal performance is often challenging. The bare insertion of directives by programmers exposes fewer program characteristics to the compiler; thus, programmers aiming at better efficiency are forced to reshape their code merely to adjust to the environment, such as compilers, software stacks and heterogeneous architectures. While preserving productivity, our research extends OpenACC to exploit further optimization opportunities. In a portable fashion that relies on other compilers, our approach provides an environment that enables dynamic analysis of computation and performs on-the-fly kernel specialization. Considering the high memory latency of GPUs, we add a novel code-translation technique named predicate-based filtering to automate multi-device utilization. We neither split loop ranges nor introduce fine-grained dependency analysis; instead, we divide the data ranges to be updated on each device. This idea allows highly tuned code to be distributed without changing its structure or parallelism.
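The abstract does not give the details of the code translation, so the following is only a minimal CPU-side sketch of the general idea, not the authors' implementation. It assumes that each device executes the full, unsplit loop and that a predicate keeps only the writes falling inside the data range that device owns. The function names (`owned_range`, `run_kernel_on_device`, `run_multi_device`), the block partitioning and the example kernel (a shifted, doubled store) are all hypothetical choices made for illustration; devices are simulated with plain Python lists.

```python
def owned_range(n, num_devices, dev):
    """Return the contiguous block [lo, hi) of the output owned by device `dev`.

    Assumption: a simple block partition of the data range.
    """
    chunk = (n + num_devices - 1) // num_devices  # ceiling division
    lo = min(dev * chunk, n)
    hi = min(lo + chunk, n)
    return lo, hi


def run_kernel_on_device(src, dst, dev, num_devices):
    """Run the whole loop on one simulated device, filtering writes by predicate."""
    n = len(dst)
    lo, hi = owned_range(n, num_devices, dev)
    for i in range(n):                # the loop range is NOT split
        target = (i + 1) % n          # hypothetical kernel: shifted store
        if lo <= target < hi:         # predicate on the written data range
            dst[target] = 2 * src[i]


def run_multi_device(src, num_devices):
    """Simulate several devices, then gather each device's owned slice."""
    n = len(src)
    # Each simulated device holds its own copy of the output array.
    copies = [[0] * n for _ in range(num_devices)]
    for dev in range(num_devices):
        run_kernel_on_device(src, copies[dev], dev, num_devices)

    result = [0] * n
    for dev in range(num_devices):
        lo, hi = owned_range(n, num_devices, dev)
        result[lo:hi] = copies[dev][lo:hi]
    return result
```

In this sketch the loop body and its iteration space are identical on every device; only the guard differs, which is why the original code structure and parallelism are left unchanged. The result is the same for any number of simulated devices.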
Citation: Matsumura, K.; García De Gonzalo, S.; Peña, A. Predicate-based filtering for multi-GPU utilization in directive-based programming. In: . Barcelona Supercomputing Center, 2021, p. 48-49.