Thread fork/join techniques for multi-level parallelism exploitation in NUMA multiprocessors
Cite as: hdl:2117/346201
Document type: Conference paper
Publication date: 1999
Publisher: Association for Computing Machinery (ACM)
Access conditions: Open access
All rights reserved. This work is protected by the corresponding intellectual and industrial property rights. Without prejudice to existing legal exemptions, its reproduction, distribution, public communication or transformation without the authorization of the rights holder is prohibited.
Abstract
This paper presents techniques for efficient thread forking and joining in parallel execution environments, taking into account the physical structure of NUMA machines and the support for multi-level parallelization and processor grouping. Two work generation schemes and one join mechanism are designed, implemented, evaluated and compared with those used in the IRIX MP library, an efficient implementation that supports a single level of parallelism. Supporting multiple levels of parallelism is a current research goal in both shared and distributed memory machines. The first work generation scheme (GWD, or global work descriptor) supports multiple levels of parallelism but not processor grouping. The second work generation scheme (LWD, or local work descriptor) supports both multiple levels of parallelism and processor grouping. Processor grouping is needed to distribute processors among different parts of the computation and to preserve the working set of each processor across different parallel constructs. The mechanisms are evaluated using synthetic benchmarks, two SPEC95fp applications and one NAS application. The performance evaluation shows that: i) when exploiting a single level of parallelism, the overhead of the proposed mechanisms is similar to that of the existing ones, and ii) applications with multiple levels of parallelism obtain a remarkable performance improvement. Compared with traditional single-level parallelism exploitation, these applications improve by 30% to 65%.
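To make the terms "fork", "join", "local work descriptor" and "processor grouping" concrete, here is a minimal sketch using Python's standard `threading` module. It is not the paper's LWD implementation or the IRIX MP library. Every name in it (`Team`, `LocalWorkDescriptor`, `JoinCounter`, `fork`) is hypothetical. "Processors" are just thread ids with no NUMA placement or CPU binding. The sketch assumes that each processor has its own one-slot descriptor into which the forking thread posts work. The forking thread runs rank 0 itself and then waits on a join counter. Nested forks operate on disjoint processor groups.

```python
import threading

_STOP = object()  # hypothetical sentinel that tells a worker thread to exit

class JoinCounter:
    """Join point: the forking thread waits until every other member has finished."""
    def __init__(self, n):
        self._remaining = n
        self._cond = threading.Condition()

    def arrive(self):
        with self._cond:
            self._remaining -= 1
            if self._remaining == 0:
                self._cond.notify_all()

    def wait(self):
        with self._cond:
            while self._remaining > 0:
                self._cond.wait()

class LocalWorkDescriptor:
    """Hypothetical per-processor slot holding at most one pending task."""
    def __init__(self):
        self._cond = threading.Condition()
        self._task = None

    def post(self, task):
        with self._cond:
            while self._task is not None:  # wait until the previous task was taken
                self._cond.wait()
            self._task = task
            self._cond.notify_all()

    def take(self):
        with self._cond:
            while self._task is None:
                self._cond.wait()
            task, self._task = self._task, None  # clear the slot before running
            self._cond.notify_all()
            return task

class Team:
    """Processor 0 is the calling thread; processors 1..n-1 are worker threads."""
    def __init__(self, n):
        self.descriptors = [LocalWorkDescriptor() for _ in range(n)]
        self.workers = [threading.Thread(target=self._worker_loop, args=(p,), daemon=True)
                        for p in range(1, n)]
        for w in self.workers:
            w.start()

    def _worker_loop(self, proc):
        while True:
            task = self.descriptors[proc].take()
            if task is _STOP:
                return
            fn, rank, group, counter = task
            fn(self, rank, group)  # fn may itself fork on a disjoint subgroup
            counter.arrive()

    def fork(self, group, fn):
        """Run fn(team, rank, group) on every processor in group; group[0] is the caller."""
        counter = JoinCounter(len(group) - 1)
        for rank, proc in enumerate(group[1:], start=1):
            self.descriptors[proc].post((fn, rank, group, counter))
        fn(self, 0, group)  # the forking processor executes rank 0 itself
        counter.wait()      # join: block until all other members arrive

    def shutdown(self):
        for proc in range(1, len(self.descriptors)):
            self.descriptors[proc].post(_STOP)
        for w in self.workers:
            w.join()

# Two-level example: the outer fork uses processors [0, 2] as group leaders,
# and each leader forks an inner level on its own disjoint pair of processors.
results = {}
lock = threading.Lock()

def inner(team, rank, group):
    with lock:
        results[group[rank]] = (group[0], rank)  # processor -> (its leader, its rank)

def outer(team, rank, group):
    leader = group[rank]
    team.fork([leader, leader + 1], inner)

team = Team(4)
team.fork([0, 2], outer)
team.shutdown()
print(sorted(results.items()))  # → [(0, (0, 0)), (1, (0, 1)), (2, (2, 0)), (3, (2, 1))]
```

Keeping the groups disjoint is the assumption that makes nesting deadlock-free in this sketch. No processor is asked to serve two forks at once, which loosely mirrors the abstract's point that grouping keeps each processor attached to one part of the computation.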
Citation: Martorell, X. [et al.]. Thread fork/join techniques for multi-level parallelism exploitation in NUMA multiprocessors. In: International Conference on Supercomputing. "ICS '99: proceedings of the 13th International Conference on Supercomputing". New York: Association for Computing Machinery (ACM), 1999, p. 294-301. ISBN 1-58113-164-X. DOI 10.1145/305138.305206.
ISBN: 1-58113-164-X
Publisher's version: http://dl.acm.org/citation.cfm?id=305206
File | Size |
---|---|
Martorell et al.pdf | 285.5 KB |