An OpenMP* barrier using SIMD instructions for Intel® Xeon Phi™ coprocessor
Document type: Conference report
Rights access: Restricted access - publisher's policy
European Commission's project: HIPEAC - High Performance and Embedded Architecture and Compilation (EC-FP7-287759)
Barrier synchronisation has been a widely studied topic since the supercomputer era because of its significant impact on the overall performance of parallel applications. With the current shift to many-core architectures, such as the Intel® Many Integrated Core Architecture, software barriers need to be revisited from an on-chip point of view so that they exploit the specific resources these architectures provide. In this paper, we propose a tree-based barrier that takes advantage of SIMD instructions and the inter-thread cache locality provided by the 4-way SMT of the Intel® Xeon Phi™ coprocessor. Our SIMD approach shows a speed-up of up to 2.84x over the default Intel OpenMP* barrier in the EPCC barrier microbenchmark. It also improves Livermore Loop kernel number six and the NAS MG benchmark by up to 60% and 21%, respectively.
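To make the idea of a tree-based barrier concrete, the following is a minimal sketch of a generic two-level, sense-reversing barrier in C++. It is not the paper's implementation: it uses scalar `std::atomic` counters rather than SIMD instructions, and the class name `TreeBarrier`, the `group_size` parameter (defaulting to 4 to mirror the 4-way SMT mentioned above) and the mapping of thread IDs to groups are assumptions made purely for illustration.

```cpp
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical two-level tree barrier (illustrative only, no SIMD).
// Threads are split into groups of `group_size`; the last thread to arrive
// in each group reports to the root, and the last group to report flips a
// global sense flag that releases every waiting thread.
class TreeBarrier {
public:
    explicit TreeBarrier(std::size_t num_threads, std::size_t group_size = 4)
        : group_size_(group_size),
          num_groups_((num_threads + group_size - 1) / group_size),
          leaves_(num_groups_),
          root_pending_(num_groups_) {
        for (std::size_t g = 0; g < num_groups_; ++g) {
            // The last group may hold fewer than `group_size` threads.
            leaves_[g].size = std::min(group_size, num_threads - g * group_size);
            leaves_[g].pending.store(leaves_[g].size, std::memory_order_relaxed);
        }
    }

    // `tid` must be a unique index in [0, num_threads) for each thread.
    void wait(std::size_t tid) {
        // The flag cannot flip before this thread arrives, so this read is stable.
        const bool my_sense = !sense_.load(std::memory_order_relaxed);
        Leaf& leaf = leaves_[tid / group_size_];

        if (leaf.pending.fetch_sub(1, std::memory_order_acq_rel) == 1) {
            // Last arrival in this group: rearm the leaf, then report to the root.
            leaf.pending.store(leaf.size, std::memory_order_relaxed);
            if (root_pending_.fetch_sub(1, std::memory_order_acq_rel) == 1) {
                // Last group overall: rearm the root and release everyone.
                root_pending_.store(num_groups_, std::memory_order_relaxed);
                sense_.store(my_sense, std::memory_order_release);
                return;
            }
        }
        // Everyone else spins until the global sense flips.
        while (sense_.load(std::memory_order_acquire) != my_sense) {
            std::this_thread::yield();
        }
    }

private:
    struct Leaf {
        std::atomic<std::size_t> pending{0};  // arrivals still missing in this group
        std::size_t size = 0;                 // number of threads in this group
    };

    std::size_t group_size_;
    std::size_t num_groups_;
    std::vector<Leaf> leaves_;
    std::atomic<std::size_t> root_pending_;   // groups still missing at the root
    std::atomic<bool> sense_{false};          // flipped once per barrier episode
};
```

In this sketch, each thread calls `barrier.wait(tid)` with its own index. Grouping consecutive IDs is meant to suggest how threads sharing a core, and therefore a cache, could synchronise locally before touching shared state; whether consecutive IDs actually map to SMT siblings depends on thread affinity settings, which the sketch does not control.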
Citation: Caballero, D.; Duran, A.; Martorell, X. An OpenMP* barrier using SIMD instructions for Intel® Xeon Phi™ coprocessor. In: International Workshop on OpenMP. "OpenMP in the era of low power devices and accelerators: 9th International Workshop on OpenMP, IWOMP 2013: Canberra, ACT, Australia: September 2013: Proceedings". Canberra: Springer, 2013, p. 99-113.