Runtime-guided management of stacked DRAM memories in task parallel programs

Document typeConference report
Defense date2018
PublisherAssociation for Computing Machinery (ACM)
Rights accessOpen Access
All rights reserved. This work is protected by the corresponding intellectual and industrial
property rights. Without prejudice to any existing legal exemptions, reproduction, distribution, public
communication or transformation of this work are prohibited without permission of the copyright holder
ProjectCOMPUTACION DE ALTAS PRESTACIONES VII (MINECO-TIN2015-65316-P)
ROMOL - Riding on Moore's Law (EC-FP7-321253)
Mont-Blanc 2020 - Mont-Blanc 2020, European scalable, modular and power efficient HPC processor (EC-H2020-779877)
COMPUTACION DE ALTAS PRESTACIONES VII (MINECO-TIN2015-65316-P)
ROMOL - Riding on Moore's Law (EC-FP7-321253)
Mont-Blanc 2020 - Mont-Blanc 2020, European scalable, modular and power efficient HPC processor (EC-H2020-779877)
COMPUTACION DE ALTAS PRESTACIONES VII (MINECO-TIN2015-65316-P)
Abstract
Stacked DRAM memories have become a reality in High-Performance Computing (HPC) architectures. These memories provide much higher bandwidth while consuming less power than traditional off-chip memories, but their limited memory capacity is insufficient for modern HPC systems. For this reason, both stacked DRAM and off-chip memories are expected to co-exist in HPC architectures, giving raise to different approaches for architecting the stacked DRAM in the system. This paper proposes a runtime approach to transparently manage stacked DRAM memories in task-based programming models. In this approach the runtime system is in charge of copying the data accessed by the tasks to the stacked DRAM, without any complex hardware support nor modifications to the application code. To mitigate the cost of copying data between the stacked DRAM and the off-chip memory, the proposal includes an optimization to parallelize the copies across idle or additional helper threads. In addition, the runtime system is aware of the reuse pattern of the data accessed by the tasks, and can exploit this information to avoid unworthy copies of data to the stacked DRAM. Results on the Intel Knights Landing processor show that the proposed techniques achieve an average speedup of 14% against the state-of-the-art library to manage the stacked DRAM and 29% against a stacked DRAM architected as a hardware cache.
CitationÁlvarez, L., Casas, M., Labarta, J., Ayguadé, E., Valero, M., Moreto, M. Runtime-guided management of stacked DRAM memories in task parallel programs. A: International Conference on Supercomputing. "Proceedings of the 2018 International Conference on Supercomputing: Beijing, China, June 12-15, 2018". New York: Association for Computing Machinery (ACM), 2018, p. 218-228.
ISBN978-1-4503-5783-8
Publisher versionhttps://dl.acm.org/citation.cfm?id=3205312
Files | Description | Size | Format | View |
---|---|---|---|---|
Runtime-Guided+Management+of+Stacked+DRAM.pdf | 835,2Kb | View/Open |