DROM: Enabling Efficient and Effortless Malleability for Resource Managers
Cite as:
hdl:2117/120569
Document type: Conference lecture
Defense date: 2018-08-13
Publisher: Association for Computing Machinery (ACM)
Rights access: Open Access
All rights reserved. This work is protected by the corresponding intellectual and industrial
property rights. Without prejudice to any existing legal exemptions, reproduction, distribution, public
communication or transformation of this work is prohibited without permission of the copyright holder.
Project: COMPUTACION DE ALTAS PRESTACIONES VII (MINECO-TIN2015-65316-P)
HBP SGA2 - Human Brain Project Specific Grant Agreement 2 (EC-H2020-785907)
Abstract
In the design of future HPC systems, research on resource management shows increasing interest in more dynamic control of the available resources. It has been proven that enabling jobs to change their number of computing resources at run time, i.e. their malleability, can significantly improve HPC system performance. However, job schedulers and applications typically do not support malleability, owing to the common belief that it introduces additional programming complexity and a performance penalty. This paper presents DROM, an interface that provides efficient malleability with no effort for program developers. Through the integration of DROM with standard programming models, such as OpenMP/OmpSs and MPI, the running application can adapt its number of threads to the number of assigned computing resources in a way that is completely transparent to the user. We designed the APIs to be easily used by any programming model, application, and job scheduler or resource manager. Our experimental results from the analysis of two realistic use cases, based on malleability by reducing the number of cores a job uses per node and on job co-allocation, show the potential of DROM for improving the performance of HPC systems. In particular, a workload of two MPI+OpenMP neuro-simulators is tested, reporting improvements in system metrics such as total run time and average response time of up to 8% and 48%, respectively.
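The co-allocation use case mentioned in the abstract can be illustrated with a toy model. The sketch below is not the DROM API and does not reproduce the paper's implementation; the names `MalleableJob`, `co_allocate` and `release`, as well as the policy of handing over the highest-numbered cores, are assumptions made only for illustration. It shows the basic idea of a running job shrinking its core set so that a second job can share the node, with each job's thread count following the cores it currently holds.

```python
from dataclasses import dataclass, field


@dataclass
class MalleableJob:
    """Toy stand-in for a job whose thread count follows its assigned cores."""
    name: str
    cpus: set = field(default_factory=set)  # core ids currently assigned

    @property
    def num_threads(self):
        # Hypothetical policy: one thread per assigned core.
        return len(self.cpus)


def co_allocate(running, incoming, share):
    """Shrink `running` by `share` cores and give them to `incoming`."""
    if not 0 < share < len(running.cpus):
        raise ValueError("share must leave at least one core to the running job")
    ordered = sorted(running.cpus)
    incoming.cpus = set(ordered[-share:])  # hand over the highest-numbered cores
    running.cpus = set(ordered[:-share])   # the running job keeps the rest


def release(finished, running):
    """Return the cores of a finished job to the job still running."""
    running.cpus |= finished.cpus
    finished.cpus = set()


job_a = MalleableJob("sim_a", cpus=set(range(16)))
job_b = MalleableJob("sim_b")

co_allocate(job_a, job_b, share=8)
print(job_a.num_threads, job_b.num_threads)  # 8 8

release(job_b, job_a)
print(job_a.num_threads)  # 16
```

In a real system the resource manager would drive these transitions and the programming model runtime would resize its thread pool; here both roles are collapsed into plain function calls.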
Citation: D'Amico, M. [et al.]. DROM: Enabling Efficient and Effortless Malleability for Resource Managers. In: "ICPP '18 Proceedings of the 47th International Conference on Parallel Processing Companion". Association for Computing Machinery (ACM), 2018.
ISBN: 978-1-4503-6523-9
Publisher version: https://dl.acm.org/citation.cfm?id=3229752
Files | Description | Size | Format | View |
---|---|---|---|---|
DROM Enabling E ... tless Malleability for.pdf | | 931,0Kb | PDF | View/Open |