Efficient power, performance and thermal aware strategies over heterogeneous platforms
Document type: Master thesis
Rights access: Open Access
Resource management is a widely studied field in computer science and is of utmost importance for the proper operation of data center infrastructures. Efficient resource management policies make it possible to reduce the energy consumption of these facilities, thus reducing operational costs. Furthermore, in High Performance Computing (HPC) environments, as is the case in the MANGO H2020 project, they make it possible to improve the performance and execution time of applications. The main objective of this project is the design, implementation and testing of a resource manager able to allocate incoming applications to the different servers of the data center, while providing the tools needed to deploy power-, performance- and thermal-aware policies over a heterogeneous cluster. This cluster will be composed of regular Intel-based servers and FPGA-based accelerators. The resource manager will work as a single entry point for all the applications involved in the MANGO project. By the end of the project, we have shown that by applying simple yet effective allocation policies, without fine-grained control of the accelerators and with an overview of the system, it is possible to improve performance by 10% while lowering power and temperature and reducing the aforementioned operational costs. The resource management tool developed in this MSc thesis has been deployed in a real prototype infrastructure composed of 8 servers and 128 FPGAs.
This project aims to implement efficient resource management strategies on a local platform comprising a server and an FPGA, followed by a testing phase in which the implementation will be deployed on a 16-server cluster with 16 FPGAs per server. To achieve optimal solutions, a resource manager (SLURM) will be deployed as a first step, together with other services. These services will provide all the plugins needed to connect SLURM with the current local resource manager.