Large-memory nodes for energy efficient high-performance computing
Document type: Conference report
Publisher: Association for Computing Machinery (ACM)
Rights access: Open Access
European Commission's project: ExaNoDe - European Exascale Processor Memory Node Design (EC-H2020-671578)
Energy consumption is by far the most important contributor to HPC cluster operational costs, and it accounts for a significant share of the total cost of ownership. Advanced energy-saving techniques in HPC components have received significant research and development effort, but a simple measure that can dramatically reduce energy consumption is often overlooked. We show that, in capacity computing, where many small to medium-sized jobs have to be solved at the lowest cost, a practical energy-saving approach is to scale in the application on large-memory nodes. We evaluate scaling in, i.e., decreasing the number of application processes and compute nodes (servers) used to solve a fixed-size problem, using a set of HPC applications running on a production system. With standard-memory nodes alone, scaling in yields average energy savings of 36%, already a large figure. We show that the main source of these savings is a decrease in node-hours (node_hours = #nodes × exe_time), which results from more efficient use of hardware resources. Scaling in is limited by per-node memory capacity, so we also consider large-memory nodes, which enable a greater degree of scaling in. We show that the additional energy savings, of up to 52%, mean that in many cases the investment in upgrading the hardware would be recovered within a typical system lifetime of less than five years.
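The node-hours argument above can be illustrated with a small calculation. The sketch below is hypothetical and not the authors' methodology: it assumes energy is roughly node-hours multiplied by a constant average per-node power (the paper reports measured energy instead), and every name (RunConfig, node_hours, energy_kwh, savings_pct, payback_years) and every number is an illustrative assumption. It shows how running a fixed-size job on fewer nodes for longer can still reduce node-hours, and how an upgrade's payback period could be estimated from annual energy savings.

```python
from dataclasses import dataclass


@dataclass
class RunConfig:
    """One way of running a fixed-size job (all values hypothetical)."""
    nodes: int            # number of compute nodes used
    exe_time_h: float     # wall-clock execution time in hours
    node_power_kw: float  # ASSUMED constant average power per node, in kW


def node_hours(cfg: RunConfig) -> float:
    # node_hours = #nodes x exe_time, as defined in the abstract
    return cfg.nodes * cfg.exe_time_h


def energy_kwh(cfg: RunConfig) -> float:
    # Simplifying assumption: energy scales linearly with node-hours at a
    # fixed per-node power; real measurements would replace this model.
    return node_hours(cfg) * cfg.node_power_kw


def savings_pct(baseline: RunConfig, scaled_in: RunConfig) -> float:
    # Relative energy saving of the scaled-in run versus the baseline run.
    return 100.0 * (1.0 - energy_kwh(scaled_in) / energy_kwh(baseline))


def payback_years(upgrade_cost: float, kwh_saved_per_year: float,
                  price_per_kwh: float) -> float:
    # Years until yearly energy-cost savings repay a hardware upgrade.
    return upgrade_cost / (kwh_saved_per_year * price_per_kwh)


if __name__ == "__main__":
    # Hypothetical job: 16 nodes for 1 h versus 4 nodes for 2.5 h.
    base = RunConfig(nodes=16, exe_time_h=1.0, node_power_kw=0.3)
    small = RunConfig(nodes=4, exe_time_h=2.5, node_power_kw=0.3)
    print(node_hours(base), node_hours(small))   # 16.0 10.0
    print(round(savings_pct(base, small), 1))    # 37.5
    # Hypothetical upgrade: cost 2000, saving 5000 kWh/year at 0.10 per kWh.
    print(payback_years(2000.0, 5000.0, 0.10))
```

Under the constant-power assumption the per-node power cancels out, so the energy saving equals the node-hour saving; in practice per-node power changes with load and memory configuration, which is why measured energy matters.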
Citation: Zivanovic, D., Radulovic, M., Llort, G., Zaragoza, D., Strassburg, J., Carpenter, P., Radojkovic, P., Ayguade, E. Large-memory nodes for energy efficient high-performance computing. In: International Symposium on Memory Systems. "MEMSYS 2016: proceedings of the Second International Symposium on Memory Systems: Alexandria, VA, USA: October 03-06, 2016". Alexandria, VA: Association for Computing Machinery (ACM), 2016, p. 3-9.
All rights reserved. This work is protected by the corresponding intellectual and industrial property rights. Without prejudice to any existing legal exemptions, reproduction, distribution, public communication or transformation of this work are prohibited without permission of the copyright holder.