Large-memory nodes for energy efficient high-performance computing
Document type: Conference report
Publisher: Association for Computing Machinery (ACM)
Rights access: Open Access
European Commission's project: ExaNoDe - European Exascale Processor Memory Node Design (EC-H2020-671578)
Energy consumption is by far the most important contributor to HPC cluster operational costs, and it accounts for a significant share of the total cost of ownership. Advanced energy-saving techniques in HPC components have received significant research and development effort, but a simple measure that can dramatically reduce energy consumption is often overlooked. We show that, in capacity computing, where many small to medium-sized jobs have to be solved at the lowest cost, a practical energy-saving approach is to scale in the application on large-memory nodes. We evaluate scaling-in, i.e. decreasing the number of application processes and compute nodes (servers) used to solve a fixed-size problem, using a set of HPC applications running on a production system. Using standard-memory nodes, we obtain average energy savings of 36%, already a substantial figure. We show that the main source of these energy savings is a decrease in node-hours (node_hours = #nodes × exe_time), which results from more efficient use of hardware resources. Scaling-in is limited by the per-node memory capacity, so we also consider large-memory nodes, which enable a greater degree of scaling-in. We show that the additional energy savings, of up to 52%, mean that in many cases the investment in upgrading the hardware would be recovered within a typical system lifetime of less than five years.
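The node-hours argument and the payback claim can be illustrated with a short calculation. This is a minimal sketch, not the authors' methodology: it assumes that job energy is approximately node-hours multiplied by an average per-node power draw, and that the payback period is the upgrade cost divided by the yearly value of the energy saved. The names `RunConfig`, `energy_saving`, and `payback_years`, as well as all numbers in the usage example (node counts, runtimes, power, prices, costs), are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class RunConfig:
    """One way of running a fixed-size job (hypothetical model)."""
    nodes: int            # number of compute nodes used
    exe_time_h: float     # wall-clock execution time, in hours
    node_power_w: float   # assumed average power draw per node, in watts

    def node_hours(self) -> float:
        # node_hours = #nodes x exe_time, as defined in the abstract
        return self.nodes * self.exe_time_h

    def energy_kwh(self) -> float:
        # Assumption: energy ~= node-hours x average per-node power
        return self.node_hours() * self.node_power_w / 1000.0


def energy_saving(baseline: RunConfig, scaled_in: RunConfig) -> float:
    """Fractional energy saving of the scaled-in run relative to the baseline."""
    return 1.0 - scaled_in.energy_kwh() / baseline.energy_kwh()


def payback_years(upgrade_cost: float, annual_energy_kwh: float,
                  saving_fraction: float, price_per_kwh: float) -> float:
    """Years needed for energy savings to repay a hardware upgrade.

    Assumption: savings are a constant fraction of a constant yearly
    energy bill; returns infinity if nothing is saved.
    """
    annual_savings = annual_energy_kwh * saving_fraction * price_per_kwh
    if annual_savings <= 0:
        return float("inf")
    return upgrade_cost / annual_savings


if __name__ == "__main__":
    # Hypothetical example: 16 nodes for 1 h vs. 4 nodes for 2.5 h.
    # Scaling in runs longer, but uses fewer node-hours (16 vs. 10).
    base = RunConfig(nodes=16, exe_time_h=1.0, node_power_w=300.0)
    scaled = RunConfig(nodes=4, exe_time_h=2.5, node_power_w=300.0)
    print(f"node-hours: {base.node_hours()} -> {scaled.node_hours()}")
    print(f"energy saving: {energy_saving(base, scaled):.1%}")
    # Hypothetical upgrade cost, yearly energy use, saving and price.
    print(f"payback: {payback_years(10000.0, 100000.0, 0.2, 0.15):.1f} years")
```

In this model, a scaled-in run saves energy whenever its node-hours fall, even though each job takes longer. Large-memory nodes may draw more power per node, which can be represented by giving the scaled-in configuration a higher `node_power_w`.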
Citation: Zivanovic, D., Radulovic, M., Llort, G., Zaragoza, D., Strassburg, J., Carpenter, P., Radojkovic, P., Ayguade, E. Large-memory nodes for energy efficient high-performance computing. In: International Symposium on Memory Systems. "MEMSYS 2016: proceedings of the Second International Symposium on Memory Systems: Alexandria, VA, USA: October 03-06, 2016". Alexandria, VA: Association for Computing Machinery (ACM), 2016, p. 3-9.