Characterizing the impact of last-level cache replacement policies on big-data workloads
Document type: Conference report
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Rights access: Open Access
Project: Mont-Blanc 2020 - Mont-Blanc 2020, European scalable, modular and power efficient HPC processor (EC-H2020-779877)
The vast disparity between Last Level Cache (LLC) and memory latencies has created a need for efficient cache management policies. The computer architecture literature abounds with work on LLC replacement policies. Although these works greatly improve over the least-recently-used (LRU) policy, they tend to be evaluated only on the SPEC CPU 2006 benchmark suite and, more recently, on the SPEC CPU 2017 benchmark suite. However, these workloads are representative of only a subset of current High-Performance Computing (HPC) workloads. In this paper we evaluate the behavior of a mix of graph processing, scientific and industrial workloads (GAP, XSBench and Qualcomm) along with the well-known SPEC CPU 2006 and SPEC CPU 2017 workloads on state-of-the-art LLC replacement policies such as Multiperspective Reuse Prediction (MPPPB), Glider, Hawkeye, SHiP, DRRIP and SRRIP. Our evaluation reveals that, even though current state-of-the-art LLC replacement policies provide a significant performance improvement over LRU for both SPEC CPU 2006 and SPEC CPU 2017 workloads, they are hardly able to capture the access patterns of current HPC and big data workloads, and yield little improvement on them, because of the highly complex behavior of these workloads. In addition, this paper introduces two new LLC replacement policies derived from MPPPB. The first, Multi-Sampler Multiperspective (MS-MPPPB), uses multiple samplers instead of a single one and dynamically selects the best-behaving sampler to drive reuse distance predictions. The second, Multiperspective with Dynamic Features Selector (DS-MPPPB), selects the best-behaving features among a set of 64 features to improve the accuracy of the predictions. On a large set of workloads that stress the LLC, MS-MPPPB achieves a geometric mean speedup of 8.3% over LRU, while DS-MPPPB achieves a geometric mean speedup of 8.0% over LRU.
For big data and HPC workloads, the two proposed techniques deliver higher performance benefits than state-of-the-art approaches such as MPPPB, Glider and Hawkeye, which yield geometric mean speedups of 7.0%, 5.0% and 4.8% over LRU, respectively.
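The figures above are geometric mean speedups over LRU. As a minimal sketch of how such a figure can be computed, the Python snippet below takes per-workload performance numbers for a baseline and for a candidate policy and returns the geometric mean of their ratios. The abstract does not state which performance metric the paper uses, so the use of IPC here is an assumption, and the function name `geomean_speedup` and all sample values are hypothetical, invented purely for illustration.

```python
import math


def geomean_speedup(baseline_ipc, policy_ipc):
    """Geometric mean of per-workload speedups (policy IPC / baseline IPC).

    Assumption: performance is expressed as IPC, so a higher value is better.
    """
    if not baseline_ipc or len(baseline_ipc) != len(policy_ipc):
        raise ValueError("need two non-empty lists of equal length")
    if any(v <= 0 for v in list(baseline_ipc) + list(policy_ipc)):
        raise ValueError("IPC values must be positive")
    # Average the logarithms of the ratios, then exponentiate. This equals the
    # n-th root of the product of the ratios, computed in a numerically safer way.
    log_sum = sum(math.log(p / b) for b, p in zip(baseline_ipc, policy_ipc))
    return math.exp(log_sum / len(baseline_ipc))


# Hypothetical IPC values for three workloads (not taken from the paper).
lru_ipc = [1.00, 0.50, 2.00]
candidate_ipc = [1.10, 0.50, 2.20]

speedup = geomean_speedup(lru_ipc, candidate_ipc)
print(f"geometric mean speedup over LRU: {(speedup - 1) * 100:.1f}%")
```

The geometric mean is a common choice for summarizing ratios because it treats a workload that speeds up by a factor of k and one that slows down by the same factor symmetrically, which an arithmetic mean of ratios does not.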
Citation: Jamet, A. V. [et al.]. Characterizing the impact of last-level cache replacement policies on big-data workloads. A: IEEE International Symposium on Workload Characterization. "2020 IEEE International Symposium on Workload Characterization: 27–29 October 2020, online event: proceedings". Institute of Electrical and Electronics Engineers (IEEE), 2020, p. 134-144. ISBN 978-1-7281-7645-1. DOI 10.1109/IISWC50251.2020.00022.
All rights reserved. This work is protected by the corresponding intellectual and industrial property rights. Without prejudice to any existing legal exemptions, reproduction, distribution, public communication or transformation of this work are prohibited without permission of the copyright holder.