A hardware evaluation of cache partitioning to improve utilization and energy-efficiency while preserving responsiveness

Cook, Henry; Moretó Planas, Miquel; Bird, Sarah L.; Dao, Khanh; Patterson, David; Asanovic, Krste

doi:10.1145/2485922.2485949

dc.contributor.author	Cook, Henry
dc.contributor.author	Moretó Planas, Miquel
dc.contributor.author	Bird, Sarah L.
dc.contributor.author	Dao, Khanh
dc.contributor.author	Patterson, David
dc.contributor.author	Asanovic, Krste
dc.contributor.other	Barcelona Supercomputing Center
dc.date.accessioned	2014-06-16T09:29:10Z
dc.date.created	2013
dc.date.issued	2013
dc.identifier.citation	Cook, H. [et al.]. A hardware evaluation of cache partitioning to improve utilization and energy-efficiency while preserving responsiveness. In: Annual International Symposium on Computer Architecture. "ISCA 2013: the 40th Annual International Symposium on Computer Architecture: conference proceedings: June 23-27, 2013: Tel-Aviv, Israel". Tel-Aviv: ACM, 2013, p. 308-319.
dc.identifier.isbn	978-1-4503-2079-5
dc.identifier.uri	http://hdl.handle.net/2117/23225
dc.description.abstract	Computing workloads often contain a mix of interactive, latency-sensitive foreground applications and recurring background computations. To guarantee responsiveness, interactive and batch applications are often run on disjoint sets of resources, but this incurs additional energy, power, and capital costs. In this paper, we evaluate the potential of hardware cache partitioning mechanisms and policies to improve efficiency by allowing background applications to run simultaneously with interactive foreground applications, while avoiding degradation in interactive responsiveness. We evaluate these tradeoffs using commercial x86 multicore hardware that supports cache partitioning, and find that real hardware measurements with full applications provide different observations than past simulation-based evaluations. Co-scheduling applications without LLC partitioning leads to a 10% energy improvement and average throughput improvement of 54% compared to running tasks separately, but can result in foreground performance degradation of up to 34% with an average of 6%. With optimal static LLC partitioning, the average energy improvement increases to 12% and the average throughput improvement to 60%, while the worst case slowdown is reduced noticeably to 7% with an average slowdown of only 2%. We also evaluate a practical low-overhead dynamic algorithm to control partition sizes, and are able to realize the potential performance guarantees of the optimal static approach, while increasing background throughput by an additional 19%.
dc.description.sponsorship	We would especially like to thank everyone at Intel who made it possible for us to use the cache-partitioning machine in this paper, including Opher Kahn, Andrew Herdrich, Ravi Iyer, Gans Srinivasa, Mark Rowland, Ian Steiner and Henry Gabb. We would also like to thank Scott Beamer, Chris Celio, Shoaib Kamil, Leo Meyerovich, and David Sheffield for allowing us to study their applications. Additionally, we would like to thank our colleagues in the Par Lab for their continual advice, support, and feedback. Research supported by Microsoft (Award 024263) and Intel (Award 024894) funding and by matching funding by U.C. Discovery (Award DIG07-10227). Additional support comes from ParLab affiliates Nokia, NVIDIA, Oracle, and Samsung. M. Moreto was supported by the Spanish Ministry of Science under contract TIN2012-34557, a MEC/Fulbright Fellowship, and by an AGAUR award (BE-DGR 2010).
dc.format.extent	12 p.
dc.language.iso	eng
dc.publisher	ACM
dc.subject	UPC subject areas::Computer science::Computer architecture
dc.subject.lcsh	Cache memory
dc.subject.other	Hardware
dc.subject.other	Computer architecture
dc.subject.other	Optimization
dc.subject.other	Throughput
dc.title	A hardware evaluation of cache partitioning to improve utilization and energy-efficiency while preserving responsiveness
dc.type	Conference report
dc.subject.lemac	Cache memory
dc.contributor.group	Universitat Politècnica de Catalunya. CAP - Grup de Computació d'Altes Prestacions
dc.identifier.doi	10.1145/2485922.2485949
dc.rights.access	Open Access
local.identifier.drac	12790911
dc.description.version	Postprint (published version)
dc.relation.projectid	info:eu-repo/grantAgreement/EC/FP7/321253/EU/Riding on Moore's Law/ROMOL
dc.relation.projectid	info:eu-repo/grantAgreement/EC/FP7/287759/EU/High Performance and Embedded Architecture and Compilation/HIPEAC
dc.date.lift	10000-01-01
local.citation.author	Cook, H.; Moreto, M.; Bird, S.; Dao, K.; Patterson, D.; Asanovic, K.
local.citation.contributor	Annual International Symposium on Computer Architecture
local.citation.pubplace	Tel-Aviv
local.citation.publicationName	ISCA 2013: the 40th Annual International Symposium on Computer Architecture: conference proceedings: June 23-27, 2013: Tel-Aviv, Israel
local.citation.startingPage	308
local.citation.endingPage	319

Files in this item

Name: A Hardware Evaluation of Cache ...
Size: 1.443 MB
Format: PDF

This item appears in the following collections

Conference papers/communications [574]
Conference papers/communications [784]

UPCommons. Open knowledge portal of the UPC