Rebalancing the core front-end through HPC code analysis

Milic, Ugljesa; Carpenter, Paul Matthew; Rico, Alejandro; Ramirez, Alex

doi:10.1109/IISWC.2016.7581273

dc.contributor.author	Milic, Ugljesa
dc.contributor.author	Carpenter, Paul Matthew
dc.contributor.author	Rico, Alejandro
dc.contributor.author	Ramirez, Alex
dc.contributor.other	Barcelona Supercomputing Center
dc.date.accessioned	2017-01-31T10:50:46Z
dc.date.available	2017-01-31T10:50:46Z
dc.date.issued	2016-10-10
dc.identifier.citation	Milic, Ugljesa [et al.]. Rebalancing the core front-end through HPC code analysis. In: IEEE International Symposium on Workload Characterization (IISWC), 25-27 Sept. 2016. "Workload Characterization (IISWC), 2016 IEEE International Symposium on". IEEE, 2016, p. 128-137.
dc.identifier.isbn	978-1-5090-3896-1
dc.identifier.uri	http://hdl.handle.net/2117/100362
dc.description.abstract	There is a need to increase performance under the same power and area envelope to achieve Exascale technology in high performance computing (HPC). Today's chip multiprocessor (CMP) designs are tailored to traditional desktop and server workloads, which differ from the parallel applications commonly run in HPC. In this work, we focus on HPC code characteristics and on the processor front-end, which accounts for around 30% of core power and area on the emerging lean-core type of processors used in HPC. Separating serial from parallel code sections inside applications, we characterize three HPC benchmark suites and compare them to a traditional set of desktop integer workloads. HPC applications have biased and mostly backward-taken branches, small dynamic instruction footprints, and long basic blocks. Our findings suggest smaller branch predictors (BP) with an additional loop BP, smaller branch target buffers (BTB), and smaller L1 instruction caches (I-cache) with wider lines. Still, this downsizing applies only to the cores meant to run parallel code. The difference between serial and parallel code sections in HPC applications points to an asymmetric CMP design, with one baseline core for sequential code and many HPC-tailored cores designed for parallel code. Predictions using the Sniper simulator and McPAT show that an HPC-tailored lean core saves 16% of the core area and 7% of power compared to a baseline core, without performance loss. Using the area savings to add an extra core, an asymmetric CMP with one baseline and eight tailored cores has the same area budget as a symmetric CMP composed of eight baseline cores, while demanding 4% more power and providing 12% shorter execution time on average.
dc.description.sponsorship	The research was supported by the European Union's 7th Framework Programme [FP7/2007-2013] under project Mont-Blanc (288777), the Ministry of Economy and Competitiveness of Spain (TIN2012-34557, TIN2015-65316-P, and BES-2013-063925), Generalitat de Catalunya (2014-SGR-1051 and 2014-SGR-1272), HiPEAC-3 Network of Excellence (ICT-287759), and the Severo Ochoa Program (SEV-2011-00067) of the Spanish Government.
dc.format.extent	10 p.
dc.language.iso	eng
dc.publisher	IEEE
dc.subject	UPC subject areas::Electronic engineering
dc.subject.lcsh	Microprocessors--Design and construction
dc.subject.lcsh	Parallel processing (Electronic computers)
dc.subject.lcsh	High performance computing
dc.subject.other	Benchmark testing
dc.subject.other	Multicore processing
dc.subject.other	Instruments
dc.subject.other	Electric breakdown
dc.subject.other	High performance computing
dc.subject.other	Servers
dc.title	Rebalancing the core front-end through HPC code analysis
dc.type	Conference report
dc.subject.lemac	Microprocessadors--Disseny i construcció
dc.identifier.doi	10.1109/IISWC.2016.7581273
dc.description.peerreviewed	Peer Reviewed
dc.relation.publisherversion	http://ieeexplore.ieee.org/document/7581273/
dc.rights.access	Open Access
dc.description.version	Postprint (author's final draft)
dc.relation.projectid	info:eu-repo/grantAgreement/MINECO//TIN2015-65316-P/ES/COMPUTACION DE ALTAS PRESTACIONES VII/
dc.relation.projectid	info:eu-repo/grantAgreement/MINECO/1PE/TIN2012-34557
local.citation.contributor	IEEE International Symposium on Workload Characterization (IISWC), 25-27 Sept. 2016
local.citation.publicationName	Workload Characterization (IISWC), 2016 IEEE International Symposium on
local.citation.startingPage	128
local.citation.endingPage	137

Files in this item

Name: Rebalancing the Core Front-End ...
Size: 542.3 Kb
Format: PDF

This item appears in the following collections

Conference papers and presentations [574]
