Show simple item record

dc.contributor.author: Milic, Ugljesa
dc.contributor.author: Carpenter, Paul Matthew
dc.contributor.author: Rico, Alejandro
dc.contributor.author: Ramirez, Alex
dc.contributor.other: Barcelona Supercomputing Center
dc.date.accessioned: 2016-10-20T08:06:34Z
dc.date.available: 2016-10-20T08:06:34Z
dc.date.issued: 2016-09-25
dc.identifier.citation: Milic, Ugljesa [et al.]. Rebalancing the core front-end through HPC code analysis. In: "2016 IEEE International Symposium on Workload Characterization (IISWC) (2016)". IEEE, 2016, p. 1-16.
dc.identifier.isbn: 978-1-5090-3897-8
dc.identifier.uri: http://hdl.handle.net/2117/90903
dc.description.abstract: Reaching Exascale in high performance computing (HPC) requires higher performance within the same power and area envelope. Today's chip multiprocessor (CMP) designs are tailored to traditional desktop and server workloads, which differ from the parallel applications commonly run in HPC. In this work, we focus on HPC code characteristics and on the processor front-end, which accounts for around 30% of core power and area on the emerging lean-core type of processors used in HPC. Separating serial from parallel code sections inside applications, we characterize three HPC benchmark suites and compare them to a traditional set of desktop integer workloads. HPC applications have biased and mostly backward taken branches, small dynamic instruction footprints, and long basic blocks. Our findings suggest smaller branch predictors (BP) with an additional loop BP, smaller branch target buffers (BTB), and smaller L1 instruction caches (I-cache) with wider lines. Still, this downsizing applies only to the cores meant to run parallel code. The difference between serial and parallel code sections in HPC applications points to an asymmetric CMP design, with one baseline core for sequential code and many HPC-tailored cores for parallel code. Predictions using the Sniper simulator and McPAT show that an HPC-tailored lean core saves 16% of the core area and 7% of the power compared to a baseline core, without performance loss. Using the area savings to add an extra core, an asymmetric CMP with one baseline and eight tailored cores fits the same area budget as a symmetric CMP composed of eight baseline cores, while demanding 4% more power and providing 12% shorter execution time on average.
dc.format.extent: 10 p.
dc.language.iso: eng
dc.publisher: IEEE
dc.subject: UPC subject areas::Electronic engineering
dc.subject.lcsh: Supercomputers--Programming
dc.subject.lcsh: BenchMark Engineers
dc.subject.other: High performance computing (HPC)
dc.subject.other: Benchmark testing
dc.subject.other: Multicore processing
dc.subject.other: Instruments
dc.subject.other: Electric breakdown
dc.subject.other: Servers
dc.title: Rebalancing the core front-end through HPC code analysis
dc.type: Conference lecture
dc.subject.lemac: Supercomputers
dc.identifier.doi: 10.1109/IISWC.2016.7581273
dc.relation.publisherversion: https://www.computer.org/csdl/proceedings/iiswc/2016/3896/00/07581273-abs.html
dc.rights.access: Open Access
dc.description.version: Postprint (author's final draft)
dc.relation.projectid: info:eu-repo/grantAgreement/MINECO//TIN2015-65316-P/ES/COMPUTACION DE ALTAS PRESTACIONES VII/
dc.relation.projectid: info:eu-repo/grantAgreement/ES/1PE/TIN2012-34557
local.citation.publicationName: 2016 IEEE International Symposium on Workload Characterization (IISWC) (2016)
local.citation.startingPage: 1
local.citation.endingPage: 16

