An integrated vector-scalar design on an in-order ARM core

Stanic, Milan; Palomar Pérez, Óscar; Hayes, Timothy; Ratkovic, Ivan; Cristal Kestelman, Adrián; Unsal, Osman Sabri; Valero Cortés, Mateo

doi:10.1145/3075618

dc.contributor.author	Stanic, Milan
dc.contributor.author	Palomar Pérez, Óscar
dc.contributor.author	Hayes, Timothy
dc.contributor.author	Ratkovic, Ivan
dc.contributor.author	Cristal Kestelman, Adrián
dc.contributor.author	Unsal, Osman Sabri
dc.contributor.author	Valero Cortés, Mateo
dc.contributor.other	Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors
dc.contributor.other	Barcelona Supercomputing Center
dc.date.accessioned	2017-07-21T07:53:17Z
dc.date.available	2017-07-21T07:53:17Z
dc.date.issued	2017-07
dc.identifier.citation	Stanic, M., Palomar, Ó., Hayes, T., Ratkovic, I., Cristal, A., Unsal, O., Valero, M. An integrated vector-scalar design on an in-order ARM core. "ACM transactions on architecture and code optimization", July 2017, vol. 14, no. 2, p. 17:1-17:26.
dc.identifier.issn	1544-3566
dc.identifier.uri	http://hdl.handle.net/2117/106671
dc.description.abstract	In the low-end mobile processor market, power, energy, and area budgets are significantly lower than in the server/desktop/laptop/high-end mobile markets. It has been shown that vector processors are a highly energy-efficient way to increase performance; however, adding support for them incurs area and power overheads that would not be acceptable for low-end mobile processors. In this work, we propose an integrated vector-scalar design for the ARM architecture that mostly reuses scalar hardware to support the execution of vector instructions. The key element of the design is our proposed block-based model of execution that groups vector computational instructions together to execute them in a coordinated manner. We implemented a classic vector unit and compare its results against our integrated design. Our integrated design improves the performance (more than 6×) and energy consumption (up to 5×) of a scalar in-order core with negligible area overhead (only 4.7% when using a vector register with 32 elements). In contrast, the area overhead of the classic vector unit can be significant (around 44%) if a dedicated vector floating-point unit is incorporated. Our block-based vector execution outperforms the classic vector unit for all kernels with floating-point data and also consumes less energy. We also complement the integrated design with three energy/performance-efficient techniques that further reduce power and increase performance. The first proposal covers the design and implementation of chaining logic that is optimized to work with the cache hierarchy through vector memory instructions, the second proposal reduces the number of reads/writes from/to the vector register file, and the third idea optimizes complex memory access patterns with the memory shape instruction and unified indexed vector load.
dc.description.sponsorship	The research leading to these results has received funding from the RoMoL ERC Advanced Grant GA no 321253 and is supported in part by the European Union (FEDER funds) under contract TIN2015-65316-P. This research has also been supported by the Agency for Management of University and Research Grants (AGAUR - FI-DGR 2014). O. Palomar is funded by a Royal Society Newton International Fellowship.
dc.language.iso	eng
dc.subject	Àrees temàtiques de la UPC::Informàtica::Arquitectura de computadors
dc.subject.lcsh	Integrated circuits -- Design and construction
dc.subject.lcsh	Parallel programming (Computer science)
dc.subject.other	Computer systems organization
dc.subject.other	Single instruction, multiple data
dc.subject.other	Vector processors
dc.subject.other	Low-power
dc.subject.other	Energy efficiency
dc.subject.other	Mobile processors
dc.title	An integrated vector-scalar design on an in-order ARM core
dc.type	Article
dc.subject.lemac	Circuits integrats -- Disseny i construcció
dc.subject.lemac	Programació en paral·lel (Informàtica)
dc.contributor.group	Universitat Politècnica de Catalunya. CAP - Grup de Computació d'Altes Prestacions
dc.identifier.doi	10.1145/3075618
dc.description.peerreviewed	Peer Reviewed
dc.relation.publisherversion	http://dl.acm.org/citation.cfm?id=3075618
dc.rights.access	Open Access
local.identifier.drac	21186138
dc.description.version	Postprint (author's final draft)
dc.relation.projectid	info:eu-repo/grantAgreement/MINECO//TIN2015-65316-P/ES/COMPUTACION DE ALTAS PRESTACIONES VII/
dc.relation.projectid	info:eu-repo/grantAgreement/EC/FP7/321253/EU/Riding on Moore's Law/ROMOL
local.citation.author	Stanic, M.; Palomar, Ó.; Hayes, T.; Ratkovic, I.; Cristal, A.; Unsal, O.; Valero, M.
local.citation.publicationName	ACM transactions on architecture and code optimization
local.citation.volume	14
local.citation.number	2
local.citation.startingPage	17:1
local.citation.endingPage	17:26

Files in this item

Name:: An Integrated Vector-Scalar ...
Size:: 1.390 MB
Format:: PDF
