dc.contributor.author: Stanic, Milan
dc.contributor.author: Palomar Pérez, Óscar
dc.contributor.author: Hayes, Timothy
dc.contributor.author: Ratkovic, Ivan
dc.contributor.author: Cristal Kestelman, Adrián
dc.contributor.author: Unsal, Osman Sabri
dc.contributor.author: Valero Cortés, Mateo
dc.contributor.other: Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors
dc.date.accessioned: 2017-07-21T07:53:17Z
dc.date.available: 2017-07-21T07:53:17Z
dc.date.issued: 2017-07
dc.identifier.citation: Stanic, M., Palomar, Ó., Hayes, T., Ratkovic, I., Cristal, A., Unsal, O., Valero, M. An integrated vector-scalar design on an in-order ARM core. "ACM transactions on architecture and code optimization", July 2017, vol. 14, no. 2, p. 17:1-17:26.
dc.identifier.issn: 1544-3566
dc.identifier.uri: http://hdl.handle.net/2117/106671
dc.description.abstract: In the low-end mobile processor market, power, energy, and area budgets are significantly lower than in the server/desktop/laptop/high-end mobile markets. It has been shown that vector processors are a highly energy-efficient way to increase performance; however, adding support for them incurs area and power overheads that would not be acceptable for low-end mobile processors. In this work, we propose an integrated vector-scalar design for the ARM architecture that mostly reuses scalar hardware to support the execution of vector instructions. The key element of the design is our proposed block-based model of execution that groups vector computational instructions together to execute them in a coordinated manner. We implemented a classic vector unit and compare its results against our integrated design. Our integrated design improves the performance (more than 6×) and energy consumption (up to 5×) of a scalar in-order core with negligible area overhead (only 4.7% when using a vector register with 32 elements). In contrast, the area overhead of the classic vector unit can be significant (around 44%) if a dedicated vector floating-point unit is incorporated. Our block-based vector execution outperforms the classic vector unit for all kernels with floating-point data and also consumes less energy. We also complement the integrated design with three energy/performance-efficient techniques that further reduce power and increase performance. The first proposal covers the design and implementation of chaining logic that is optimized to work with the cache hierarchy through vector memory instructions, the second proposal reduces the number of reads/writes from/to the vector register file, and the third idea optimizes complex memory access patterns with the memory shape instruction and unified indexed vector load.
dc.description.sponsorship: The research leading to these results has received funding from the RoMoL ERC Advanced Grant GA no 321253 and is supported in part by the European Union (FEDER funds) under contract TIN2015-65316-P. This research has also been supported by the Agency for Management of University and Research Grants (AGAUR - FI-DGR 2014). O. Palomar is funded by a Royal Society Newton International Fellowship.
dc.language.iso: eng
dc.subject: Àrees temàtiques de la UPC::Informàtica::Arquitectura de computadors
dc.subject.lcsh: Integrated circuits -- Design and construction
dc.subject.lcsh: Parallel programming (Computer science)
dc.subject.other: Computer systems organization
dc.subject.other: Single instruction
dc.subject.other: Multiple data
dc.subject.other: Vector processors
dc.subject.other: Low-power
dc.subject.other: Energy efficiency
dc.subject.other: Mobile processors
dc.title: An integrated vector-scalar design on an in-order ARM core
dc.type: Article
dc.subject.lemac: Circuits integrats -- Disseny i construcció
dc.subject.lemac: Programació en paral·lel (Informàtica)
dc.contributor.group: Universitat Politècnica de Catalunya. CAP - Grup de Computació d'Altes Prestacions
dc.identifier.doi: 10.1145/3075618
dc.description.peerreviewed: Peer Reviewed
dc.relation.publisherversion: http://dl.acm.org/citation.cfm?id=3075618
dc.rights.access: Open Access
drac.iddocument: 21186138
dc.description.version: Postprint (author's final draft)
dc.relation.projectid: info:eu-repo/grantAgreement/MINECO/1PE/TIN2015-65316-P
dc.relation.projectid: info:eu-repo/grantAgreement/EC/FP7/321253/EU/Riding on Moore's Law/ROMOL
upcommons.citation.author: Stanic, M., Palomar, Ó., Hayes, T., Ratkovic, I., Cristal, A., Unsal, O., Valero, M.
upcommons.citation.published: true
upcommons.citation.publicationName: ACM transactions on architecture and code optimization
upcommons.citation.volume: 14
upcommons.citation.number: 2
upcommons.citation.startingPage: 17:1
upcommons.citation.endingPage: 17:26



All rights reserved. This work is protected by the corresponding intellectual and industrial property rights. Without prejudice to any existing legal exemptions, reproduction, distribution, public communication or transformation of this work are prohibited without permission of the copyright holder