Future vector microprocessor extensions for data aggregations

Hayes, Timothy; Palomar, Oscar; Unsal, Osman Sabri; Cristal Kestelman, Adrián; Valero Cortés, Mateo

doi:10.1109/ISCA.2016.44

dc.contributor.author	Hayes, Timothy
dc.contributor.author	Palomar, Oscar
dc.contributor.author	Unsal, Osman Sabri
dc.contributor.author	Cristal Kestelman, Adrián
dc.contributor.author	Valero Cortés, Mateo
dc.contributor.other	Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors
dc.contributor.other	Barcelona Supercomputing Center
dc.date.accessioned	2016-10-10T08:29:09Z
dc.date.available	2016-10-10T08:29:09Z
dc.date.issued	2016
dc.identifier.citation	Hayes, T., Palomar, O., Unsal, O., Cristal, A., Valero, M. Future vector microprocessor extensions for data aggregations. In: Annual International Symposium on Computer Architecture. "43rd International Symposium on Computer Architecture, ISCA 2016: 18-22 June 2016, Seoul, South Korea: proceedings". Seoul: Institute of Electrical and Electronics Engineers (IEEE), 2016, p. 418-430.
dc.identifier.isbn	978-1-4673-8947-1
dc.identifier.uri	http://hdl.handle.net/2117/90618
dc.description.abstract	As the rate of annual data generation grows exponentially, there is a demand to aggregate and summarise vast amounts of information quickly. In the past, frequency scaling was relied upon to push application throughput. Today, Dennard scaling has ceased and further performance must come from exploiting parallelism. Single instruction-multiple data (SIMD) instruction sets offer a highly efficient and scalable way of exploiting data-level parallelism (DLP). While microprocessors originally offered very simple SIMD support targeted at multimedia applications, these extensions have been growing both in width and functionality. Observing this trend, we use a simulation framework to model future SIMD support and then propose and evaluate five different ways of vectorising data aggregation. We find that although data aggregation is abundant in DLP, it is often too irregular to be expressed efficiently using typical SIMD instructions. Based on this observation, we propose a set of novel algorithms and SIMD instructions to better capture this irregular DLP. Furthermore, we discover that the best algorithm is highly dependent on the characteristics of the input. Our proposed solution can dynamically choose the optimal algorithm in the majority of cases and achieves speedups between 2.7x and 7.6x over a scalar baseline.
dc.description.sponsorship	The research leading to these results has received funding from the RoMoL ERC Advanced Grant GA no 321253 and is supported in part by the European Union (FEDER funds) under contract TIN2015-65316-P. Timothy Hayes is supported by an FPU research grant from the Spanish MECD.
dc.format.extent	13 p.
dc.language.iso	eng
dc.publisher	Institute of Electrical and Electronics Engineers (IEEE)
dc.subject	UPC subject areas::Computer science::Computer architecture
dc.subject.lcsh	Parallel processing (Electronic computers)
dc.subject.lcsh	Microprocessors
dc.subject.other	Registers
dc.subject.other	Support vector machines
dc.subject.other	Instruction sets
dc.subject.other	Data models
dc.title	Future vector microprocessor extensions for data aggregations
dc.type	Conference lecture
dc.subject.lemac	Parallel processing (Computers)
dc.subject.lemac	Microprocessors
dc.contributor.group	Universitat Politècnica de Catalunya. CAP - Grup de Computació d'Altes Prestacions
dc.identifier.doi	10.1109/ISCA.2016.44
dc.description.peerreviewed	Peer Reviewed
dc.relation.publisherversion	http://ieeexplore.ieee.org/document/7551411/
dc.rights.access	Open Access
local.identifier.drac	18819475
dc.description.version	Postprint (published version)
dc.relation.projectid	info:eu-repo/grantAgreement/EC/FP7/321253/EU/Riding on Moore's Law/ROMOL
dc.relation.projectid	info:eu-repo/grantAgreement/MINECO//TIN2015-65316-P/ES/COMPUTACION DE ALTAS PRESTACIONES VII/
local.citation.author	Hayes, T.; Palomar, O.; Unsal, O.; Cristal, A.; Valero, M.
local.citation.contributor	Annual International Symposium on Computer Architecture
local.citation.pubplace	Seoul
local.citation.publicationName	43rd International Symposium on Computer Architecture, ISCA 2016: 18-22 June 2016, Seoul, South Korea: proceedings
local.citation.startingPage	418
local.citation.endingPage	430

Files in this item

Name: Future+Vector+Microprocessor+E ...
Size: 2.059 MB
Format: PDF

UPCommons. UPC's open knowledge portal
