Future vector microprocessor extensions for data aggregations

Hayes, Timothy; Palomar, Oscar; Unsal, Osman Sabri; Cristal Kestelman, Adrián; Valero Cortés, Mateo

doi:10.1109/ISCA.2016.44



Document type: Conference report

Publication date: 2016

Publisher: Institute of Electrical and Electronics Engineers (IEEE)

Access conditions: Open access

All rights reserved. This work is protected by the corresponding intellectual and industrial property rights. Without prejudice to any existing legal exemptions, its reproduction, distribution, public communication or transformation is prohibited without the authorisation of the rights holder.

Project: ROMOL - Riding on Moore's Law (EC-FP7-321253)
Project: COMPUTACION DE ALTAS PRESTACIONES VII (MINECO-TIN2015-65316-P)

Abstract

As the rate of annual data generation grows exponentially, there is a demand to aggregate and summarise vast amounts of information quickly. In the past, frequency scaling was relied upon to push application throughput. Today, Dennard scaling has ceased and further performance must come from exploiting parallelism. Single instruction-multiple data (SIMD) instruction sets offer a highly efficient and scalable way of exploiting data-level parallelism (DLP). While microprocessors originally offered very simple SIMD support targeted at multimedia applications, these extensions have been growing both in width and functionality. Observing this trend, we use a simulation framework to model future SIMD support and then propose and evaluate five different ways of vectorising data aggregation. We find that although data aggregation is abundant in DLP, it is often too irregular to be expressed efficiently using typical SIMD instructions. Based on this observation, we propose a set of novel algorithms and SIMD instructions to better capture this irregular DLP. Furthermore, we discover that the best algorithm is highly dependent on the characteristics of the input. Our proposed solution can dynamically choose the optimal algorithm in the majority of cases and achieves speedups between 2.7x and 7.6x over a scalar baseline.
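To make the difficulty concrete, the sketch below contrasts a scalar group-by SUM with a simulated gather/add/scatter loop over a fixed number of lanes. It is only an illustration of one common way aggregation becomes irregular for SIMD (two lanes in the same vector updating the same group); it is not one of the five algorithms or the instructions proposed in the paper. The function names, the lane width of 4, and the sample data are all assumptions made for this example.

```python
from collections import defaultdict


def scalar_aggregate(keys, values):
    """Scalar baseline: one group-by SUM update per input row."""
    table = defaultdict(int)
    for k, v in zip(keys, values):
        table[k] += v
    return dict(table)


def naive_simd_aggregate(keys, values, num_groups, width=4):
    """Simulate a gather / add / scatter loop over `width` lanes.

    Every lane gathers its old partial sum before any lane scatters,
    so when two lanes of one vector hold the same key, only the update
    from the last of those lanes survives. (Hypothetical model, not a
    real instruction set.)
    """
    table = [0] * num_groups
    for start in range(0, len(keys), width):
        lane_keys = keys[start:start + width]
        lane_vals = values[start:start + width]
        gathered = [table[k] for k in lane_keys]                # vector gather
        summed = [g + v for g, v in zip(gathered, lane_vals)]   # vector add
        for k, s in zip(lane_keys, summed):                     # vector scatter
            table[k] = s
    return table


# Illustrative input: keys 0 and 1 each repeat within a 4-lane vector.
keys = [0, 1, 0, 2, 1, 1, 3, 0]
values = [10, 20, 30, 40, 50, 60, 70, 80]

correct = scalar_aggregate(keys, values)
naive = naive_simd_aggregate(keys, values, num_groups=4)

print(correct)  # {0: 120, 1: 130, 2: 40, 3: 70}
print(naive)    # [110, 80, 40, 70]
```

Groups 2 and 3 come out right because their keys never collide within a vector, while groups 0 and 1 lose updates. Whether such collisions are frequent depends on the key distribution of the input, which is in keeping with the abstract's remark that the best algorithm depends on input characteristics.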

Citation: Hayes, T., Palomar, O., Unsal, O., Cristal, A., Valero, M. Future vector microprocessor extensions for data aggregations. In: Annual International Symposium on Computer Architecture. "43rd International Symposium on Computer Architecture, ISCA 2016: 18-22 June 2016, Seoul, South Korea: proceedings". Seoul: Institute of Electrical and Electronics Engineers (IEEE), 2016, p. 418-430.

URI: http://hdl.handle.net/2117/90618

DOI: 10.1109/ISCA.2016.44

ISBN: 978-1-4673-8947-1

Publisher's version: http://ieeexplore.ieee.org/document/7551411/


File: Future+Vector+Microprocessor+Extensions.pdf (PDF, 2.059 MB)

UPCommons. Open knowledge portal of the UPC
