Stencil codes on a vector length agnostic architecture

Armejach Sanosa, Adrià; Caminal Pallarés, Helena; Cebrián González, Juan Manuel; González-Alberquilla, Rekai; Adeniyi-Jones, Chris; Valero Cortés, Mateo; Casas, Marc; Moretó Planas, Miquel

doi:10.1145/3243176.3243192

Document type: Conference paper

Publication date: 2018

Publisher: Association for Computing Machinery (ACM)

Access conditions: Open access

All rights reserved. This work is protected by the corresponding intellectual and industrial property rights. Without prejudice to existing legal exemptions, its reproduction, distribution, public communication or transformation is prohibited without the authorization of the rights holder.

Project: COMPUTACION DE ALTAS PRESTACIONES VII (MINECO-TIN2015-65316-P)
Mont-Blanc 3 - Mont-Blanc 3, European scalable and power efficient HPC platform based on low-power embedded technology (EC-H2020-671697)
Mont-Blanc 2020 - Mont-Blanc 2020, European scalable, modular and power efficient HPC processor (EC-H2020-779877)

Abstract

Data-level parallelism is frequently ignored or underutilized. Achieved through vector/SIMD capabilities, it can provide substantial performance improvements on top of widely used techniques such as thread-level parallelism. However, manual vectorization is a tedious and costly process that needs to be repeated for each specific instruction set or register size. In addition, automatic compiler vectorization is susceptible to code complexity, and usually limited due to data and control dependencies. To address some of these issues, Arm recently released a new vector ISA, the Scalable Vector Extension (SVE), which is Vector-Length Agnostic (VLA). VLA enables the generation of binary files that run regardless of the physical vector register length. In this paper we leverage the main characteristics of SVE to implement and optimize stencil computations, ubiquitous in scientific computing. We show that SVE enables easy deployment of textbook optimizations like loop unrolling, loop fusion, load trading or data reuse. Our detailed simulations using vector lengths ranging from 128 to 2,048 bits show that these optimizations can lead to performance improvements over straightforward vectorized code of up to 56.6% for 2,048-bit vectors. In addition, we show that certain optimizations can hurt performance due to a reduction in arithmetic intensity, and provide insight useful for compiler optimizers.
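The vector-length-agnostic idea described in the abstract can be illustrated with a small sketch. This is not the authors' code and it does not use SVE instructions: it is a plain Python emulation, under the assumption of a 3-point 1D stencil with hypothetical coefficients `c0` and `c1`, in which the parameter `lanes` stands in for the number of elements per hardware vector register. The point it tries to show is that the loop body never hard-codes the vector length, so the same code gives the same result for any register width.

```python
def stencil_scalar(src, c0=0.5, c1=0.25):
    """Reference 3-point 1D stencil; the two boundary elements are copied unchanged."""
    n = len(src)
    dst = list(src)
    for i in range(1, n - 1):
        dst[i] = c1 * src[i - 1] + c0 * src[i] + c1 * src[i + 1]
    return dst


def stencil_vla(src, lanes, c0=0.5, c1=0.25):
    """Same stencil written as an emulated vector-length-agnostic loop.

    `lanes` (hypothetical parameter) plays the role of the hardware vector
    length in elements; it is only read at run time, never assumed.
    """
    n = len(src)
    dst = list(src)
    i = 1
    while i < n - 1:
        # Emulated predicate: number of lanes still inside the interior.
        # The final iteration may be partial, so no scalar tail loop is needed.
        active = min(lanes, (n - 1) - i)
        # Three emulated vector loads, each shifted by one element.
        left = src[i - 1:i - 1 + active]
        mid = src[i:i + active]
        right = src[i + 1:i + 1 + active]
        # Lane-wise arithmetic followed by an emulated predicated store.
        dst[i:i + active] = [
            c1 * l + c0 * m + c1 * r for l, m, r in zip(left, mid, right)
        ]
        i += lanes  # advance by the vector length, whatever it happens to be
    return dst


if __name__ == "__main__":
    data = [float((x * x) % 7) for x in range(37)]
    reference = stencil_scalar(data)
    # 128 to 2,048 bits of 64-bit elements correspond to 2 to 32 lanes.
    for lanes in (2, 4, 8, 16, 32):
        assert stencil_vla(data, lanes) == reference
    print("all emulated vector lengths agree with the scalar reference")
```

Assuming 64-bit elements, the 128-bit to 2,048-bit range mentioned in the abstract corresponds to 2 to 32 lanes, which is the range exercised in the usage example. The optimizations the abstract names (loop unrolling, loop fusion, load trading, data reuse) are not modeled here.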

Citation: Armejach, A., Caminal, H., Cebrián, J.M., González-Alberquilla, R., Adeniyi-Jones, C., Valero, M., Casas, M., Moreto, M. Stencil codes on a vector length agnostic architecture. In: International Conference on Parallel Architectures and Compilation Techniques. "Proceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques: Limassol, Cyprus, November 01-04, 2018". New York: Association for Computing Machinery (ACM), 2018, p. 1-12.

URI: http://hdl.handle.net/2117/125368

DOI: 10.1145/3243176.3243192

ISBN: 978-1-4503-5986-3

Publisher version: https://dl.acm.org/citation.cfm?id=3243192
