Stencil codes on a vector length agnostic architecture

Armejach Sanosa, Adrià; Caminal Pallarés, Helena; Cebrián González, Juan Manuel; González-Alberquilla, Rekai; Adeniyi-Jones, Chris; Valero Cortés, Mateo; Casas, Marc; Moretó Planas, Miquel

doi:10.1145/3243176.3243192

dc.contributor.author	Armejach Sanosa, Adrià
dc.contributor.author	Caminal Pallarés, Helena
dc.contributor.author	Cebrián González, Juan Manuel
dc.contributor.author	González-Alberquilla, Rekai
dc.contributor.author	Adeniyi-Jones, Chris
dc.contributor.author	Valero Cortés, Mateo
dc.contributor.author	Casas, Marc
dc.contributor.author	Moretó Planas, Miquel
dc.contributor.other	Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors
dc.contributor.other	Barcelona Supercomputing Center
dc.date.accessioned	2018-12-04T12:50:13Z
dc.date.available	2018-12-04T12:50:13Z
dc.date.issued	2018
dc.identifier.citation	Armejach, A., Caminal, H., Cebrián, J.M., González-Alberquilla, R., Adeniyi-Jones, C., Valero, M., Casas, M., Moreto, M. Stencil codes on a vector length agnostic architecture. A: International Conference on Parallel Architectures and Compilation Techniques. "Proceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques: Limassol, Cyprus, November 01-04, 2018". New York: Association for Computing Machinery (ACM), 2018, p. 1-12.
dc.identifier.isbn	978-1-4503-5986-3
dc.identifier.uri	http://hdl.handle.net/2117/125368
dc.description.abstract	Data-level parallelism is frequently ignored or underutilized. Achieved through vector/SIMD capabilities, it can provide substantial performance improvements on top of widely used techniques such as thread-level parallelism. However, manual vectorization is a tedious and costly process that needs to be repeated for each specific instruction set or register size. In addition, automatic compiler vectorization is susceptible to code complexity, and usually limited due to data and control dependencies. To address some of these issues, Arm recently released a new vector ISA, the Scalable Vector Extension (SVE), which is Vector-Length Agnostic (VLA). VLA enables the generation of binary files that run regardless of the physical vector register length. In this paper we leverage the main characteristics of SVE to implement and optimize stencil computations, ubiquitous in scientific computing. We show that SVE enables easy deployment of textbook optimizations like loop unrolling, loop fusion, load trading or data reuse. Our detailed simulations using vector lengths ranging from 128 to 2,048 bits show that these optimizations can lead to performance improvements over straightforward vectorized code of up to 56.6% for 2,048-bit vectors. In addition, we show that certain optimizations can hurt performance due to a reduction in arithmetic intensity, and provide insight useful for compiler optimizers.
dc.description.sponsorship	This work has been partially supported by the European HiPEAC Network of Excellence, by the Spanish Ministry of Economy and Competitiveness (contract TIN2015-65316-P), and by the Generalitat de Catalunya (contracts 2017-SGR-1328 and 2017-SGR-1414). The Mont-Blanc project receives funding from the EU's H2020 Framework Programme (H2020/2014-2020) under grant agreements no. 671697 and no. 779877. M. Moreto has been partially supported by the Spanish Ministry of Economy, Industry and Competitiveness under Ramon y Cajal fellowship number RYC-2016-21104. Finally, A. Armejach has been partially supported by the Spanish Ministry of Economy, Industry and Competitiveness under Juan de la Cierva postdoctoral fellowship number FJCI-2015-24753.
dc.format.extent	12 p.
dc.language.iso	eng
dc.publisher	Association for Computing Machinery (ACM)
dc.subject	Àrees temàtiques de la UPC::Informàtica::Arquitectura de computadors::Arquitectures paral·leles
dc.subject.lcsh	Parallel processing (Electronic computers)
dc.subject.other	Single instruction
dc.subject.other	Multiple data
dc.subject.other	Parallel computing models
dc.subject.other	Data-level parallelism
dc.subject.other	Scalable vector extension
dc.subject.other	Vector length agnostic
dc.subject.other	Stencil computations
dc.title	Stencil codes on a vector length agnostic architecture
dc.type	Conference report
dc.subject.lemac	Processament en paral·lel (Ordinadors)
dc.contributor.group	Universitat Politècnica de Catalunya. CAP - Grup de Computació d'Altes Prestacions
dc.identifier.doi	10.1145/3243176.3243192
dc.description.peerreviewed	Peer Reviewed
dc.relation.publisherversion	https://dl.acm.org/citation.cfm?id=3243192
dc.rights.access	Open Access
local.identifier.drac	23533394
dc.description.version	Postprint (author's final draft)
dc.relation.projectid	info:eu-repo/grantAgreement/MINECO//TIN2015-65316-P/ES/COMPUTACION DE ALTAS PRESTACIONES VII/
dc.relation.projectid	info:eu-repo/grantAgreement/AEI/RYC-2016-21104
dc.relation.projectid	info:eu-repo/grantAgreement/AGAUR/2017 SGR 1414
dc.relation.projectid	info:eu-repo/grantAgreement/EC/H2020/671697/EU/Mont-Blanc 3, European scalable and power efficient HPC platform based on low-power embedded technology/Mont-Blanc 3
dc.relation.projectid	info:eu-repo/grantAgreement/EC/H2020/779877/EU/Mont-Blanc 2020, European scalable, modular and power efficient HPC processor/Mont-Blanc 2020
local.citation.author	Armejach, A.; Caminal, H.; Cebrián, J.M.; González-Alberquilla, R.; Adeniyi-Jones, C.; Valero, M.; Casas, M.; Moreto, M.
local.citation.contributor	International Conference on Parallel Architectures and Compilation Techniques
local.citation.pubplace	New York
local.citation.publicationName	Proceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques: Limassol, Cyprus, November 01-04, 2018
local.citation.startingPage	1
local.citation.endingPage	12

Files in this item

Name: Stencil Codes on a Vector Length ...
Size: 1,070Mb
Format: PDF

This item appears in the following collections

Conference papers/presentations [574]
Conference papers/presentations [784]
Conference papers/presentations [1.954]

UPCommons. The UPC open knowledge portal
