dc.contributor.author: Armejach Sanosa, Adrià
dc.contributor.author: Caminal Pallarés, Helena
dc.contributor.author: Cebrián González, Juan Manuel
dc.contributor.author: Langarita, Rubén
dc.contributor.author: González-Alberquilla, Rekai
dc.contributor.author: Adeniyi-Jones, Chris
dc.contributor.author: Valero Cortés, Mateo
dc.contributor.author: Casas, Marc
dc.contributor.author: Moretó Planas, Miquel
dc.contributor.other: Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors
dc.date.accessioned: 2019-10-08T07:00:16Z
dc.date.available: 2020-04-08T00:25:58Z
dc.date.issued: 2020-03
dc.identifier.citation: Armejach, A. [et al.]. Using Arm’s scalable vector extension on stencil codes. "Journal of supercomputing", vol. 76, March 2020, p. 2039-2062.
dc.identifier.issn: 0920-8542
dc.identifier.uri: http://hdl.handle.net/2117/169340
dc.description.abstract: Data-level parallelism is frequently ignored or underutilized. Achieved through vector/SIMD capabilities, it can provide substantial performance improvements on top of widely used techniques such as thread-level parallelism. However, manual vectorization is a tedious and costly process that needs to be repeated for each specific instruction set or register size. In addition, automatic compiler vectorization is susceptible to code complexity, and usually limited due to data and control dependencies. To address some of these issues, Arm recently released a new vector ISA, the scalable vector extension (SVE), which is vector-length agnostic (VLA). VLA enables the generation of binary files that run regardless of the physical vector register length. In this paper, we leverage the main characteristics of SVE to implement and optimize stencil computations, ubiquitous in scientific computing. We show that SVE enables easy deployment of textbook optimizations like loop unrolling, loop fusion, load trading or data reuse. Our detailed simulations using vector lengths ranging from 128 to 2048 bits show that these optimizations can lead to performance improvements over straightforward vectorized code of up to 1.57×. In addition, we show that certain optimizations can hurt performance due to reduced arithmetic intensity and instruction overheads, and provide insight useful for compiler optimizers.
dc.format.extent: 24 p.
dc.language.iso: eng
dc.subject: Àrees temàtiques de la UPC::Informàtica::Arquitectura de computadors
dc.subject.lcsh: Ubiquitous computing
dc.subject.lcsh: Compilers (Computer programs)
dc.subject.other: Data-level parallelism
dc.subject.other: Scalable vector extension
dc.subject.other: Vector-length agnostic
dc.subject.other: Stencil computations
dc.title: Using Arm’s scalable vector extension on stencil codes
dc.type: Article
dc.subject.lemac: Informàtica ubiqua
dc.subject.lemac: Compiladors (Programes d'ordinador)
dc.contributor.group: Universitat Politècnica de Catalunya. CAP - Grup de Computació d'Altes Prestacions
dc.identifier.doi: 10.1007/s11227-019-02842-5
dc.description.peerreviewed: Peer Reviewed
dc.relation.publisherversion: https://link.springer.com/article/10.1007/s11227-019-02842-5
dc.rights.access: Open Access
local.identifier.drac: 25169572
dc.description.version: Postprint (author's final draft)
dc.relation.projectid: info:eu-repo/grantAgreement/AEI/RYC-2016-21104
dc.relation.projectid: info:eu-repo/grantAgreement/AGAUR/2017 SGR 1414
dc.relation.projectid: info:eu-repo/grantAgreement/MINECO//TIN2015-65316-P/ES/COMPUTACION DE ALTAS PRESTACIONES VII/
local.citation.author: Armejach, A.; Caminal, H.; Cebrián, J. M.; Langarita, R.; González-Alberquilla, R.; Adeniyi-Jones, C.; Valero, M.; Casas, M.; Moreto, M.
local.citation.publicationName: Journal of supercomputing
local.citation.volume: 76
local.citation.startingPage: 2039
local.citation.endingPage: 2062
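
The abstract's point that vector-length-agnostic code runs unchanged whatever the physical vector register length can be illustrated with a small portable sketch. This is not the authors' code and uses no SVE instructions or intrinsics: it only emulates the idea in plain C. The function names, the 3-point averaging stencil, and the `lanes` parameter (standing in for the hardware vector length in elements) are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Scalar reference: 3-point stencil over the interior points 1 .. n-2.
   out[i] = (in[i-1] + in[i] + in[i+1]) / 3 */
void stencil_scalar(const double *in, double *out, size_t n)
{
    for (size_t i = 1; i + 1 < n; ++i)
        out[i] = (in[i - 1] + in[i] + in[i + 1]) / 3.0;
}

/* Vector-length-agnostic style (emulated, hypothetical helper).
   `lanes` is a runtime value standing in for the number of elements
   one hardware vector holds; the loop never hard-codes it.
   Each outer iteration models one vector-wide step, and `active`
   models the predicate that masks off lanes past the end of the data. */
void stencil_vla(const double *in, double *out, size_t n, size_t lanes)
{
    assert(lanes > 0);
    for (size_t base = 1; base + 1 < n; base += lanes) {
        size_t remaining = (n - 1) - base;          /* interior points left */
        size_t active = remaining < lanes ? remaining : lanes;
        for (size_t l = 0; l < active; ++l) {       /* one "vector" of work */
            size_t i = base + l;
            out[i] = (in[i - 1] + in[i] + in[i + 1]) / 3.0;
        }
    }
}
```

Calling `stencil_vla` with `lanes` set to 2, 4, 8, 16 or 32 corresponds to 64-bit elements in vectors of 128 to 2048 bits, the range the abstract mentions; the same function body handles every case and produces the same values as `stencil_scalar`, because the tail is handled by the `active` count instead of a separate remainder loop.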