Mostra el registre d'ítem simple

dc.contributor.authorArmejach Sanosa, Adrià
dc.contributor.authorCaminal Pallarés, Helena
dc.contributor.authorCebrián González, Juan Manuel
dc.contributor.authorGonzález-Alberquilla, Rekai
dc.contributor.authorAdeniyi-Jones, Chris
dc.contributor.authorValero Cortés, Mateo
dc.contributor.authorCasas, Marc
dc.contributor.authorMoretó Planas, Miquel
dc.contributor.otherUniversitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors
dc.contributor.otherBarcelona Supercomputing Center
dc.date.accessioned2018-12-04T12:50:13Z
dc.date.available2018-12-04T12:50:13Z
dc.date.issued2018
dc.identifier.citationArmejach, A., Caminal, H., Cebrián, J.M., González-Alberquilla, R., Adeniyi-Jones, C., Valero, M., Casas, M., Moreto, M. Stencil codes on a vector length agnostic architecture. A: International Conference on Parallel Architectures and Compilation Techniques. "Proceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques: Limassol, Cyprus, November 01-04, 2018". New York: Association for Computing Machinery (ACM), 2018, p. 1-12.
dc.identifier.isbn978-1-4503-5986-3
dc.identifier.urihttp://hdl.handle.net/2117/125368
dc.description.abstractData-level parallelism is frequently ignored or underutilized. Achieved through vector/SIMD capabilities, it can provide substantial performance improvements on top of widely used techniques such as thread-level parallelism. However, manual vectorization is a tedious and costly process that needs to be repeated for each specific instruction set or register size. In addition, automatic compiler vectorization is susceptible to code complexity, and usually limited due to data and control dependencies. To address some these issues, Arm recently released a new vector ISA, the Scalable Vector Extension (SVE), which is Vector-Length Agnostic (VLA). VLA enables the generation of binary files that run regardless of the physical vector register length. In this paper we leverage the main characteristics of SVE to implement and optimize stencil computations, ubiquitous in scientific computing. We show that SVE enables easy deployment of textbook optimizations like loop unrolling, loop fusion, load trading or data reuse. Our detailed simulations using vector lengths ranging from 128 to 2,048 bits show that these optimizations can lead to performance improvements over straight-forward vectorized code of up to 56.6% for 2,048 bit vectors. In addition, we show that certain optimizations can hurt performance due to a reduction in arithmetic intensity, and provide insight useful for compiler optimizers.
dc.description.sponsorshipThis work has been partially supported by the European HiPEAC Network of Excellence, by the Spanish Ministry of Economy and Competitiveness (contract TIN2015-65316-P), and by the Generalitat de Catalunya (contracts 2017-SGR-1328 and 2017-SGR-1414). The Mont-Blanc project receives funding from the EUs H2020 Framework Programme (H2020/2014-2020) under grant agreements no. 671697 and no. 779877. M. Moreto has been partially supported by the Spanish Ministry of Economy, Industry and Competitiveness under Ramon y Cajal fellowship number RYC-2016-21104. Finally, A. Armejach has been partially supported by the Spanish Ministry of Economy, Industry and Competitiveness under Juan de la Cierva postdoctoral fellowship number FJCI-2015-24753.
dc.format.extent12 p.
dc.language.isoeng
dc.publisherAssociation for Computing Machinery (ACM)
dc.subjectÀrees temàtiques de la UPC::Informàtica::Arquitectura de computadors::Arquitectures paral·leles
dc.subject.lcshParallel processing (Electronic computers)
dc.subject.otherSingle instruction
dc.subject.otherMultiple data
dc.subject.otherParallel computing models
dc.subject.otherData-level parallelism
dc.subject.otherScalable vector extension
dc.subject.otherVector length agnostic
dc.subject.otherStencil computations
dc.titleStencil codes on a vector length agnostic architecture
dc.typeConference report
dc.subject.lemacProcessament en paral·lel (Ordinadors)
dc.contributor.groupUniversitat Politècnica de Catalunya. CAP - Grup de Computació d'Altes Prestacions
dc.identifier.doi10.1145/3243176.3243192
dc.description.peerreviewedPeer Reviewed
dc.relation.publisherversionhttps://dl.acm.org/citation.cfm?id=3243192
dc.rights.accessOpen Access
local.identifier.drac23533394
dc.description.versionPostprint (author's final draft)
dc.relation.projectidinfo:eu-repo/grantAgreement/MINECO//TIN2015-65316-P/ES/COMPUTACION DE ALTAS PRESTACIONES VII/
dc.relation.projectidinfo:eu-repo/grantAgreement/AEI/RYC-2016-21104
dc.relation.projectidinfo:eu-repo/grantAgreement/AGAUR/2017 SGR 1414
dc.relation.projectidinfo:eu-repo/grantAgreement/EC/H2020/671697/EU/Mont-Blanc 3, European scalable and power efficient HPC platform based on low-power embedded technology/Mont-Blanc 3
dc.relation.projectidinfo:eu-repo/grantAgreement/EC/H2020/779877/EU/Mont-Blanc 2020, European scalable, modular and power efficient HPC processor/Mont-Blanc 2020
dc.relation.projectidinfo:eu-repo/grantAgreement/MINECO//TIN2015-65316-P/ES/COMPUTACION DE ALTAS PRESTACIONES VII/
local.citation.authorArmejach, A.; Caminal, H.; Cebrián, J.M.; González-Alberquilla, R.; Adeniyi-Jones, C.; Valero, M.; Casas, M.; Moreto, M.
local.citation.contributorInternational Conference on Parallel Architectures and Compilation Techniques
local.citation.pubplaceNew York
local.citation.publicationNameProceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques: Limassol, Cyprus, November 01-04, 2018
local.citation.startingPage1
local.citation.endingPage12


Fitxers d'aquest items

Thumbnail

Aquest ítem apareix a les col·leccions següents

Mostra el registre d'ítem simple