
dc.contributor.author: Ledoux Pardo, Luis Eduardo
dc.contributor.author: Casas, Marc
dc.contributor.other: Universitat Politècnica de Catalunya. Doctorat en Arquitectura de Computadors
dc.contributor.other: Barcelona Supercomputing Center
dc.date.accessioned: 2022-06-16T08:55:19Z
dc.date.available: 2022-06-16T08:55:19Z
dc.date.issued: 2022
dc.identifier.citation: Ledoux, L.; Casas, M. A generator of numerically-tailored and high-throughput accelerators for batched GEMMs. In: IEEE Symposium on Field Programmable Custom Computing Machines. "2022 IEEE 30th International Symposium on Field-Programmable Custom Computing Machines, FCCM 2022: 15-18 May, 2022, New York, NY, USA: proceedings". Institute of Electrical and Electronics Engineers (IEEE), 2022, ISBN 978-1-6654-8332-2. DOI 10.1109/FCCM53951.2022.9786164.
dc.identifier.isbn: 978-1-6654-8332-2
dc.identifier.uri: http://hdl.handle.net/2117/368563
dc.description.abstract: We propose a hardware generator of GEMM accelerators. Our generator produces vendor-agnostic HDL describing highly customizable systolic arrays guided by accuracy and energy efficiency goals. The generated arrays have three main novel aspects. First, the accelerators handle a large variety of computer number formats using intermediate representations based on our Sign Scale Significand (S3) format. Second, the processing elements perform all intermediate dot-product arithmetic operations required by the GEMM kernel without any intermediate rounding, which makes it possible to deliver better energy efficiency than state-of-the-art approaches while offering more accurate and reproducible results. Third, our accelerators feature the Half-Speed Sink Down (HSSD) mechanism, which maximizes the overlap of host-accelerator data transfers with GEMM computations. We evaluate our automatically generated designs in a cutting-edge setup composed of a POWER9 host, a CAPI (Coherent Accelerator Processor Interface) link, and a Virtex Ultrascale Plus FPGA. Arrays can operate at the speed of the link and saturate it to reach a 13 GB/s throughput. Our fine-grained customization approach covers a wide range of accuracy versus efficiency scenarios and can reach 0.65 GOps/s/W while producing 1024 accurate bits or 148.7 GOps/s/W with 6 accurate bits. Our configurations achieve up to 1613 GOps/s system performance and power efficiencies of up to 240 GOps/s/W for the FPGA. This automatic generator is the first able to produce such a variety of designs. We improve the single-precision energy efficiency of state-of-the-art FPGA GEMM accelerators by 1.86×.
dc.description.sponsorship: This work has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 955606. Marc Casas is supported by Grant RYC-2017-23269 funded by MCIN/AEI/10.13039/501100011033 and by “ESF Investing in your future”.
dc.language.iso: eng
dc.publisher: Institute of Electrical and Electronics Engineers (IEEE)
dc.subject: UPC subject areas::Computer science::Computer architecture
dc.subject.lcsh: Energy consumption
dc.subject.lcsh: Field programmable gate arrays
dc.subject.other: System performance
dc.subject.other: Throughput
dc.subject.other: Generators
dc.subject.other: Hardware
dc.subject.other: Energy efficiency
dc.subject.other: Systolic arrays
dc.subject.other: Space exploration
dc.title: A generator of numerically-tailored and high-throughput accelerators for batched GEMMs
dc.type: Conference report
dc.subject.lemac: Energy -- Consumption
dc.subject.lemac: Field programmable gate arrays
dc.identifier.doi: 10.1109/FCCM53951.2022.9786164
dc.description.peerreviewed: Peer Reviewed
dc.relation.publisherversion: https://ieeexplore.ieee.org/document/9786164
dc.rights.access: Open Access
local.identifier.drac: 33825737
dc.description.version: Postprint (author's final draft)
dc.relation.projectid: info:eu-repo/grantAgreement/EC/H2020/955606/EU/DEEP – SOFTWARE FOR EXASCALE ARCHITECTURES/DEEP-SEA
local.citation.author: Ledoux, L.; Casas, M.
local.citation.contributor: IEEE Symposium on Field Programmable Custom Computing Machines
local.citation.publicationName: 2022 IEEE 30th International Symposium on Field-Programmable Custom Computing Machines, FCCM 2022: 15-18 May, 2022, New York, NY, USA: proceedings
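
Note on the abstract: the claim that dot products are computed "without any intermediate rounding" can be illustrated in software. The following is a minimal Python sketch, not the authors' hardware design and not their S3 format. It assumes exact rational arithmetic (`fractions.Fraction`) as a stand-in for a wide exact accumulator, and the function names are hypothetical. It compares a dot product rounded to binary32 after every operation with one whose sum is kept exact until the end.

```python
from fractions import Fraction
import struct


def to_f32(x: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def dot_rounded_each_step(a, b):
    """Dot product that rounds to binary32 after every multiply and every add."""
    acc = 0.0
    for x, y in zip(a, b):
        acc = to_f32(acc + to_f32(x * y))
    return acc


def dot_exact_accumulation(a, b):
    """Dot product whose products and running sum are kept exact.

    Assumption: Fraction stands in for a wide hardware accumulator.
    Rounding happens only in the final conversion (binary64, then binary32).
    """
    acc = Fraction(0)
    for x, y in zip(a, b):
        acc += Fraction(x) * Fraction(y)  # floats convert to Fraction exactly
    return to_f32(float(acc))


# All inputs are exactly representable in binary32.
a = [1e8, 1.0, -1e8]
b = [1.0, 1.0, 1.0]

print(dot_rounded_each_step(a, b))   # the 1.0 is absorbed by rounding at 1e8
print(dot_exact_accumulation(a, b))  # the 1.0 survives the cancellation
```

With these inputs the step-wise rounded version loses the small term because binary32 values near 1e8 are spaced 8 apart, while the exact accumulation keeps it. The exact result also does not depend on summation order, which is one way to read the abstract's point about reproducibility.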

