A generator of numerically-tailored and high-throughput accelerators for batched GEMMs
dc.contributor.author | Ledoux Pardo, Luis Eduardo |
dc.contributor.author | Casas, Marc |
dc.contributor.other | Universitat Politècnica de Catalunya. Doctorat en Arquitectura de Computadors |
dc.contributor.other | Barcelona Supercomputing Center |
dc.date.accessioned | 2022-06-16T08:55:19Z |
dc.date.available | 2022-06-16T08:55:19Z |
dc.date.issued | 2022 |
dc.identifier.citation | Ledoux, L.; Casas, M. A generator of numerically-tailored and high-throughput accelerators for batched GEMMs. A: IEEE Symposium on Field Programmable Custom Computing Machines. "2022 IEEE 30th International Symposium on Field-Programmable Custom Computing Machines, FCCM 2022: 15-18 May, 2022, New York, NY, USA: proceedings". Institute of Electrical and Electronics Engineers (IEEE), 2022, ISBN 978-1-6654-8332-2. DOI 10.1109/FCCM53951.2022.9786164. |
dc.identifier.isbn | 978-1-6654-8332-2 |
dc.identifier.uri | http://hdl.handle.net/2117/368563 |
dc.description.abstract | We propose a hardware generator of GEMM accelerators. Our generator produces vendor-agnostic HDL describing highly customizable systolic arrays guided by accuracy and energy efficiency goals. The generated arrays have three main novel aspects. First, the accelerators handle a large variety of computer number formats using intermediate representations based on our Sign Scale Significand (S3) format. Second, the processing elements perform all intermediate dot-product arithmetic operations required by the GEMM kernel without any intermediate rounding, which makes it possible to deliver better energy efficiency than state-of-the-art approaches while offering more accuracy and reproducible results. Third, our accelerators feature the Half-Speed Sink Down (HSSD) mechanism, which maximizes the overlap of host-accelerator data transfers with GEMM computations. We evaluate our automatically generated designs in a cutting-edge setup composed of a POWER9 host, a CAPI (Coherent Accelerator Processor Interface) link, and a Virtex UltraScale+ FPGA. Arrays can operate at the speed of the link and saturate it to reach a throughput of 13 GB/s. Our fine-grained customization approach covers a wide range of accuracy versus efficiency scenarios and can reach 0.65 GOps/s/W while producing 1024 accurate bits or 148.7 GOps/s/W with 6 accurate bits. Our configurations achieve up to 1613 GOps/s system performance and power efficiencies of up to 240 GOps/s/W for the FPGA. This automatic generator is the first able to produce such a variety of designs. We improve the single-precision energy efficiency of state-of-the-art FPGA GEMM accelerators by 1.86×. |
dc.description.sponsorship | This work has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 955606. Marc Casas is supported by Grant RYC-2017-23269 funded by MCIN/AEI/10.13039/501100011033 and by “ESF Investing in your future”. |
dc.language.iso | eng |
dc.publisher | Institute of Electrical and Electronics Engineers (IEEE) |
dc.subject | UPC subject areas::Computer science::Computer architecture |
dc.subject.lcsh | Energy consumption |
dc.subject.lcsh | Field programmable gate arrays |
dc.subject.other | System performance |
dc.subject.other | Throughput |
dc.subject.other | Generators |
dc.subject.other | Hardware |
dc.subject.other | Energy efficiency |
dc.subject.other | Systolic arrays |
dc.subject.other | Space exploration |
dc.title | A generator of numerically-tailored and high-throughput accelerators for batched GEMMs |
dc.type | Conference report |
dc.subject.lemac | Energia -- Consum |
dc.subject.lemac | Matrius de portes programables per l'usuari |
dc.identifier.doi | 10.1109/FCCM53951.2022.9786164 |
dc.description.peerreviewed | Peer Reviewed |
dc.relation.publisherversion | https://ieeexplore.ieee.org/document/9786164 |
dc.rights.access | Open Access |
local.identifier.drac | 33825737 |
dc.description.version | Postprint (author's final draft) |
dc.relation.projectid | info:eu-repo/grantAgreement/EC/H2020/955606/EU/DEEP – SOFTWARE FOR EXASCALE ARCHITECTURES/DEEP-SEA |
local.citation.author | Ledoux, L.; Casas, M. |
local.citation.contributor | IEEE Symposium on Field Programmable Custom Computing Machines |
local.citation.publicationName | 2022 IEEE 30th International Symposium on Field-Programmable Custom Computing Machines, FCCM 2022: 15-18 May, 2022, New York, NY, USA: proceedings |