Accelerating SpMV on FPGAs through block-row compress: a task-based approach

Cite as:
hdl:2117/399131
Document type: Conference lecture
Defense date: 2023
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Rights access: Open Access
All rights reserved. This work is protected by the corresponding intellectual and industrial
property rights. Without prejudice to any existing legal exemptions, reproduction, distribution, public
communication or transformation of this work are prohibited without permission of the copyright holder
Project: MEEP - The MareNostrum Experimental Exascale Platform (EC-H2020-946002)
BSC - COMPUTACION DE ALTAS PRESTACIONES VIII (AEI-PID2019-107255GB-C21)
Abstract
Sparse Matrix-Vector multiplication (SpMV), computing y=α⋅A×x+β⋅y where y,x are dense vectors, α,β are two scalar constants, and A is a sparse matrix, is a key kernel in many HPC applications. Its memory access pattern is extremely hard to serve efficiently because the accesses are random. In this paper, we present a new approach to accelerate SpMV on FPGAs. Because FPGAs lack a default memory hierarchy, they can adapt better to specific applications. Also, an increasing number of FPGAs include High Bandwidth Memory (HBM), making the SpMV problem especially appealing to tackle on this kind of device. We define a new sparse matrix encoding format (b8c) and its corresponding SpMV implementation using OmpSs@FPGA and HLS. This format allows us to leverage many of the FPGA strengths for intensive data processing, such as data streaming, customizable datapath widths, parallel access to off-chip memory when multiple memory channels are available (as in HBM), parallel access to on-chip memory, and pipelining. We tested our proposal with both DDR and HBM memories to show the adaptability and scalability of our design. The presented b8c SpMV implementation achieves higher performance than the state-of-the-art FPGA implementation of SpMV on all the matrices in the data set: 3.52x on average, with a minimum of 1.82x and a maximum of 6.28x, even when running at 75% of the frequency.
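For readers unfamiliar with the kernel, the operation y=α⋅A×x+β⋅y can be sketched in software as below. This is only a minimal reference, assuming the widely used CSR (compressed sparse row) layout; it is not the b8c format or the FPGA implementation from the paper, whose details are not given in this record. The function name `spmv_csr` and the example matrix are illustrative assumptions.

```python
def spmv_csr(alpha, row_ptr, col_idx, values, x, beta, y):
    """Return alpha * A @ x + beta * y, with A stored in CSR form.

    row_ptr[i]:row_ptr[i + 1] delimits the nonzeros of row i;
    col_idx and values hold their column indices and values.
    """
    result = []
    for row in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[row], row_ptr[row + 1]):
            # x is indexed through col_idx: this indirect lookup is the
            # random access pattern the abstract refers to.
            acc += values[k] * x[col_idx[k]]
        result.append(alpha * acc + beta * y[row])
    return result


# Illustrative 3x3 matrix (an assumption, not data from the paper):
# A = [[1, 0, 2],
#      [0, 3, 0],
#      [4, 0, 5]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values = [1.0, 2.0, 3.0, 4.0, 5.0]
x = [1.0, 2.0, 3.0]
y = [1.0, 1.0, 1.0]

out = spmv_csr(2.0, row_ptr, col_idx, values, x, 0.5, y)
print(out)  # [14.5, 12.5, 38.5]
```

The inner loop reads `values` and `col_idx` sequentially but reads `x` at data-dependent positions, which is why the kernel tends to be limited by memory behavior rather than arithmetic.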
Citation: Oliver, J. [et al.]. Accelerating SpMV on FPGAs through block-row compress: a task-based approach. A: International Conference on Field-Programmable Logic and Applications. "2023 33rd International Conference on Field-Programmable Logic and Applications, FPL 2023: 4-8 September 2023, Gothenburg, Sweden: proceedings". Institute of Electrical and Electronics Engineers (IEEE), 2023, p. 151-158. ISBN 979-8-3503-4151-5. DOI 10.1109/FPL60245.2023.00029.
ISBN: 979-8-3503-4151-5
Publisher version: https://ieeexplore.ieee.org/document/10296357
Collections
- Doctorat en Arquitectura de Computadors - Conference papers/presentations [352]
- Computer Sciences - Conference papers/presentations [624]
- Departament d'Arquitectura de Computadors - Conference papers/presentations [2,055]
- PM - Programming Models - Conference papers/presentations [25]
Files | Description | Size | Format | View |
---|---|---|---|---|
2023146287.pdf | | 537.7 KB | | View/Open |