GPU-accelerated sparse matrix-vector product for a hybridizable discontinuous Galerkin method
Document type: Conference report
Rights access: Restricted access - publisher's policy
The iterative solution of the large systems of equations that result from discontinuous Galerkin (DG) discretizations requires fast matrix-vector products. DG matrices have a sparse block structure with a constant number of non-zero, equal-sized, non-overlapping blocks per row. General-purpose sparse matrix-vector product algorithms are not designed to exploit this specific structure and, as a consequence, deliver sub-optimal performance. To address this issue, we propose a sparse matrix-vector product for DG discretizations based on a dense tensor contraction. A GPU implementation of the proposed algorithm for a hybridizable discontinuous Galerkin (HDG) method is tested on the NVIDIA GeForce GTX 285. The results show that the tensor contraction performs at about 20 to 25 GFLOP/s in double precision with a sustained efficiency of more than 40% (60 GBytes/s) of the peak memory bandwidth (160 GBytes/s). Moreover, for HDG matrices in double precision, the proposed method is 2 times faster than the general sparse matrix-vector products provided by the GPU library CUSPARSE and about 30 times faster than MATLAB running on a CPU.
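The block structure described in the abstract can be illustrated with a small CPU sketch. This is not the authors' GPU kernel. It only shows, under assumed names and an assumed storage layout, how a matrix with a fixed number of equal-sized blocks per block row reduces a matrix-vector product to the dense contraction y[i][a] = sum over k and b of A[i][k][a][b] * x[cols[i][k]][b]. The names `block_matvec`, `to_dense`, `blocks`, and `cols` are hypothetical and do not come from the paper.

```python
def block_matvec(blocks, cols, x, m):
    """Multiply a block-sparse matrix by a vector.

    blocks[i][k] is the k-th dense m-by-m block of block row i (assumed layout),
    cols[i][k] is the block-column index of that block,
    x is a flat vector of length (number of block columns) * m.
    """
    n_rows = len(blocks)
    y = [0.0] * (n_rows * m)
    for i in range(n_rows):
        for k, j in enumerate(cols[i]):
            blk = blocks[i][k]
            xj = x[j * m:(j + 1) * m]  # gather the slice of x this block acts on
            for a in range(m):
                y[i * m + a] += sum(blk[a][b] * xj[b] for b in range(m))
    return y


def to_dense(blocks, cols, m, n_block_cols):
    """Expand the block storage into a full dense matrix (for checking only)."""
    n_rows = len(blocks)
    dense = [[0.0] * (n_block_cols * m) for _ in range(n_rows * m)]
    for i in range(n_rows):
        for k, j in enumerate(cols[i]):
            for a in range(m):
                for b in range(m):
                    dense[i * m + a][j * m + b] = blocks[i][k][a][b]
    return dense


# Toy data (made up for illustration): 3 block rows, 2 blocks per row, 2x2 blocks.
m = 2
cols = [[0, 1], [1, 2], [0, 2]]
blocks = [
    [[[1.0, 2.0], [3.0, 4.0]], [[0.5, 0.0], [0.0, 0.5]]],
    [[[2.0, 0.0], [0.0, 2.0]], [[1.0, 1.0], [1.0, 1.0]]],
    [[[0.0, 1.0], [1.0, 0.0]], [[3.0, 0.0], [0.0, 3.0]]],
]
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

y = block_matvec(blocks, cols, x, m)

# Reference result from the expanded dense matrix.
dense = to_dense(blocks, cols, m, n_block_cols=3)
y_ref = [sum(row[c] * x[c] for c in range(len(x))) for row in dense]
```

Because every block row holds the same number of blocks of the same size, the loops have fixed trip counts and no per-row length lookup is needed. That regularity is what a general-purpose sparse format cannot assume.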
Citation: Roca, X.; Nguyen, N.; Peraire, J. GPU-accelerated sparse matrix-vector product for a hybridizable discontinuous Galerkin method. In: AIAA Aerospace Sciences Meeting including the New Horizons Forum and Aerospace Exposition. "49th AIAA Aerospace Sciences Meeting including the New Horizons Forum and Aerospace Exposition". Orlando, Florida: 2011, p. 1-12.
|RocaNguyenPeraireAIAA2011.pdf||Main article||1019 KB||Restricted access|