A Tensor Marshaling Unit for sparse tensor algebra on general-purpose processors
Fitxers
Títol de la revista
ISSN de la revista
Títol del volum
Col·laborador
Editor
Tribunal avaluador
Realitzat a/amb
Tipus de document
Data publicació
Editor
Condicions d'accés
item.page.rightslicense
Publicacions relacionades
Datasets relacionats
Projecte CCD
Abstract
This paper proposes the Tensor Marshaling Unit (TMU), a near-core programmable dataflow engine for multicore architectures that accelerates tensor traversals and merging, the most critical op-erations of sparse tensor workloads running on today’s computing infrastructures. The TMU leverages a novel multi-lane design that enables parallel tensor loading and merging, which naturally pro-duces vector operands that are marshaled into the core for efficient SIMD computation. The TMU supports all the necessary primitives to be tensor-format and tensor-algebra complete. We evaluate the TMU on a simulated multicore system using a broad set of ten-sor algebra workloads, achieving 3.6×, 2.8×, and 4.9× speedups over memory-intensive, compute-intensive, and merge-intensive vectorized software implementations, respectively.

