CGPA: Coarse-Grained Pruning of Activations for Energy-Efficient RNN Inference
Rights accessOpen Access
European Commission's projectCoCoUnit - CoCoUnit: An Energy-Efficient Processing Unit for Cognitive Computing (EC-H2020-833057)
Recurrent neural networks (RNNs) perform element-wise multiplications across the activations of gates. We show that a significant percentage of activations are saturated and propose coarse-grained pruning of activations (CGPA) to avoid the computation of entire neurons, based on the activation values of the gates. We show that CGPA can be easily implemented on top of a TPU-like architecture with negligible area overhead, resulting in 12% speedup and 12% energy savings on average for a set of widely used RNNs.
© 2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes,creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
CitationRiera, M.; Arnau, J.; Gonzalez, A. CGPA: Coarse-Grained Pruning of Activations for Energy-Efficient RNN Inference. "IEEE micro", 1 Setembre 2019, vol. 39, núm. 5, p. 36-45.
All rights reserved. This work is protected by the corresponding intellectual and industrial property rights. Without prejudice to any existing legal exemptions, reproduction, distribution, public communication or transformation of this work are prohibited without permission of the copyright holder