Computation reuse in DNNs by exploiting input similarity

Riera Villanueva, Marc; Arnau Montañés, José María; González Colás, Antonio María

doi:10.1109/ISCA.2018.00016



Document type: Conference paper

Publication date: 2018

Publisher: Institute of Electrical and Electronics Engineers (IEEE)

Access conditions: Restricted access per publisher policy

All rights reserved. This work is protected by the corresponding intellectual and industrial property rights. Without prejudice to existing legal exemptions, its reproduction, distribution, public communication or transformation is prohibited without the authorization of the rights holder.

Abstract

In recent years, Deep Neural Networks (DNNs) have achieved tremendous success for diverse problems such as classification and decision making. Efficient support for DNNs on CPUs, GPUs and accelerators has become a prolific area of research, resulting in a plethora of techniques for energy-efficient DNN inference. However, previous proposals focus on a single execution of a DNN. Popular applications, such as speech recognition or video classification, require multiple back-to-back executions of a DNN to process a sequence of inputs (e.g., audio frames, images). In this paper, we show that consecutive inputs exhibit a high degree of similarity, causing the inputs/outputs of the different layers to be extremely similar for successive frames of speech or images of a video. Based on this observation, we propose a technique to reuse some results of the previous execution, instead of computing the entire DNN. Computations related to inputs with negligible changes can be avoided with minor impact on accuracy, saving a large percentage of computations and memory accesses. We propose an implementation of our reuse-based inference scheme on top of a state-of-the-art DNN accelerator. Results show that, on average, more than 60% of the inputs of any neural network layer tested exhibit negligible changes with respect to the previous execution. Avoiding the memory accesses and computations for these inputs results in 63% energy savings on average.
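The reuse idea described in the abstract can be illustrated with a minimal sketch for a single fully connected layer. This is not the authors' accelerator implementation: the abstract does not specify how "negligible changes" are detected, so the absolute `threshold` test, the function names `dense` and `dense_with_reuse`, and the toy weights are all assumptions made for illustration. The sketch keeps the previous input and the previous (pre-activation) output, and only applies corrections for the inputs that changed by more than the threshold.

```python
def dense(weights, bias, x):
    """Full computation: out[j] = bias[j] + sum_i weights[j][i] * x[i]."""
    return [b + sum(w * v for w, v in zip(row, x))
            for row, b in zip(weights, bias)]


def dense_with_reuse(weights, prev_x, prev_out, x, threshold):
    """Update prev_out using only the inputs that changed noticeably.

    Hypothetical helper: inputs whose change is within `threshold` are
    treated as unchanged, so their multiply-accumulates are skipped.
    Returns the new outputs, the inputs those outputs correspond to,
    and the number of skipped inputs.
    """
    out = list(prev_out)
    effective_x = list(prev_x)
    skipped = 0
    for i, (old, new) in enumerate(zip(prev_x, x)):
        delta = new - old
        if abs(delta) <= threshold:
            # Negligible change: reuse the previous contribution as-is.
            skipped += 1
            continue
        # Correct every output by this input's change in contribution.
        for j, row in enumerate(weights):
            out[j] += row[i] * delta
        effective_x[i] = new
    return out, effective_x, skipped


# Toy example (assumed values): two consecutive "frames" for a 3-input layer.
weights = [[1.0, 2.0, 3.0], [0.5, -1.0, 0.0]]
bias = [0.0, 1.0]
frame0 = [1.0, 2.0, 3.0]
frame1 = [1.0, 2.001, 4.0]

out0 = dense(weights, bias, frame0)
out1, kept, skipped = dense_with_reuse(weights, frame0, out0, frame1,
                                       threshold=0.01)
print(out1, skipped)
```

In this sketch the returned `kept` vector, not the raw new frame, should be stored as the reference for the next execution. That way each skipped input is always compared against the value its stored contribution was computed from, and small changes cannot accumulate unnoticed across a long sequence.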

Citation: Riera, M., Arnau, J., Gonzalez, A. Computation reuse in DNNs by exploiting input similarity. In: International Symposium on Computer Architecture. "2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA 2018): Los Angeles, California, USA: 1-6 June 2018". Institute of Electrical and Electronics Engineers (IEEE), 2018, p. 57-68.

URI: http://hdl.handle.net/2117/125204

DOI: 10.1109/ISCA.2018.00016

ISBN: 9781538659854

Publisher's version: https://ieeexplore.ieee.org/document/8416818


Files	Description	Size	Format	View
08416818.pdf		379.0 KB	PDF	Restricted access
