On the resilience of deep learning for reduced-voltage FPGAs
DOI: 10.1109/PDP50117.2020.00023
Cite as: hdl:2117/188799
Document type: Conference proceedings text
Publication date: 2020
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Access conditions: Open access
All rights reserved. This work is protected by the corresponding intellectual and industrial property rights. Without prejudice to existing legal exemptions, its reproduction, distribution, public communication, or transformation without the authorization of the rights holder is prohibited.
Abstract
Deep Neural Networks (DNNs) are inherently computation-intensive and power-hungry. Hardware accelerators such as Field Programmable Gate Arrays (FPGAs) are a promising solution that can satisfy these requirements for both embedded and High-Performance Computing (HPC) systems. In FPGAs, as well as CPUs and GPUs, aggressive voltage scaling below the nominal level is an effective technique for minimizing power dissipation. Unfortunately, bit-flip faults start to appear as the voltage is scaled down closer to the transistor threshold due to timing issues, thus creating a resilience problem. This paper experimentally evaluates the resilience of the training phase of DNNs in the presence of voltage-underscaling-related faults in FPGAs, especially in on-chip memories. Toward this goal, we have experimentally evaluated the resilience of LeNet-5 and of a network specially designed for the CIFAR-10 dataset, each with two activation functions: the Rectified Linear Unit (ReLU) and the Hyperbolic Tangent (Tanh). We have found that modern FPGAs are robust enough at extremely low voltage levels and that low-voltage-related faults can be automatically masked within the training iterations, so there is no need for costly software- or hardware-oriented fault-mitigation techniques such as ECC. Approximately 10% more training iterations are needed to close the accuracy gap. This observation is the result of the relatively low rate of undervolting faults, i.e., <0.1%, measured on real FPGA fabrics. We have also increased the fault rate significantly for the LeNet-5 network through randomly generated fault-injection campaigns and observed that the training accuracy starts to degrade. As the fault rate increases, the network with the Tanh activation function outperforms the one with ReLU in terms of accuracy; e.g., at a 30% fault rate the accuracy difference is 4.92%.
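The fault-injection campaigns described above can be illustrated with a minimal sketch (not the authors' code): randomly flipping one bit in a configurable fraction of 32-bit weight words, as undervolted on-chip memories would. The function name `inject_bit_flips` and all parameters are illustrative assumptions.

```python
# Minimal sketch (illustrative, not the paper's implementation) of a
# random bit-flip fault-injection campaign on a float32 weight buffer.
import numpy as np

def inject_bit_flips(weights: np.ndarray, fault_rate: float, rng=None) -> np.ndarray:
    """Flip one random bit in each affected 32-bit word.

    fault_rate is the fraction of words injected with a fault
    (e.g. 0.001 to mimic the <0.1% rate measured on real fabrics).
    """
    rng = np.random.default_rng(rng)
    flat = weights.astype(np.float32).ravel().copy()
    bits = flat.view(np.uint32)                       # reinterpret as raw bits
    faulty = rng.random(bits.size) < fault_rate       # pick the faulty words
    positions = rng.integers(0, 32, size=bits.size)   # one random bit per word
    bits[faulty] ^= (np.uint32(1) << positions[faulty]).astype(np.uint32)
    return flat.reshape(weights.shape)

w = np.ones((8, 8), dtype=np.float32)
w_faulty = inject_bit_flips(w, fault_rate=0.3, rng=0)
print(float((w_faulty != w).mean()))  # observed fraction of perturbed weights
```

In a training-resilience study of this kind, such an injection would be applied to the weights between iterations, and the accuracy curve compared against a fault-free run.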
Citation: Givaki, K. [et al.]. On the resilience of deep learning for reduced-voltage FPGAs. In: Euromicro International Conference on Parallel, Distributed, and Network-Based Processing. "2020 28th Euromicro International Conference on Parallel, Distributed and Network-Based Processing, PDP 2020: Västerås, Sweden, 11-13 March 2020: proceedings". Institute of Electrical and Electronics Engineers (IEEE), 2020, p. 110-117.
ISBN: 978-1-7281-6582-0
Publisher's version: https://ieeexplore.ieee.org/document/9092423
Collections
- Doctoral Programme in Computer Architecture - Conference papers/presentations [292]
- Computer Sciences - Conference papers/presentations [574]
- CAP - High-Performance Computing Group - Conference papers/presentations [784]
- Department of Computer Architecture - Conference papers/presentations [1,954]
Files | Description | Size | Format
---|---|---|---
Givaki et al.pdf | | 1.750 MB |