Avoiding core's DUE & SDC via acoustic wave detectors and tailored error containment and recovery
Document typeConference report
PublisherInstitute of Electrical and Electronics Engineers (IEEE)
Rights accessOpen Access
The trend of downsizing transistors and operating voltage scaling has made the processor chip more sensitive against radiation phenomena making soft errors an important challenge. New reliability techniques for handling soft errors in the logic and memories that allow meeting the desired failures-in-time (FIT) target are key to keep harnessing the benefits of Moore's law. The failure to scale the soft error rate caused by particle strikes, may soon limit the total number of cores that one may have running at the same time. This paper proposes a light-weight and scalable architecture to eliminate silent data corruption errors (SDC) and detected unrecoverable errors (DUE) of a core. The architecture uses acoustic wave detectors for error detection. We propose to recover by confining the errors in the cache hierarchy, allowing us to deal with the relatively long detection latencies. Our results show that the proposed mechanism protects the whole core (logic, latches and memory arrays) incurring performance overhead as low as 0.60%. © 2014 IEEE.
CitationUpasani, G.; Vera, X.; González, A. Avoiding core's DUE & SDC via acoustic wave detectors and tailored error containment and recovery. A: International Symposium on Computer Architecture. "ISCA 2014: the 41st Annual International Symposium on Computer Architecture: June 14-18, 2014: Minneapolis, MN, USA". Minneapolis: Institute of Electrical and Electronics Engineers (IEEE), 2014, p. 37-48.