MeRLiN: Exploiting dynamic instruction behavior for fast and accurate microarchitecture level reliability assessment
Document type: Conference report
Publisher: Association for Computing Machinery (ACM)
Rights access: Open Access
Early reliability assessment of hardware structures using microarchitecture-level simulators can effectively guide major error protection decisions in microprocessor design. Statistical fault injection on microarchitectural structures modeled in performance simulators is an accurate method to measure their Architectural Vulnerability Factor (AVF), but it requires excessively long campaigns to obtain high statistical significance. We propose MeRLiN, a methodology that accelerates microarchitecture-level injection-based reliability assessment by several orders of magnitude while keeping the accuracy of the assessment unaffected, even for large injection campaigns with very high statistical significance. The core of MeRLiN is the grouping of the faults of an initial fault list into equivalence classes. All faults in the same group target equivalent vulnerable intervals of program execution that end at the same static instruction, the one that reads the faulty entries. Faults in the same group occur at different times and in different entries of a structure, yet they are extremely likely to have the same effect on program execution; thus, fault injection is performed only on a few representatives from each group. We evaluate MeRLiN for different sizes of the physical register file, the store queue and the first-level data cache of a contemporary microarchitecture running MiBench and SPEC CPU2006 benchmarks. In all our experiments, MeRLiN is two to three orders of magnitude faster than an injection campaign with extremely high statistical significance, while reporting the same reliability measurements with negligible loss of accuracy. Finally, we theoretically analyze MeRLiN's statistical behavior to further justify its accuracy.
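To make the grouping idea concrete, here is a minimal Python sketch built on simplifying assumptions that are ours, not the paper's. The structure's access trace is given as per-entry lists of hypothetical `Access(cycle, kind, pc)` records sorted by cycle; a fault is a hypothetical `Fault(entry, cycle)` with the bit position ignored; a fault whose next access to its entry is a write (or that is never accessed again) is treated as masked; and every other fault is grouped by the PC of the first instruction that reads the faulty entry. The `is_failure` callback is a placeholder for one injection run in a performance simulator. Each group's outcome, measured on randomly chosen representatives, is weighted by the group's size to estimate AVF. This illustrates the equivalence-class idea only; it is not the authors' implementation.

```python
import bisect
import random
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Access:
    cycle: int
    kind: str  # "R" (read) or "W" (write); hypothetical trace format
    pc: int    # static instruction address that performs the access


@dataclass(frozen=True)
class Fault:
    entry: int  # index of the faulty structure entry
    cycle: int  # injection time; bit position ignored in this sketch


def group_faults(faults, accesses_by_entry):
    """Group faults by the PC of the first read that consumes the faulty entry.

    Assumption: a fault whose next access is a write, or that is never
    accessed again, is masked and needs no injection.
    """
    groups = defaultdict(list)
    masked = []
    cycles_by_entry = {e: [a.cycle for a in accs] for e, accs in accesses_by_entry.items()}
    for f in faults:
        accs = accesses_by_entry.get(f.entry, [])
        # Index of the first access strictly after the fault cycle.
        i = bisect.bisect_right(cycles_by_entry.get(f.entry, []), f.cycle)
        if i == len(accs) or accs[i].kind == "W":
            masked.append(f)
        else:
            groups[accs[i].pc].append(f)
    return dict(groups), masked


def pick_representatives(groups, per_group=1, seed=0):
    """Randomly choose up to `per_group` faults from each group."""
    rng = random.Random(seed)
    return {pc: rng.sample(fs, min(per_group, len(fs))) for pc, fs in groups.items()}


def estimate_avf(groups, masked, is_failure, per_group=1, seed=0):
    """Inject only representatives and weight each group's failure rate by its size."""
    total = sum(len(fs) for fs in groups.values()) + len(masked)
    if total == 0:
        return 0.0
    reps = pick_representatives(groups, per_group, seed)
    failing = 0.0
    for pc, fs in groups.items():
        rep_failures = sum(is_failure(r) for r in reps[pc])
        failing += len(fs) * rep_failures / len(reps[pc])
    return failing / total
```

For example, if entry 0 is written at cycle 10 and read by PC `0x200` at cycle 20, while entry 1 is written at cycle 5 and read by the same PC at cycle 25, then faults at `Fault(0, 12)` and `Fault(1, 8)` fall into the same `0x200` group, so one injection stands in for both. Raising `per_group` injects more representatives per class, trading some of the speedup for a lower risk that a class contains faults with different outcomes.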
Citation: Kaliorakis, M., Gizopoulos, D., Canal, R., González, A. MeRLiN: Exploiting dynamic instruction behavior for fast and accurate microarchitecture level reliability assessment. In: International Symposium on Computer Architecture. "ISCA '17: Proceedings of the 44th Annual International Symposium on Computer Architecture". Toronto, ON: Association for Computing Machinery (ACM), 2017, p. 241-254.