Cross-layer system reliability assessment framework for hardware faults
Document type: Conference report
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Rights access: Open Access
European Commission's project: CLERECO - Cross-Layer Early Reliability Evaluation for the Computing cOntinuum (EC-FP7-611404)
System reliability estimation during early design phases facilitates informed decisions for the integration of effective protection mechanisms against different classes of hardware faults. When not all system abstraction layers (technology, circuit, microarchitecture, software) are factored into such an estimation model, the delivered reliability reports can be excessively pessimistic and thus lead to unacceptably expensive, over-designed systems. We propose a scalable, cross-layer methodology and a supporting suite of tools for accurate but fast estimation of computing system reliability. The backbone of the methodology is a component-based Bayesian model, which calculates system reliability from the masking probabilities of individual hardware and software components, taking their complex interactions into account. Our detailed experimental evaluation for different technologies, microarchitectures, and benchmarks demonstrates that the proposed model delivers very accurate reliability estimations (FIT rates) compared to statistically significant but slow fault injection campaigns at the microarchitecture level.
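To illustrate the general idea of combining per-component masking probabilities into a system-level FIT rate, the following minimal sketch may help. It is not the paper's Bayesian model: it assumes that components fail independently and that hardware and software masking are independent, so it ignores the component interactions the methodology is designed to capture. The class `Component`, the functions `effective_fit` and `system_fit`, and all numeric values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Component:
    """Hypothetical per-component inputs (illustrative only)."""
    name: str
    raw_fit: float     # raw failure rate from the technology layer (failures per 1e9 hours)
    hw_masking: float  # assumed probability that a fault is masked in hardware
    sw_masking: float  # assumed probability that a fault reaching software is masked


def effective_fit(c: Component) -> float:
    # A fault contributes to the system failure rate only if it is
    # masked neither in hardware nor in software (independence assumed).
    return c.raw_fit * (1.0 - c.hw_masking) * (1.0 - c.sw_masking)


def system_fit(components: list[Component]) -> float:
    # Under the independence assumption, per-component FIT rates add up.
    return sum(effective_fit(c) for c in components)


# Made-up example values, not taken from the paper.
components = [
    Component("register_file", raw_fit=100.0, hw_masking=0.5, sw_masking=0.5),
    Component("l1_cache", raw_fit=200.0, hw_masking=0.75, sw_masking=0.5),
]

print(system_fit(components))
```

In this simplified view, ignoring either masking layer (setting its probability to zero) inflates the estimated FIT rate, which is the kind of pessimism the abstract attributes to models that omit abstraction layers.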
Citation: Vallero, A., Savino, A., Politano, G., Di Carlo, S., Chatzidimitriou, A., Tselonis, S., Kaliorakis, M., Gizopoulos, D., Riera, M., Canal, R., González, A., Kooli, M., Bosio, A., Di Natale, G. Cross-layer system reliability assessment framework for hardware faults. In: IEEE International Test Conference. "2016 IEEE International Test Conference (ITC): proceedings". Fort Worth, TX: Institute of Electrical and Electronics Engineers (IEEE), 2016, p. 1-10.