Variability-aware architectures based on hardware redundancy for nanoscale reliable computation
Collaborator: Rubio Sola, Jose Antonio; Universitat Politècnica de Catalunya. Departament d'Enginyeria Electrònica
Document type: Doctoral thesis
Publisher: Universitat Politècnica de Catalunya
Rights access: Open Access
Over recent decades, quality of life has improved significantly, thanks in large part to the rapid evolution of integrated circuits (ICs). This unprecedented technological race, along with its significant economic impact, has rested on building complex processing systems from highly reliable component devices. However, the fundamental assumption of nearly ideal devices, which held throughout past CMOS technology generations, now seems to be coming to an end. As MOSFET technology scales into the nanoscale regime, it approaches fundamental physical limits and begins to exhibit higher levels of variability, performance degradation, and higher rates of manufacturing defects. At the same time, ICs with an increasing number of transistors require a lower failure rate per device in order to maintain overall chip reliability. As a result, it is becoming increasingly important to develop circuit architectures capable of providing reliable computation while tolerating high levels of variability and high defect rates.

The main objective of this thesis is to analyze and propose new fault-tolerant architectures based on redundancy for future technologies. Our research is founded on the principles of redundancy established by von Neumann in the 1950s and extends them along three new dimensions:

1. Heterogeneity: Most work on fault-tolerant architectures based on redundancy, like von Neumann's original work, assumes homogeneous variability across the replicas. Instead, we explore the possibilities of redundancy when heterogeneity between replicas is taken into account. To this end, we propose compensating mechanisms that select the weighting of the redundant information to maximize overall reliability.

2. Asynchrony: Each replica of a redundant system may have a different processing delay because of variability and degradation, especially in future nanotechnologies. If we design the system to work locally in asynchronous mode, we can consider different voting policies for handling the redundant information. Depending on how many replica outputs we collect before taking a decision, we obtain different trade-offs between processing delay and reliability. We propose a mechanism that provides these facilities, and we analyze and simulate its operation.

3. Hierarchy: Finally, we explore the possibilities of redundancy applied at different hierarchy layers of complex processing systems. We propose distributing redundancy across the various hierarchy layers and analyze the benefits that can be obtained.

Drawing on the scenario of future IC technologies, we push the concept of redundancy to its fullest expression through the study of realistic nano-device architectures. Most of the redundant architectures considered so far do not properly address the era of terascale computing and current nanotechnology trends. Since von Neumann first applied redundancy to electronic circuits, effects as common in nanoelectronics as degradation and interconnection failures have never been treated directly from the standpoint of redundancy. In this thesis we address in a comprehensive manner the reliability of digital processing systems in the upcoming technology generations.
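
To make the heterogeneity and asynchrony ideas more concrete, the following is a minimal Python sketch, not the mechanism proposed in the thesis. It assumes binary replica outputs, independent replica errors, and known per-replica error probabilities. The log-odds weighting is a standard textbook choice swapped in for illustration, and the names `log_odds_weight`, `weighted_vote`, and `first_k_vote` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def log_odds_weight(error_prob: float) -> float:
    """Illustrative weight for a replica with error probability 0 < p < 1.

    Assumption: log-odds weighting, a common choice for independent
    binary voters; the abstract does not state the thesis's weighting.
    """
    return math.log((1.0 - error_prob) / error_prob)


def weighted_vote(outputs: Sequence[int], error_probs: Sequence[float]) -> int:
    """Weighted majority vote over binary replica outputs (0 or 1)."""
    score = 0.0
    for bit, p in zip(outputs, error_probs):
        w = log_odds_weight(p)
        # A replica voting 1 pushes the score up, a replica voting 0 down.
        score += w if bit == 1 else -w
    return 1 if score > 0 else 0


def first_k_vote(arrivals: List[Tuple[float, int]], k: int) -> Tuple[int, float]:
    """Plain majority vote over the k earliest replica outputs.

    arrivals holds (delay, bit) pairs. Returns the decision and the
    delay at which it becomes available (arrival time of the k-th output).
    """
    earliest = sorted(arrivals)[:k]
    ones = sum(bit for _, bit in earliest)
    decision = 1 if 2 * ones > k else 0
    return decision, earliest[-1][0]


# Heterogeneous replicas: one reliable replica outvotes two unreliable ones.
heterogeneous = weighted_vote([1, 0, 0], [0.01, 0.3, 0.3])

# Asynchronous replicas: a smaller k decides sooner but on less evidence.
arrivals = [(3.0, 1), (1.0, 0), (2.0, 1), (5.0, 1), (4.0, 0)]
fast = first_k_vote(arrivals, 1)
balanced = first_k_vote(arrivals, 3)
```

In this sketch, a plain majority over the heterogeneous outputs would return 0, whereas the weighted vote follows the single low-error replica. Varying `k` in `first_k_vote` illustrates the trade-off between decision delay and the number of replica outputs that contribute to the decision.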