Towards the simulation and emulation of large-scale hardware designs
Document type: Master thesis
Rights access: Open Access
The legacy of Moore's law has converged on heterogeneous processors that combine many cores with application- or domain-specific accelerators. With the benefits of Dennard scaling also exhausted, we have ended up with large-area chips that cannot be fully powered at the same time but still have room to improve performance. As a result, technology scaling no longer delivers large performance gains, and the most promising approaches are either smarter designs of existing modules or new specialized architectures. Commercial products that integrate many accelerators on the System-on-Chip (SoC) are already a reality, and future chips are expected to keep increasing the number and complexity of hardware modules on the SoC. Consequently, the complexity of verifying such systems has grown over the last decades and will keep growing in the near future. This has led to multiple proposals, from both academia and industry, to speed up verification. Speeding up verification is also the main focus of this thesis, which makes two contributions. In the first contribution, we explore a solution for emulating a large Network-on-Chip (NoC) on an emulation platform such as an FPGA or a hardware emulator. Depending on the size of the cores, which is typically large, emulating a NoC with 16 cores can be infeasible even on a hardware emulation platform. For this reason, we replace the cores with trace-based packet injectors that mimic the behavior of an Out-of-Order (OoO) core running a benchmark. This contribution consists of the design of the trace specification and the implementation of the trace generator in a full-system simulator, gem5. In addition, a preliminary study with a simple NoC has been carried out to validate the traces, with successful results.
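To make the idea of a trace-based packet injector concrete, the following is a minimal sketch, not the trace specification designed in the thesis. It assumes a hypothetical text trace with one record per line, `cycle src dst flits`, and it ignores inter-packet dependencies and network back-pressure, which an injector that mimics an OoO core would also have to handle. The names `TraceRecord`, `parse_trace`, and `replay` are illustrative only.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Tuple


@dataclass(frozen=True)
class TraceRecord:
    """One packet of the hypothetical trace format (an assumption, not the thesis format)."""
    cycle: int  # cycle at which the traced core issued the packet
    src: int    # injecting node id
    dst: int    # destination node id
    flits: int  # packet length in flits


def parse_trace(lines: Iterable[str]) -> List[TraceRecord]:
    """Parse 'cycle src dst flits' lines, skipping blank lines and '#' comments."""
    records = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        cycle, src, dst, flits = (int(field) for field in line.split())
        records.append(TraceRecord(cycle, src, dst, flits))
    # Stable sort: records with the same cycle keep their order in the trace.
    records.sort(key=lambda r: r.cycle)
    return records


def replay(records: List[TraceRecord],
           last_cycle: int) -> Iterator[Tuple[int, List[TraceRecord]]]:
    """Yield (cycle, packets) for every cycle in which at least one packet is injected."""
    idx = 0
    for cycle in range(last_cycle + 1):
        due = []
        while idx < len(records) and records[idx].cycle <= cycle:
            due.append(records[idx])
            idx += 1
        if due:
            yield cycle, due


if __name__ == "__main__":
    trace = ["# cycle src dst flits", "5 0 3 4", "2 1 0 1", "5 2 1 2"]
    for cycle, packets in replay(parse_trace(trace), last_cycle=10):
        for p in packets:
            print(f"cycle {cycle}: node {p.src} -> node {p.dst} ({p.flits} flits)")
```

In this sketch the injector replaces the core at each NoC port: only the recorded traffic is replayed, so the emulation platform needs to hold the network and the injectors, not the cores themselves.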
In the second contribution, we have developed a tool for functional testing and early design exploration of Register-Transfer Level (RTL) models inside a full-system simulator, gem5. The tool enables early performance studies of RTL models in an environment that models an entire SoC, can boot Linux, and can run complex multi-threaded and multi-programmed workloads. The framework is open source and integrates gem5 with an HDL simulator, Verilator. Finally, we evaluate two use cases: functional debugging of an in-house Performance Monitoring Unit (PMU), and a design space exploration of the type of memory to use with a Machine Learning (ML) accelerator, the NVIDIA Deep Learning Accelerator (NVDLA).
Subjects: Semiconductor storage devices, Networks on a chip (Catalan headings: Ordinadors -- Memòries semiconductores, Xarxes en xip)
Degree: Master's Degree in Innovation and Research in Informatics (MÀSTER UNIVERSITARI EN INNOVACIÓ I RECERCA EN INFORMÀTICA, Pla 2012)
All rights reserved. This work is protected by the corresponding intellectual and industrial property rights. Without prejudice to any existing legal exemptions, reproduction, distribution, public communication, or transformation of this work is prohibited without permission of the copyright holder.