Raising the level of abstraction : simulation of large chip multiprocessors running multithreaded applications

Rico Carro, Alejandro

doi:10.5821/dissertation-2117-95244

Visualitza/Obre

TARC1de1.pdf (1,817Mb)

Veure estadístiques d'ús d'UPCommons

Estadístiques de LA Referencia / Recolecta

Cita com:

Mostra el registre d'ítem complet

Rico Carro, Alejandro

Tutor / directorRamírez Bellido, Alejandro; Valero Cortés, Mateo

Càtedra / Departament / Institut

Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors

Tipus de documentTesi

Data de defensa2013-10-29

EditorUniversitat Politècnica de Catalunya

Condicions d'accésAccés obert

Attribution-NonCommercial-NoDerivs 3.0 Spain

Llevat que s'hi indiqui el contrari, els continguts d'aquesta obra estan subjectes a la llicència de Creative Commons : Reconeixement-NoComercial-SenseObraDerivada 3.0 Espanya

Abstract

The number of transistors on an integrated circuit keeps doubling every two years. This increasing number of transistors is used to integrate more processing cores on the same chip. However, due to power density and ILP diminishing returns, the single-thread performance of such processing cores does not double every two years, but doubles every three years and a half. Computer architecture research is mainly driven by simulation. In computer architecture simulators, the complexity of the simulated machine increases with the number of available transistors. The more transistors, the more cores, the more complex is the model. However, the performance of computer architecture simulators depends on the single-thread performance of the host machine and, as we mentioned before, this is not doubling every two years but every three years and a half. This increasing difference between the complexity of the simulated machine and simulation speed is what we call the simulation speed gap. Because of the simulation speed gap, computer architecture simulators are increasingly slow. The simulation of a reference benchmark may take several weeks or even months. Researchers are concious of this problem and have been proposing techniques to reduce simulation time. These techniques include the use of reduced application input sets, sampled simulation and parallelization. Another technique to reduce simulation time is raising the level of abstraction of the simulated model. In this thesis we advocate for this approach. First, we decide to use trace-driven simulation because it does not require to provide functional simulation, and thus, allows to raise the level of abstraction beyond the instruction-stream representation. However, trace-driven simulation has several limitations, the most important being the inability to reproduce the dynamic behavior of multithreaded applications. In this thesis we propose a simulation methodology that employs a trace-driven simulator together with a runtime sytem that allows the proper simulation of multithreaded applications by reproducing the timing-dependent dynamic behavior at simulation time. Having this methodology, we evaluate the use of multiple levels of abstraction to reduce simulation time, from a high-speed application-level simulation mode to a detailed instruction-level mode. We provide a comprehensive evaluation of the impact in accuracy and simulation speed of these abstraction levels and also show their applicability and usefulness depending on the target evaluations. We also compare these levels of abstraction with the existing ones in popular computer architecture simulators. Also, we validate the highest abstraction level against a real machine. One of the interesting levels of abstraction for the simulation of multi-cores is the memory mode. This simulation mode is able to model the performanceof a superscalar out-of-order core using memory-access traces. At this level of abstraction, previous works have used filtered traces that do not include L1 hits, and allow to simulate only L2 misses for single-core simulations. However, simulating multithreaded applications using filtered traces as in previous works has inherent inaccuracies. We propose a technique to reduce such inaccuracies and evaluate the speed-up, applicability, and usefulness of memory-level simulation. All in all, this thesis contributes to knowledge with techniques for the simulation of chip multiprocessors with hundreds of cores using traces. It states and evaluates the trade-offs of using varying degress of abstraction in terms of accuracy and simulation speed.

CitacióRico Carro, A. Raising the level of abstraction : simulation of large chip multiprocessors running multithreaded applications. Tesi doctoral, UPC, Departament d'Arquitectura de Computadors, 2013. DOI 10.5821/dissertation-2117-95244. Disponible a: <http://hdl.handle.net/2117/95244>

URIhttp://hdl.handle.net/2117/95244

DOI10.5821/dissertation-2117-95244

Dipòsit legalB 13720-2014

Col·leccions

Tesis - Departament d'Arquitectura de Computadors [361]
Tesis - Totes les tesis [5.461]

Veure estadístiques d'ús d'UPCommons

Mostra el registre d'ítem complet

Fitxers	Descripció	Mida	Format	Visualitza
TARC1de1.pdf		1,817Mb	PDF	Visualitza/Obre

UPCommons. Portal del coneixement obert de la UPC

Raising the level of abstraction : simulation of large chip multiprocessors running multithreaded applications

Visualitza/Obre

Explora