Raising the level of abstraction : simulation of large chip multiprocessors running multithreaded applications

Rico Carro, Alejandro

doi:10.5821/dissertation-2117-95244

dc.contributor	Ramírez Bellido, Alejandro
dc.contributor	Valero Cortés, Mateo
dc.contributor.author	Rico Carro, Alejandro
dc.contributor.other	Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors
dc.date.accessioned	2014-05-20T12:46:29Z
dc.date.available	2014-05-20T12:46:29Z
dc.date.issued	2013-10-29
dc.identifier.citation	Rico Carro, A. Raising the level of abstraction : simulation of large chip multiprocessors running multithreaded applications. Doctoral thesis, UPC, Departament d'Arquitectura de Computadors, 2013. DOI 10.5821/dissertation-2117-95244.
dc.identifier.uri	http://hdl.handle.net/2117/95244
dc.description.abstract	The number of transistors on an integrated circuit keeps doubling every two years. This increasing number of transistors is used to integrate more processing cores on the same chip. However, due to power density and diminishing ILP returns, the single-thread performance of such processing cores does not double every two years, but every three and a half years. Computer architecture research is mainly driven by simulation. In computer architecture simulators, the complexity of the simulated machine increases with the number of available transistors: the more transistors, the more cores, and the more complex the model. However, the performance of computer architecture simulators depends on the single-thread performance of the host machine and, as mentioned above, this doubles not every two years but every three and a half years. This increasing difference between the complexity of the simulated machine and simulation speed is what we call the simulation speed gap. Because of the simulation speed gap, computer architecture simulators are increasingly slow. The simulation of a reference benchmark may take several weeks or even months. Researchers are conscious of this problem and have been proposing techniques to reduce simulation time. These techniques include the use of reduced application input sets, sampled simulation and parallelization. Another technique to reduce simulation time is raising the level of abstraction of the simulated model. In this thesis we advocate this approach. First, we decide to use trace-driven simulation because it does not require functional simulation and thus allows raising the level of abstraction beyond the instruction-stream representation. However, trace-driven simulation has several limitations, the most important being the inability to reproduce the dynamic behavior of multithreaded applications.
In this thesis we propose a simulation methodology that employs a trace-driven simulator together with a runtime system that allows the proper simulation of multithreaded applications by reproducing the timing-dependent dynamic behavior at simulation time. With this methodology, we evaluate the use of multiple levels of abstraction to reduce simulation time, from a high-speed application-level simulation mode to a detailed instruction-level mode. We provide a comprehensive evaluation of the impact of these abstraction levels on accuracy and simulation speed, and also show their applicability and usefulness depending on the target evaluations. We also compare these levels of abstraction with the existing ones in popular computer architecture simulators, and we validate the highest abstraction level against a real machine. One of the interesting levels of abstraction for the simulation of multi-cores is the memory mode. This simulation mode is able to model the performance of a superscalar out-of-order core using memory-access traces. At this level of abstraction, previous works have used filtered traces that do not include L1 hits, and allow simulating only L2 misses for single-core simulations. However, simulating multithreaded applications using filtered traces as in previous works has inherent inaccuracies. We propose a technique to reduce such inaccuracies and evaluate the speed-up, applicability, and usefulness of memory-level simulation. All in all, this thesis contributes techniques for the simulation of chip multiprocessors with hundreds of cores using traces. It states and evaluates the trade-offs of using varying degrees of abstraction in terms of accuracy and simulation speed.
dc.format.extent	156 p.
dc.language.iso	eng
dc.publisher	Universitat Politècnica de Catalunya
dc.rights	Access to the contents of this thesis is subject to acceptance of the terms of use established by the following Creative Commons license: http://creativecommons.org/licenses/by-nc-nd/3.0/es/
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/3.0/es/
dc.source	TDX (Tesis Doctorals en Xarxa)
dc.subject	UPC subject areas::Computer science
dc.title	Raising the level of abstraction : simulation of large chip multiprocessors running multithreaded applications
dc.type	Doctoral thesis
dc.subject.lemac	Transistors
dc.subject.lemac	Computer architecture
dc.identifier.doi	10.5821/dissertation-2117-95244
dc.identifier.dl	B 13720-2014
dc.rights.access	Open Access
dc.description.version	Postprint (published version)
dc.identifier.tdx	http://hdl.handle.net/10803/134743

Files in this item

Name: TARC1de1.pdf
Size: 1.817 MB
Format: PDF

This item appears in the following collections

Departament d'Arquitectura de Computadors [361]
All theses [5,461]

UPCommons. The UPC open knowledge portal