Improving the effective use of multithreaded architectures : implications on compilation, thread assignment, and timing analysis
Collaborator: Verdú Mulà, Javier; Pajuelo González, Manuel A. (Manuel Alejandro); Nemirovsky, Mario; Cazorla Almeida, Francisco Javier; Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors
Document type: Doctoral thesis
Publisher: Universitat Politècnica de Catalunya
Rights access: Open Access
This thesis presents cross-domain approaches that improve the effective use of multithreaded architectures. The contributions of the thesis can be classified in three groups. First, we propose several methods for thread assignment of network applications running in multithreaded network servers. Second, we analyze the problem of graph partitioning that is a part of the compilation process of multithreaded streaming applications. Finally, we present a method that improves the measurement-based timing analysis of multithreaded architectures used in time-critical environments. The following sections summarize each of the contributions.

(1) Thread assignment on multithreaded processors: State-of-the-art multithreaded processors have different levels of resource sharing (e.g. resources shared between threads running on the same core, and globally shared resources). Thus, the way that threads of a given workload are assigned to the processor's hardware contexts determines which resources the threads share, which, in turn, may significantly affect the system performance. In this thesis, we demonstrate the importance of thread assignment for network applications running in multithreaded servers. We also present TSBSched and BlackBox scheduler, methods for thread assignment of multithreaded network applications running on processors with several levels of resource sharing. Finally, we propose a statistical approach to the thread assignment problem. In particular, we show that running a sample of several hundred or several thousand random thread assignments is sufficient to capture, with a very high probability, at least one assignment from the best-performing 1%. We also describe a method that estimates the optimal system performance for a given workload. We successfully applied TSBSched, BlackBox scheduler, and the presented statistical approach to a case study of thread assignment of multithreaded network applications running on the UltraSPARC T2 processor.
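The claim that a few hundred random assignments are enough to hit the best-performing 1% can be illustrated with a short calculation. The sketch below is not taken from the thesis: it assumes that assignments are drawn independently and uniformly at random from the whole space of assignments, so that each draw lands in the top fraction p with probability p and n draws miss it with probability (1 - p)^n. The function names are hypothetical.

```python
import math


def prob_hit_top_fraction(n_samples: int, top_fraction: float = 0.01) -> float:
    """Probability that at least one of n_samples draws falls in the top fraction.

    Assumption (not from the thesis): draws are independent and uniform
    over all possible thread assignments.
    """
    return 1.0 - (1.0 - top_fraction) ** n_samples


def samples_needed(confidence: float, top_fraction: float = 0.01) -> int:
    """Smallest sample size whose hit probability reaches the given confidence."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - top_fraction))


if __name__ == "__main__":
    for n in (100, 500, 1000):
        print(n, round(prob_hit_top_fraction(n), 5))
    # Sample size needed to capture a top-1% assignment with 99% probability.
    print(samples_needed(0.99))  # 459
```

Under these assumptions, 500 random assignments contain a top-1% assignment with probability of roughly 0.993, and the result does not depend on the total number of possible assignments, only on the sample size.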
(2) Kernel partitioning of streaming applications: An important step in compiling a stream program to multiple processors is kernel partitioning. Finding an optimal kernel partition is, however, an intractable problem. We propose a statistical approach to the kernel partitioning problem. We describe a method that statistically estimates the performance of the optimal kernel partition. We demonstrate that the sampling method is an important part of the analysis, and that not all methods that generate random samples provide good results. We also show that random sampling on its own can be used to find a good kernel partition, and that it could be an alternative to heuristics-based approaches. The presented statistical method is applied successfully to the benchmarks included in the StreamIt 2.1.1 suite.

(3) Multithreaded processors in time-critical environments: Despite the benefits that multithreaded commercial-off-the-shelf (MT COTS) processors may offer in embedded real-time systems, the time-critical market has not yet embraced a shift toward these architectures. The main challenge with MT COTS architectures is the difficulty of predicting the execution time of concurrently running (co-running) time-critical tasks. Providing a timing analysis for real industrial applications running on MT COTS processors becomes extremely difficult because the execution time of a task, and hence its worst-case execution time (WCET), depends on the interference with co-running tasks in shared processor resources. We show that the measurement-based timing analysis used for single-threaded processors cannot be directly extended to MT COTS architectures. We also propose a methodology that quantifies the slowdown that a task may experience because of collisions with co-running tasks in the shared resources of an MT COTS processor. The methodology is applied to a case study in which different time-critical applications were executed on several MT COTS processors.