Self-tuned parallel runtimes: a case of study for OpenMP
Collaborator: Corbalán González, Julita; Ayguadé Parra, Eduard; Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors
Document type: Doctoral thesis
Publisher: Universitat Politècnica de Catalunya
Rights access: Open Access
In recent years parallel computing has become ubiquitous. Led by the spread of commodity multicore processors, parallel programming is no longer an obscure discipline mastered only by a few. Unfortunately, the number of able parallel programmers has not increased at the same speed, because it is not easy to write parallel code.

Parallel programming is inherently different from sequential programming. Programmers must deal with a whole new set of problems: identification of parallelism, work and data distribution, load balancing, synchronization and communication.

Parallel programmers have embraced several languages designed to allow the creation of parallel applications. In these languages, the programmer is responsible not only for identifying the parallelism but also for specifying low-level details of how the parallelism needs to be exploited (e.g. scheduling, thread distribution). This is a burden that hampers the productivity of programmers.

We demonstrate that it is possible for the runtime component of a parallel environment to adapt itself to the application and the execution environment, thus reducing the burden placed on the programmer. For this purpose we study three different parameters involved in the parallel exploitation of the OpenMP parallel language: parallel loop scheduling, thread allocation across multiple levels of parallelism, and task granularity control.

In all cases, we propose a self-tuned algorithm that first performs on-line profiling of the application and then, based on the information gathered, adapts the value of the parameter to the one that maximizes the performance of the application.

Our goal is not to develop methods that outperform a hand-tuned application for a specific scenario, as this is probably just as difficult as compiler-generated code outperforming hand-tuned assembly code, but methods that get close to that performance with minimal effort from the programmer. In other words, what we want to achieve with our self-tuned algorithms is to maximize the ratio of performance to effort, so that the entry barrier to parallelism is lower. The evaluation of our algorithms with different applications shows that we achieve that goal.
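
The profile-then-adapt idea described in the abstract can be illustrated with a minimal sketch. This is not the thesis's algorithm: it is a generic two-phase tuner written under stated assumptions. The function name `run_self_tuned`, its parameters, and the simulated costs are all hypothetical; only the schedule names `static`, `dynamic` and `guided` correspond to real OpenMP loop schedules. The tuned region is modelled as a callable that returns its own measured cost (for example, elapsed time), so the example stays deterministic.

```python
from statistics import mean
from typing import Callable, List, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")


def run_self_tuned(
    region: Callable[[T], float],
    candidates: Sequence[T],
    executions: int,
    samples_per_candidate: int = 2,
) -> Tuple[Optional[T], List[T]]:
    """Execute `region` repeatedly, profiling first and then adapting.

    Hypothetical helper: `region(value)` runs the tuned code with the given
    parameter value and returns its measured cost (lower is better).
    Returns the chosen value (None if profiling never finished) and the
    list of values used on each execution.
    """
    if not candidates:
        raise ValueError("need at least one candidate value")
    if samples_per_candidate < 1:
        raise ValueError("need at least one sample per candidate")

    # Measured costs per candidate, keyed by candidate index.
    profile = {i: [] for i in range(len(candidates))}
    history: List[T] = []
    best: Optional[int] = None

    for _ in range(executions):
        if best is None:
            # Profiling phase: use the first candidate that still lacks samples.
            idx = next(i for i in profile
                       if len(profile[i]) < samples_per_candidate)
            profile[idx].append(region(candidates[idx]))
            history.append(candidates[idx])
            # Once every candidate is sampled, keep the cheapest on average.
            if all(len(s) >= samples_per_candidate for s in profile.values()):
                best = min(profile, key=lambda i: mean(profile[i]))
        else:
            # Adapted phase: keep using the selected value.
            region(candidates[best])
            history.append(candidates[best])

    chosen = candidates[best] if best is not None else None
    return chosen, history


# Simulated per-execution cost of a parallel loop under each schedule.
# These numbers are made up for illustration only.
SIMULATED_COST = {"static": 1.8, "dynamic": 1.2, "guided": 1.0}

if __name__ == "__main__":
    chosen, history = run_self_tuned(
        lambda schedule: SIMULATED_COST[schedule],
        ["static", "dynamic", "guided"],
        executions=10,
    )
    print(chosen, history)
```

In a real runtime the cost would come from timing each execution of the parallel region, and the tuner would likely need to re-profile when the application's behaviour or the execution environment changes; this sketch deliberately omits both.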