Programming, debugging, profiling and optimizing transactional memory programs
ColaboratorOsman, Unsal; Valero Cortés, Mateo; Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors
Document typeDoctoral thesis
PublisherUniversitat Politècnica de Catalunya
Rights accessOpen Access
Transactional memory (TM) is a new optimistic synchronization technique which has the potential of making shared memory parallel programming easier compared to locks without giving up from the performance. This thesis explores four aspects in the research of transactional memory. First, it studies how programming with TM compares to locks. During the course of work, it develops the first real transactional application ¿ AtomicQuake. AtomicQuake is adapted from the parallel version of the Quake game server by replacing all lock-based synchronization with atomic blocks. Findings suggest that programming with TM is indeed easier than locks. However the performance of current software TM systems falls behind the efficiently implemented lock-based versions of the same program. Also, the same findings report that the proposed language level extensions are not sufficient for developing robust production level software and that the existing development tools such as compilers, debuggers, and profilers lack support for developing transactional application. Second, this thesis introduces new set of debugging principles and abstractions. These new debugging principles and abstractions enable debugging synchronization errors which manifest at coarse atomic block level, wrong code inside atomic blocks, and also performance errors related to the implementation of the atomic block. The new debugging principles distinguish between debugging at the language level constructs such as atomic blocks and debugging the atomic blocks based on how they are implemented whether TM or lock inference. These ideas are demonstrated by implementing a debugger extension for WinDbg and the ahead-of-time C# to X86 Bartok-STM compiler. Third, this thesis investigates the type of performance bottlenecks in TM applications and introduces new profiling techniques to find and understand these bottlenecks. The new profiling techniques provide in-depth and comprehensive information about the wasted work caused by aborting transactions. The individual profiling abstractions can be grouped in three groups: (i) techniques to identify multiple conflicts from a single program run, (ii) techniques to describe the data structures involved in conflicts by using a symbolic path through the heap, rather than a machine address, and (iii) visualization techniques to summarize which transactions conflict most. The ideas were demonstrated by building a lightweight profiling framework for Bartok-STM and an offline tool which process and display the profiling data. Forth, this thesis explores and introduces new TM specific optimizations which target the wasted work due to aborting transactions. Using the results obtained with the profiling tool it analyzes and optimizes several applications from the STAMP benchmark suite. The profiling techniques effectively revealed TM-specific bottlenecks such as false conflicts and contentions accesses to data structures. The discovered bottlenecks were subsequently eliminated with using the new optimization techniques. Among the optimization highlights are the transaction checkpoints which reduced the wasted work in Intruder with 40%, decomposing objects to eliminate false conflicts in Bayes, early release in Labyrinth which decreased wasted work from 98% to 1%, using less contentions data structures such as chained hashtable in Intruder and Genome which have higher degree of parallelism.
- Tesis - TDX-UPC