Tracing methodologies and tools for artificial intelligence and data mining Java applications
Tutor / director / evaluator: Gil Gomez, Marisa
Covenantee: Politecnico di Torino
Document type: Master thesis
Rights access: Open Access
Supercomputing and Artificial Intelligence are among the most important outcomes of the last decades. Both have been behind many recent discoveries and, like most applications in general, have been shifting from a sequential paradigm to parallel and distributed approaches that better fit the new hardware. The High Performance Computing (HPC) discipline is at the heart of these developments. In this context, the Java programming language plays a marginal role. However, Java is still in high demand, is employed in AI, and runs effectively on supercomputers. Even if only a small group of programmers use it for HPC applications, its influence in the AI world is not negligible, and the tools that support Java development in such environments deserve more attention.
Parallel program performance analysis is concerned with achieving efficient utilisation of system resources. One common technique is to collect trace data and then analyse it for possible causes of poor performance. The Performance Tools department of the Barcelona Supercomputing Center (BSC) is in charge of developing this kind of tool. The thesis was developed during an internship in this department, and for this reason the work is based on the two main tools developed there: Extrae and Paraver. The former extracts the trace information, while the latter visualises it. The main focus of this thesis is on Extrae. In its current state, Extrae's instrumentation for Java is poorly developed. Apart from some basic features that trace thread events through the instrumentation of pthreads (onto which all Java threads are mapped), it provides little valuable information. A study of the state of the art is covered in chapter 2. Since Extrae is implemented in C, generating probes and wrappers is not an issue for other programs written in C, but Java programs call for a different approach. Chapter 3 gives an overview of the approaches that can be used to generate traces for a Java program.
The approach developed in this thesis is based on an event-driven interface offered by the JVM (the JVM Tool Interface, JVM TI). It is combined with AspectJ, the extension of the Java language that implements the aspect-oriented programming paradigm. The development of this platform is described in chapters 4 and 5, and the platform is then applied to a real Java framework, Hadoop. This case study is carried out in chapter 6, which also contains a discussion of the thesis work as a whole.
All rights reserved. This work is protected by the corresponding intellectual and industrial property rights. Without prejudice to any existing legal exemptions, reproduction, distribution, public communication or transformation of this work are prohibited without permission of the copyright holder.