Mostra el registre d'ítem simple

dc.contributorLabarta Mancho, Jesús José
dc.contributor.authorGonzález García, Juan
dc.contributor.otherUniversitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors
dc.date.accessioned2014-01-20T12:23:14Z
dc.date.available2014-01-20T12:23:14Z
dc.date.issued2013-06-07
dc.identifier.citationGonzález García, J. Application of clustering analysis and sequence analysis on the performance analysis of parallel applications. Tesi doctoral, UPC, Departament d'Arquitectura de Computadors, 2013. DOI 10.5821/dissertation-2117-95048.
dc.identifier.urihttp://hdl.handle.net/2117/95048
dc.description.abstractHigh Performance Computing and Supercomputing is the high end area of the computing science that studies and develops the most powerful computers available. Current supercomputers are extremely complex so are the applications that run on them. To take advantage of the huge amount of computing power available it is strictly necessary to maximize the knowledge we have about how these applications behave and perform. This is the mission of the (parallel) performance analysis. In general, performance analysis toolkits oUer a very simplistic manipulations of the performance data. First order statistics such as average or standard deviation are used to summarize the values of a given performance metric, hiding in some cases interesting facts available on the raw performance data. For this reason, we require the Performance Analytics, i.e. the application of Data Analytics techniques in the performance analysis area. This thesis contributes with two new techniques to the Performance Analytics Veld. First contribution is the application of the cluster analysis to detect the parallel application computation structure. Cluster analysis is the unsupervised classiVcation of patterns (observations, data items or feature vectors) into groups (clusters). In this thesis we use the cluster analysis to group the CPU burst of a parallel application, the regions on each process in-between communication calls or calls to the parallel runtime. The resulting clusters obtained are the diUerent computational trends or phases that appear in the application. These clusters are useful to understand the behaviour of computation part of the application and focus the analyses to those that present performance issues. We demonstrate that our approach requires diUerent clustering algorithms previously used in the area. Second contribution of the thesis is the application of multiple sequence alignment algorithms to evaluate the computation structure detected. Multiple sequence alignment (MSA) is technique commonly used in bioinformatics to determine the similarities across two or more biological sequences: DNA or roteins. The Cluster Sequence Score we introduce applies a Multiple Sequence Alignment (MSA) algorithm to evaluate the SPMDiness of an application, i.e. how well its computation structure represents the Single Program Multiple Data (SPMD) paradigm structure. We also use this score in the Aggregative Cluster Re-Vnement, a new clustering algorithm we designed, able to detect the SPMD phases of an application at Vne-grain, surpassing the cluster algorithms we used initially. We demonstrate the usefulness of these techniques with three practical uses. The Vrst one is an extrapolation methodology able to maximize the performance metrics that characterize the application phases detected using a single application execution. The second one is the use of the computation structure detected to speedup in a multi-level simulation infrastructure. Finally, we analyse four production-class applications using the computation characterization to study the impact of possible application improvements and portings of the applications to diUerent hardware conVgurations. In summary, this thesis proposes the use of cluster analysis and sequence analysis to automatically detect and characterize the diUerent computation trends of a parallel application. These techniques provide the developer / analyst an useful insight of the application performance and ease the understanding of the application’s behaviour. The contributions of the thesis are not reduced to proposals and publications of the techniques themselves, but also practical uses to demonstrate their usefulness in the analysis task. In addition, the research carried out during these years has provided a production tool for analysing applications’ structure, part of BSC Tools suite.
dc.format.extent251 p.
dc.language.isocat
dc.publisherUniversitat Politècnica de Catalunya
dc.rightsL'accés als continguts d'aquesta tesi queda condicionat a l'acceptació de les condicions d'ús establertes per la següent llicència Creative Commons: http://creativecommons.org/licenses/by-nc/3.0/es/
dc.rights.urihttp://creativecommons.org/licenses/by-nc/3.0/es/
dc.sourceTDX (Tesis Doctorals en Xarxa)
dc.subjectÀrees temàtiques de la UPC::Informàtica
dc.titleApplication of clustering analysis and sequence analysis on the performance analysis of parallel applications
dc.typeDoctoral thesis
dc.subject.lemacSistemes virtuals (Informàtica)
dc.identifier.doi10.5821/dissertation-2117-95048
dc.identifier.dlB. 3865-2014
dc.rights.accessOpen Access
dc.description.versionPostprint (published version)
dc.identifier.tdxhttp://hdl.handle.net/10803/128875


Fitxers d'aquest items

Thumbnail

Aquest ítem apareix a les col·leccions següents

Mostra el registre d'ítem simple