ExaQUte: D4.2 Profiling report of the partner’s tools, complete with performance suggestions

Amela Milian, Ramon; Badia Sala, Rosa Maria; Böhm, Stanislav; Tosi, Riccardo; Rossi, Riccardo

doi:10.23967/exaqute.2021.2.023

31779672.pdf (1,672Mb)

Document type: Research report

Publication date: 2019-05-30

Access conditions: Open access

Attribution-NonCommercial-ShareAlike 3.0 Spain

Except where otherwise noted, the contents of this work are licensed under a Creative Commons license: Attribution-NonCommercial-ShareAlike 3.0 Spain

Project: ExaQUte - EXAscale Quantification of Uncertainties for Technology and Science Simulation (EC-H2020-800898)

Abstract

This deliverable focuses on the profiling activities carried out in the project on the partners' applications. To perform these profiling activities, two benchmarks were defined in collaboration with WP5. The first is an embarrassingly parallel benchmark that performs a read and then multiple writes of the same object, with the objective of stressing the memory and storage systems and evaluating the overhead incurred when these reads and writes are performed in parallel. The second benchmark is based on the Continuation Multi Level Monte Carlo (C-MLMC) algorithm. While this algorithm is normally executed using multiple levels, for profiling and performance analysis the execution of a single level was sufficient, since the subsequent levels have similar performance characteristics. Additionally, while the simulation tasks can be executed as parallel (multi-threaded) tasks, the benchmark executed single-threaded tasks in order to increase the number of simulations to be scheduled and stress the scheduling engines. A set of experiments based on these two benchmarks was executed on the MareNostrum 4 supercomputer, using PyCOMPSs as the underlying programming model and dynamic scheduler of the tasks involved. While the first benchmark was executed several times in a single iteration, the second was executed iteratively, in cycles of 1) execution and trace generation; 2) performance analysis; 3) improvements. This enabled several improvements to the benchmark and to the PyCOMPSs scheduler. The initial iterations focused on the C-MLMC structure itself, refactoring the code to remove fine-grained and sequential tasks and merge them into tasks of larger granularity. The next iterations focused on improving the PyCOMPSs scheduler, removing existing bottlenecks and increasing its performance by making the scheduler a multithreaded engine.
While the results can still be improved, we are satisfied with them, since the granularity of the simulations run in this evaluation step is much finer than the one that will be used in the real scenarios. The deliverable finishes with some recommendations that should be followed throughout the project in order to obtain good performance in the execution of the project codes.
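
The refactoring step mentioned in the abstract, merging fine-grained tasks into tasks of larger granularity, can be illustrated with a minimal sketch. It is not taken from the deliverable: it uses Python's standard `concurrent.futures` thread pool as a stand-in for the PyCOMPSs runtime, and `simulate`, `simulate_batch`, `chunked`, `run_fine` and `run_coarse` are hypothetical names introduced only for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def simulate(sample_id):
    """Hypothetical stand-in for one fine-grained simulation task."""
    return sample_id * sample_id


def simulate_batch(sample_ids):
    """One coarse task that runs several simulations back to back."""
    return [simulate(s) for s in sample_ids]


def chunked(items, size):
    """Split a list into consecutive chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def run_fine(sample_ids, workers=4):
    # One scheduled task per sample: many tiny tasks for the scheduler.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(simulate, sample_ids))


def run_coarse(sample_ids, batch_size, workers=4):
    # One scheduled task per batch: fewer, larger tasks for the scheduler.
    batches = chunked(sample_ids, batch_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        batch_results = list(pool.map(simulate_batch, batches))
    # Flatten the per-batch results back into one list, in input order.
    return [value for batch in batch_results for value in batch]
```

Both functions return the same values in the same order; the coarse variant simply hands the scheduler fewer units of work, which is the kind of change that reduces per-task scheduling overhead when individual tasks are very short.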

Citation: Amela, R. [et al.]. ExaQUte: D4.2 Profiling report of the partner's tools, complete with performance suggestions. 2019. DOI 10.23967/exaqute.2021.2.023.

URI: http://hdl.handle.net/2117/346909

DOI: 10.23967/exaqute.2021.2.023

External repository URL: https://www.scipedia.com/public/Table_Soriano_2019b

