ExaQUte: D4.2 Profiling report of the partner’s tools, complete with performance suggestions
Document type: External research report
Rights access: Open Access
European Commission's project: ExaQUte - EXAscale Quantification of Uncertainties for Technology and Science Simulation (EC-H2020-800898)
This deliverable focuses on the profiling activities carried out in the project on the partners' applications. To perform these profiling activities, two benchmarks were defined in collaboration with WP5. The first is an embarrassingly parallel benchmark that performs a read and then multiple writes of the same object, with the objective of stressing the memory and storage systems and evaluating the overhead incurred when these reads and writes are performed in parallel. The second benchmark is based on the Continuation Multi Level Monte Carlo (C-MLMC) algorithm. While this algorithm is normally executed using multiple levels, executing a single level was enough for the profiling and performance analysis objectives, since the subsequent levels have similar performance characteristics. Additionally, while the simulation tasks can be executed in parallel (as multi-threaded tasks), the benchmark ran single-threaded tasks in order to increase the number of simulations to be scheduled and stress the scheduling engines. A set of experiments based on these two benchmarks was executed on the MareNostrum 4 supercomputer, using PyCOMPSs as the underlying programming model and dynamic scheduler of the tasks involved in the executions. While the first benchmark was executed several times in a single iteration, the second benchmark was executed iteratively, in cycles of 1) execution and trace generation; 2) performance analysis; 3) improvements. This enabled several improvements to the benchmark and to the PyCOMPSs scheduler. The initial iterations focused on the C-MLMC structure itself, refactoring the code to remove fine-grained and sequential tasks and merge them into larger-granularity tasks. The next iterations focused on improving the PyCOMPSs scheduler, removing existing bottlenecks and increasing its performance by making the scheduler a multithreaded engine.
While the results can still be improved, we are satisfied with them, since the granularity of the simulations run in this evaluation step is much finer than the one that will be used in the real scenarios. The deliverable finishes with some recommendations that should be followed throughout the project in order to obtain good performance in the execution of the project codes.
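As a rough illustration of the first benchmark described in the abstract (each independent task reads a shared object once and then writes it several times), the following is a minimal sketch. It is not the deliverable's code: it uses Python's standard `concurrent.futures` thread pool as a stand-in for PyCOMPSs, and the function names, file names, payload, and parameter values are all hypothetical.

```python
import os
import pickle
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor


def read_then_write(src_path, out_dir, task_id, n_writes):
    """One independent task: read the shared object once, then write it n_writes times."""
    with open(src_path, "rb") as f:
        obj = pickle.load(f)
    written = []
    for i in range(n_writes):
        out_path = os.path.join(out_dir, f"task{task_id}_copy{i}.pkl")
        with open(out_path, "wb") as f:
            pickle.dump(obj, f)
        written.append(out_path)
    return written


def run_benchmark(n_tasks=4, n_writes=3, payload_size=10_000):
    """Run n_tasks independent read/write tasks; return (elapsed_seconds, files_written)."""
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical shared object: a plain list serialized to disk once.
        src_path = os.path.join(tmp, "shared.pkl")
        with open(src_path, "wb") as f:
            pickle.dump(list(range(payload_size)), f)

        start = time.perf_counter()
        # Thread pool used here only as a stand-in for a task-based runtime.
        with ThreadPoolExecutor(max_workers=n_tasks) as pool:
            futures = [
                pool.submit(read_then_write, src_path, tmp, t, n_writes)
                for t in range(n_tasks)
            ]
            files = [path for fut in futures for path in fut.result()]
        elapsed = time.perf_counter() - start
        return elapsed, len(files)


if __name__ == "__main__":
    elapsed, n_files = run_benchmark()
    print(f"{n_files} files written in {elapsed:.4f} s")
```

Comparing the elapsed time for increasing values of `n_tasks` against a single-task run gives a crude view of the parallel read/write overhead that the benchmark is meant to expose.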
Citation: Amela, R. [et al.]. ExaQUte: D4.2 Profiling report of the partner's tools, complete with performance suggestions. 2019. DOI 10.23967/exaqute.2021.2.023.
URL other repository: https://www.scipedia.com/public/Table_Soriano_2019b