Controlling parallel adaptive sparse grid collocation simulations with chiron
Cite as:
hdl:2117/333801
Document type: Conference report
Defense date: 2015
Publisher: CIMNE
Rights access: Open Access
All rights reserved. This work is protected by the corresponding intellectual and industrial
property rights. Without prejudice to any existing legal exemptions, reproduction, distribution, public
communication or transformation of this work are prohibited without permission of the copyright holder
Abstract
Nonintrusive UQ methods, roughly speaking, use a (hopefully small) number of runs of a deterministic computational model, each taking as input a judiciously chosen point of the stochastic input space. Statistics of the outputs are then estimated from these deterministic computations, which generate a large amount of data and thus require careful management [1]. Chiron, a data-centric scientific workflow engine that executes scientific applications in parallel, helps to control and manage these data. Chiron uses a dynamic data-centric approach in which a scientific workflow algebra handles the parallel workflow execution efficiently. The algebra also standardizes data consumption and production as algebraic operands, in adherence to the W3C provenance data model. Provenance is essential for scientific and engineering experiments and ensures that an experiment can be repeated under different conditions. Chiron provides native support for distributed provenance by storing provenance data during the execution of all samples and making it available for querying at runtime. It is thus possible to monitor the status of each input point run, and the availability of its results, through runtime provenance dataflow queries. Monitoring specific attributes or results, or checking the elapsed time of a given task, may reveal that a failure has happened. Such information can be used to refine the task (to prevent it from failing again) and resubmit it. Depending on the results gathered, the user may decide to change (or add) parameters corresponding to an input point, or to make other decisions regarding the simulation. Finally, note that each input point corresponds to a parallel job assigned to a number of processors, so each job is itself solved in parallel. Chiron handles the execution of several such jobs simultaneously, which yields a two-level overall parallel execution scheme.
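The nonintrusive idea described at the start of the abstract can be illustrated with a minimal Python sketch. It is not Chiron code and not the authors' workflow: the function `model` is a hypothetical stand-in for one deterministic solver run, the single uncertain input is assumed to be standard normal, and a fixed 3-point Gauss-Hermite rule plays the role of the collocation points. Because the runs are independent, they are dispatched concurrently here with a thread pool.

```python
import math
from concurrent.futures import ThreadPoolExecutor

# 3-point Gauss-Hermite rule for a standard normal input (assumed here);
# it integrates polynomials up to degree 5 exactly.
NODES = [-math.sqrt(3.0), 0.0, math.sqrt(3.0)]
WEIGHTS = [1.0 / 6.0, 2.0 / 3.0, 1.0 / 6.0]


def model(x):
    """Hypothetical stand-in for one deterministic solver run."""
    return x * x


def collocation_statistics(f, nodes, weights):
    """Run f at every collocation point and estimate mean and variance."""
    # The runs do not depend on each other, so they can execute concurrently.
    with ThreadPoolExecutor() as pool:
        outputs = list(pool.map(f, nodes))
    mean = sum(w * y for w, y in zip(weights, outputs))
    second_moment = sum(w * y * y for w, y in zip(weights, outputs))
    return mean, second_moment - mean * mean


mean, variance = collocation_statistics(model, NODES, WEIGHTS)
```

For this toy model the estimates agree, up to rounding, with the exact mean 1 and variance 2 of the square of a standard normal variable. In a real study each call to `model` would itself be a parallel job, which is the second level of parallelism mentioned above.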
Uncertainty quantification scenarios with adaptive sparse grid collocation are particularly amenable to being steered and controlled by Chiron [2]. They require the ability to adapt the workflow at runtime, based on user input and dynamic steering, according to error measures (or input thresholds) given by the user. We evaluate our approach using a novel, real large-scale workflow for uncertainty quantification on a 640-core cluster. The results show execution time savings from 2.5 to 24 days, compared to non-iterative workflow execution.
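The error-driven adaptation mentioned above can be sketched in one dimension. The following Python example is a simplified analogue, not the authors' algorithm and not Chiron's interface: it refines a piecewise-linear interpolant on [0, 1] only where the hierarchical surplus (the gap between a new model value and the interpolant built from the existing points) exceeds a user-given tolerance. The names `adaptive_refine`, `tol` and `max_level` are assumptions made for illustration.

```python
def adaptive_refine(model, tol, max_level=12):
    """Return {point: value} after surplus-driven refinement on [0, 1]."""
    values = {0.0: model(0.0), 1.0: model(1.0)}
    active = [(0.0, 1.0)]  # intervals still eligible for refinement
    for _ in range(max_level):
        next_active = []
        for a, b in active:
            mid = 0.5 * (a + b)
            values[mid] = model(mid)  # one new deterministic run
            # Hierarchical surplus: new value minus linear interpolant at mid.
            surplus = abs(values[mid] - 0.5 * (values[a] + values[b]))
            if surplus > tol:
                # Error indicator too large: refine both halves next round.
                next_active += [(a, mid), (mid, b)]
        if not next_active:
            break  # every interval met the tolerance
        active = next_active
    return values


coarse = adaptive_refine(lambda x: x * x, tol=0.1)
fine = adaptive_refine(lambda x: x * x, tol=0.01)
```

Tightening `tol` makes the loop request more model runs, which mirrors how a user-supplied threshold determines how much work the workflow performs. Since the set of points is decided one level at a time, the threshold could in principle be changed between levels, which is the kind of runtime steering the abstract describes.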
Citation: Souza, V.S. [et al.]. Controlling parallel adaptive sparse grid collocation simulations with chiron. A: ADMOS 2015. CIMNE, 2015, p. 58.
Files | Size |
---|---|
Admos2015-37-Co ... rallel Adaptive Sparse.pdf | 139.9 KB |