Data-flow driven optimal tasks distribution for global heterogeneous systems
Rights access: Open Access
As a result of advances in technology and highly demanding users' expectations, more and more applications require intensive computing resources and, most importantly, heavy consumption of data distributed throughout the environment. For this reason, a growing number of research efforts aim to use geographically distributed resources cooperatively, working in parallel and sharing resources and data. In fact, an application can be structured as a set of tasks organized through interdependent relationships, some of which can be executed in parallel, notably reducing the execution time. This work proposes a model for offloading task execution in heterogeneous environments, considering nodes with different computing capacities, connected through distinct network bandwidths, and located at different distances. The envisioned model focuses on the overhead produced when accessing remote data sources as well as the data transfer cost generated between tasks at run-time. The novelty of this approach is that the proposed task allocation mechanism is data-flow aware: it considers the geographical location of both computing nodes and data sources, yielding an optimal solution to a highly complex problem. Two optimization strategies are proposed, the Optimal Matching Model and the Staged Optimization Model, as two different approaches to solving the task scheduling problem. The optimal model considers all of the application's tasks at once and finds a globally optimal solution. In contrast, the staged model is designed to obtain a locally optimal solution stage by stage. In both cases, a mixed integer linear programming model has been designed to minimize the application execution time.
In the studies carried out to evaluate this proposal, the staged model provides the optimal solution in 76% of the simulated scenarios, while also dramatically reducing the solving time with respect to the optimal model. Both models have pros and cons and, in fact, can be used together to complement each other. The optimal model finds the globally optimal solution at a high running-time cost, which makes it impractical in some scenarios. The staged model, in contrast, is fast enough to be used in those scenarios; however, the solution it gives might not be optimal in some cases.
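The trade-off between a global and a staged search can be illustrated with a minimal sketch. It is not the paper's mixed integer linear programming formulation: it swaps in brute-force enumeration, ignores node contention, and uses a hypothetical two-node, three-task instance whose names and numbers (`SPEED`, `BANDWIDTH`, `WORK`, `SOURCE`, `FLOW`, `STAGES`) are all assumptions chosen for illustration.

```python
from itertools import product

# Hypothetical toy instance; every value below is an illustrative assumption.
SPEED = {"edge": 1.0, "cloud": 4.0}   # work units per time unit
BANDWIDTH = 1.0                        # data units per time unit between distinct nodes
WORK = {"a": 4.0, "b": 8.0, "c": 2.0}  # work units per task
SOURCE = {"a": ("edge", 6.0)}          # task -> (node holding its input data, size)
FLOW = {("a", "b"): 10.0, ("a", "c"): 1.0}  # (producer, consumer) -> data size
STAGES = [["a"], ["b", "c"]]           # tasks grouped by dependency level


def transfer(src, dst, size):
    """Transfer time; data already on the target node costs nothing."""
    return 0.0 if src == dst else size / BANDWIDTH


def finish_times(assign):
    """Finish time of each assigned task (node contention is ignored)."""
    done = {}
    for stage in STAGES:
        for task in stage:
            if task not in assign:
                continue
            ready = 0.0
            if task in SOURCE:
                node, size = SOURCE[task]
                ready = transfer(node, assign[task], size)
            for (prod, cons), size in FLOW.items():
                if cons == task:
                    arrival = done[prod] + transfer(assign[prod], assign[task], size)
                    ready = max(ready, arrival)
            done[task] = ready + WORK[task] / SPEED[assign[task]]
    return done


def best_assignment(tasks, fixed):
    """Enumerate placements of `tasks`, keeping earlier decisions in `fixed`."""
    best, best_cost = None, float("inf")
    for nodes in product(SPEED, repeat=len(tasks)):
        assign = {**fixed, **dict(zip(tasks, nodes))}
        done = finish_times(assign)
        cost = max(done[t] for t in tasks)
        if cost < best_cost:
            best, best_cost = assign, cost
    return best, best_cost


def solve_global():
    """Search over all tasks at once (globally optimal for this toy model)."""
    return best_assignment([t for stage in STAGES for t in stage], {})


def solve_staged():
    """Optimize one stage at a time, freezing earlier placements."""
    assign, makespan = {}, 0.0
    for stage in STAGES:
        assign, stage_cost = best_assignment(stage, assign)
        makespan = max(makespan, stage_cost)
    return assign, makespan


if __name__ == "__main__":
    print("global:", solve_global())
    print("staged:", solve_staged())
```

In this instance the staged search places task `a` next to its data source because that is best for the first stage alone, and the large transfer from `a` to `b` then makes the final schedule slower than the global one. The staged search evaluates far fewer combinations, which mirrors the solving-time advantage described above, though the toy model says nothing about how often the two would agree in practice.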
Citation: Garcia, J. [et al.]. Data-flow driven optimal tasks distribution for global heterogeneous systems. "Future generation computer systems", July 2021, vol. 125, p. 792-805.