Adaptive MapReduce scheduling in shared environments
Document type: Conference report
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Rights access: Open Access
In this paper we present a MapReduce task scheduler for shared environments in which MapReduce is executed alongside other resource-consuming workloads, such as transactional applications. All workloads may potentially share the same data store, with some consuming data for analytics purposes while others act as data generators. This kind of scenario is becoming increasingly important in data centers, where improved resource utilization can be achieved through workload consolidation, and it is especially challenging because workloads of different natures interact and compete for limited resources. The proposed scheduler aims to improve resource utilization across machines while observing completion time goals. Unlike other MapReduce schedulers, our approach also takes into account the resource demands of non-MapReduce workloads, and assumes that the amount of resources made available to MapReduce applications varies over time. As shown in our experiments, our proposal improves the management of MapReduce jobs in the presence of variable resource availability: it increases the accuracy of the scheduler's estimations and thus helps jobs meet their completion time goals, without an impact on the fairness of the scheduler.
Citation: Polo, J. [et al.]. Adaptive MapReduce scheduling in shared environments. In: IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing. "2014 14th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid 2014): Chicago, Illinois: USA, 26-29 May 2014". Chicago, IL: Institute of Electrical and Electronics Engineers (IEEE), 2014, p. 61-70.