Wavefront parallelization of recurrent neural networks on multi-core architectures
Document type: Conference report
Publisher: Association for Computing Machinery (ACM)
Rights access: Open Access
European Commission's project: Mont-Blanc 2020, European scalable, modular and power efficient HPC processor (EC-H2020-779877)
Recurrent neural networks (RNNs) are widely used for natural language processing, time-series prediction, and text analysis tasks. The internal structure of RNN inference and training, in terms of data and control dependencies across their fundamental numerical kernels, complicates the exploitation of model parallelism, which is why only data parallelism has traditionally been applied to accelerate RNNs. This paper presents W-Par (Wavefront-Parallelization), a comprehensive approach for RNN inference and training on CPUs that applies model parallelism to RNN models. We use fine-grained pipeline parallelism in the form of wavefront computations to accelerate multi-layer RNNs running on multi-core CPUs. Wavefront computations have been widely applied in many scientific computing domains, such as stencil kernels and dynamic programming. W-Par divides RNN workloads across different parallel tasks by defining input and output dependencies for each RNN cell. Our experiments on different RNN models demonstrate that W-Par achieves up to 6.6x speed-up for RNN inference and training in comparison with current state-of-the-art implementations on modern multi-core CPU architectures. Importantly, W-Par maximizes performance across a wide range of scenarios, including different core counts and memory hierarchy configurations, without requiring any change at the source code level.
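The wavefront idea described in the abstract can be illustrated with a minimal sketch. In a multi-layer RNN, the cell at (layer l, timestep t) depends only on the cell at (l, t-1) in the same layer and on (l-1, t) in the layer below, so all cells on the same anti-diagonal (l + t = constant) are mutually independent and can run concurrently. The sketch below is not the authors' W-Par implementation; `rnn_cell`, `wavefront_forward`, and the simple tanh cell are illustrative assumptions.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np

def rnn_cell(h_prev_t, h_below, W, U):
    # Hypothetical elementwise tanh cell: combines the hidden state of the
    # previous timestep (same layer) with the output of the layer below.
    return np.tanh(h_prev_t @ W + h_below @ U)

def wavefront_forward(x, weights, n_layers, n_steps, hidden):
    # h[l][t] holds the hidden state of layer l at timestep t.
    h = [[None] * n_steps for _ in range(n_layers)]
    with ThreadPoolExecutor() as pool:
        # Sweep anti-diagonals d = l + t; cells on the same anti-diagonal
        # have no mutual dependencies, so they execute concurrently.
        for d in range(n_layers + n_steps - 1):
            cells = [(l, d - l) for l in range(n_layers) if 0 <= d - l < n_steps]
            futures = {}
            for l, t in cells:
                below = x[t] if l == 0 else h[l - 1][t]
                prev = h[l][t - 1] if t > 0 else np.zeros(hidden)
                W, U = weights[l]
                futures[(l, t)] = pool.submit(rnn_cell, prev, below, W, U)
            for (l, t), f in futures.items():
                h[l][t] = f.result()
    return h
```

A standard layer-by-layer schedule would compute each layer over all timesteps before starting the next; the wavefront schedule instead exposes up to min(n_layers, n_steps) concurrent cells per step, which is the source of the model parallelism the paper exploits.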
Citation: Sharma, R.; Casas, M. Wavefront parallelization of recurrent neural networks on multi-core architectures. In: "Proceedings of the 34th ACM International Conference on Supercomputing (ICS-2020): Barcelona, June 29–July 2, 2020". New York: Association for Computing Machinery (ACM), 2020, article 5, p. 1-12.
All rights reserved. This work is protected by the corresponding intellectual and industrial property rights. Without prejudice to any existing legal exemptions, reproduction, distribution, public communication, or transformation of this work is prohibited without permission of the copyright holder.