Exact and heuristic allocation of multi-kernel applications to multi-FPGA platforms
Document type: Conference report
Publisher: Association for Computing Machinery (ACM)
Rights access: Open Access
FPGA-based accelerators have demonstrated high energy efficiency compared to GPUs and CPUs. However, single-FPGA designs may not achieve sufficient task parallelism. In this work, we optimize the mapping of high-performance multi-kernel applications, such as Convolutional Neural Networks, to multi-FPGA platforms. First, we formulate the system-level optimization problem, choosing from a huge design space the parallelism and the number of compute units for each kernel in the pipeline. Then we solve it by combining Geometric Programming, which produces the optimum-performance solution under resource and DRAM bandwidth constraints, with a heuristic allocator that places the compute units on the FPGA cluster.
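The abstract does not say which heuristic the allocator uses. As an illustration only, the sketch below places compute units on FPGAs with first-fit decreasing bin packing, a common stand-in for this kind of allocation step and not necessarily the authors' method. The names `Fpga` and `allocate_first_fit_decreasing`, the single integer resource budget per FPGA, and the example costs are all assumptions made for this example; a real allocator would also have to account for DRAM bandwidth and inter-FPGA communication.

```python
from dataclasses import dataclass, field


@dataclass
class Fpga:
    """Hypothetical FPGA model with one aggregate resource budget."""
    capacity: int                      # assumed unit: percent of the device
    used: int = 0
    units: list = field(default_factory=list)


def allocate_first_fit_decreasing(cu_costs, num_fpgas, capacity=100):
    """Place compute units (name -> resource cost) on FPGAs, largest first.

    Illustrative heuristic only, not the allocator from the paper.
    Raises ValueError if some compute unit fits on no FPGA.
    """
    fpgas = [Fpga(capacity) for _ in range(num_fpgas)]
    # Sort by decreasing cost so large units are placed while space remains.
    ordered = sorted(cu_costs.items(), key=lambda kv: kv[1], reverse=True)
    for name, cost in ordered:
        for fpga in fpgas:
            if fpga.used + cost <= fpga.capacity:
                fpga.units.append(name)
                fpga.used += cost
                break
        else:
            # The inner loop finished without finding room on any FPGA.
            raise ValueError(f"compute unit {name!r} does not fit on any FPGA")
    return fpgas


if __name__ == "__main__":
    # Hypothetical compute units for a small CNN pipeline (costs in percent).
    costs = {
        "conv1_cu0": 50,
        "conv1_cu1": 50,
        "conv2_cu0": 40,
        "pool_cu0": 30,
        "fc_cu0": 20,
    }
    for index, fpga in enumerate(allocate_first_fit_decreasing(costs, num_fpgas=2)):
        print(index, fpga.used, fpga.units)
```

In a flow like the one the abstract describes, the per-kernel compute-unit counts and their resource costs would come from the Geometric Programming step, and a placement pass of this kind would run afterwards.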
Citation: Shan, J. [et al.]. Exact and heuristic allocation of multi-kernel applications to multi-FPGA platforms. In: Design Automation Conference. "DAC'19: proceedings of the 56th Annual Design Automation Conference 2019, Las Vegas, NV, USA, June 02-06, 2019". New York: Association for Computing Machinery (ACM), p. 1-6.
All rights reserved. This work is protected by the corresponding intellectual and industrial property rights. Without prejudice to any existing legal exemptions, reproduction, distribution, public communication or transformation of this work are prohibited without permission of the copyright holder.