Mapping stream programs onto heterogeneous multiprocessor systems
Document type: Conference lecture
Publisher: ACM Press, NY
Rights access: Restricted access - publisher's policy
This paper presents a partitioning and allocation algorithm for an iterative stream compiler, targeting heterogeneous multiprocessors with constrained distributed memory and any communication topology. We introduce a novel definition of connectedness that enables the algorithm to model the capabilities of the compiler. The algorithm uses convexity and connectedness constraints to produce partitions that are easier to compile and require short pipelines. Software pipelining is an effective transformation, but it increases memory footprint and latency, and has a startup overhead. Our algorithm takes these downstream costs into account. We show results for the StreamIt 2.1.1 benchmarks for an SMP, a 2 × 2 mesh, an SMP plus accelerator, and an IBM QS20 blade, which has two Cell processors. Our results show that the average performance is within 5% of the unrestricted optimum found using a brute-force search, while seldom requiring software pipelining. The heuristic is robust, and fast enough to be inside the feedback loop of an iterative compiler.
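The abstract relies on convexity constraints on partitions. As a hedged illustration only, assuming the standard graph-theoretic meaning of convexity (the paper's exact formulation, and its novel definition of connectedness, are not reproduced here), a partition block of a stream graph is convex when no data path leaves the block and later re-enters it. The minimal Python sketch below checks that property for one block; the function names `is_convex` and `_reach` and the edge-list input format are hypothetical choices for this example.

```python
from collections import deque


def _reach(start, adj):
    """Return every node reachable from `start` by following `adj` (start nodes included)."""
    seen = set(start)
    queue = deque(start)
    while queue:
        node = queue.popleft()
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def is_convex(edges, block):
    """Check whether `block` is convex in the directed graph given by `edges`.

    Assumed definition: a block is convex if no path leaves it and re-enters it,
    i.e. no node outside the block is both reachable from the block (downstream)
    and able to reach the block (upstream).
    """
    succ, pred = {}, {}
    for u, v in edges:
        succ.setdefault(u, []).append(v)  # forward (producer -> consumer) edges
        pred.setdefault(v, []).append(u)  # reversed edges for upstream search
    block = set(block)
    downstream = _reach(block, succ) - block
    upstream = _reach(block, pred) - block
    # A node that is both downstream and upstream lies on a path that exits and re-enters.
    return not (downstream & upstream)
```

For a simple pipeline `a -> b -> c -> d`, the block `{a, b}` is convex, while `{a, c}` is not, because the path from `a` to `c` passes through `b` outside the block. Checking each block independently is cheap (linear in the graph size), which matters when a partitioning heuristic evaluates many candidate blocks.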
Citation: Carpenter, P.; Ramirez, A.; Ayguade, E. Mapping stream programs onto heterogeneous multiprocessor systems. In: International Conference on Compilers, Architecture, and Synthesis for Embedded Systems. "CASES 2009: International Conference on Compilers, Architecture, and Synthesis for Embedded Systems". Grenoble: ACM Press, NY, 2009, p. 57-66.