Scalability evaluation of a polymorphic register file: a CG case study
Document type: Conference report
Rights access: Restricted access - publisher's policy
We evaluate the scalability of a Polymorphic Register File using the Conjugate Gradient method as a case study. We focus on a heterogeneous multi-processor architecture, taking into consideration critical parameters such as cache bandwidth and memory latency. We compare the performance of 256 Polymorphic Register File-augmented workers against a single Cell PowerPC Processor Unit (PPU). In this scenario, simulation results suggest that absolute speedups of up to 200 times can be obtained for the Sparse Matrix Vector Multiplication kernel. Moreover, when an equal number of workers (between 1 and 256) is employed, our design is between 1.7 and 4.2 times faster than a Cell PPU-based system. Furthermore, we study the impact of memory latency and cache bandwidth on the sustainable speedups of the system considered. Our tests suggest that a 128-worker configuration requires the caches to deliver 1638.4 GB/sec in order to preserve 80% of its peak speedup.
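To illustrate why the Sparse Matrix Vector Multiplication (SpMV) kernel dominates the Conjugate Gradient workload, the sketch below shows a textbook, unpreconditioned CG solver in plain Python, with the matrix stored in Compressed Sparse Row (CSR) format. This is a generic illustration under stated assumptions, not the authors' Polymorphic Register File implementation; the function names `spmv_csr`, `dot` and `conjugate_gradient` are hypothetical. Each iteration performs exactly one SpMV plus a few vector operations, so the SpMV's irregular, memory-bound accesses largely determine how well the method scales.

```python
import math


def spmv_csr(row_ptr, col_idx, values, x):
    """Compute y = A @ x for a matrix A stored in CSR format (hypothetical helper)."""
    n = len(row_ptr) - 1
    y = [0.0] * n
    for i in range(n):
        acc = 0.0
        # Nonzeros of row i are values[row_ptr[i]:row_ptr[i + 1]];
        # col_idx gives their column positions, hence the indirect access x[col_idx[k]].
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))


def conjugate_gradient(row_ptr, col_idx, values, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive-definite CSR matrix A (unpreconditioned CG)."""
    n = len(b)
    x = [0.0] * n
    r = list(b)          # residual r = b - A x, with initial guess x = 0
    p = list(r)          # first search direction is the residual
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if math.sqrt(rs_old) < tol:
            break
        ap = spmv_csr(row_ptr, col_idx, values, p)  # the single SpMV of this iteration
        alpha = rs_old / dot(p, ap)                  # step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = dot(r, r)
        beta = rs_new / rs_old                       # keeps the new direction A-conjugate
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x
```

For example, the matrix [[4, 1], [1, 3]] in CSR form is `row_ptr=[0, 2, 4]`, `col_idx=[0, 1, 0, 1]`, `values=[4.0, 1.0, 1.0, 3.0]`; with `b=[1.0, 2.0]`, `conjugate_gradient` converges to approximately [1/11, 7/11]. For scale, if the 1638.4 GB/sec cache bandwidth reported above were shared evenly among 128 workers (an assumption), each worker would need 1638.4 / 128 = 12.8 GB/sec.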
Citation: Ciobanu, C. [et al.]. Scalability evaluation of a polymorphic register file: a CG case study. In: International Conference on Architecture of Computing Systems. "Architecture of Computing Systems: ARCS 2011: 24th International Conference: Como, Italy: February 24-25, 2011: proceedings". Como: Springer, 2011, p. 13-25.