Automatic Partitioning of Parallel Loops for Cache-coherent Multiprocessors
Author(s)
Agarwal, Anant; Kranz, David; Natarajan, Venkat
DownloadMIT-LCS-TM-481.pdf (1.045Mb)
Metadata
Show full item recordAbstract
This paper presents a theoretical framework for automatically partitioning parallel loops to minimize cache coherency traffic on shared-memory multiprocessors. The framework introduces the notion of uniformly intersecting references to capture temporal locality in array references, and the idea of data footprints to estimate the communication traffic between processors. The framework uses lattice theory to compute the size of data footprints. We demonstrate that algorithms based on our framework discover optimal partitions in many cases, such as non-communication-free parallelogram partitions of affine loop index functions, which were not handled by previous algorithms. We also show that our framework correctly reproduces results from previous loop partitioning algorithms proposed by Abraham and Hudak and by Sadayappan and Ramanujam. Because they deal only with index expressions, the algorithms are computationally efficient as well. We have implemented a subset of this framework for rectangular partitioning in a compiler for the cache-coherent Alewife machine.
Date issued
1992-12Series/Report no.
MIT-LCS-TM-481