Automatic Partitioning of Parallel Loops for Cache-coherent Multiprocessors

Agarwal, Anant; Kranz, David; Natarajan, Venkat

Author(s)

Agarwal, Anant; Kranz, David; Natarajan, Venkat

DownloadMIT-LCS-TM-481.pdf (1.045Mb)

Metadata

Show full item record

Abstract

This paper presents a theoretical framework for automatically partitioning parallel loops to minimize cache coherency traffic on shared-memory multiprocessors. The framework introduces the notion of uniformly intersecting references to capture temporal locality in array references, and the idea of data footprints to estimate the communication traffic between processors. The framework uses lattice theory to compute the size of data footprints. We demonstrate that algorithms based on our framework discover optimal partitions in many cases, such as non-communication-free parallelogram partitions of affine loop index functions, which were not handled by previous algorithms. We also show that our framework correctly reproduces results from previous loop partitioning algorithms proposed by Abraham and Hudak and by Sadayappan and Ramanujam. Because they deal only with index expressions, the algorithms are computationally efficient as well. We have implemented a subset of this framework for rectangular partitioning in a compiler for the cache-coherent Alewife machine.

Date issued

1992-12

URI

https://hdl.handle.net/1721.1/149205

Series/Report no.

MIT-LCS-TM-481

Collections

LCS Technical Memos (1974 - 2003)