Almost Fully Utilized Caches for Iterative Methods
Craig C. Douglas ({\tt douglas@ccs.uky.edu}) \\
Departments of Computer Science, Mechanical Engineering, \\
and Center for Computational Sciences \\
325 McVey Hall - CCS,
University of Kentucky \\
Lexington, KY 40506-0045, USA
\\ \\
Jonathan Hu ({\tt jhu@ms.uky.edu}) \\
University of Kentucky,
Department of Mathematics \\
715 Patterson Hall,
Lexington, KY 40506-0027, USA
\\ \\
Wolfgang Karl ({\tt karlw@in.tum.de}) \\
Lehrstuhl für Rechnertechnik und Rechnerorganisation (LRR-TUM)
Institut für Informatik,
Technische Universität München,
D-80290 München, Germany
\\ \\
Markus Kowarschik ({\tt kowarschik@informatik.uni-erlangen.de}) \\
Lehrstuhl für Systemsimulation (IMMD 10),
Institut für Informatik
Universität Erlangen-Nürnberg,
Martensstraße 3, D-91058 Erlangen, Germany
\\ \\
Ulrich Rüde ({\tt ruede@informatik.uni-erlangen.de}) \\
Lehrstuhl für Systemsimulation (IMMD 10),
Institut für Informatik
Universität Erlangen-Nürnberg,
Martensstraße 3, D-91058 Erlangen, Germany
\\ \\
Christian Weiß ({\tt weissc@in.tum.de}) \\
Lehrstuhl für Rechnertechnik und Rechnerorganisation (LRR-TUM)
Institut für Informatik,
Technische Universität München,
D-80290 München, Germany
Conventional implementations of iterative numerical algorithms, especially
multigrid methods, merely reach a disappointing small percentage of the
theoretically available CPU performance when applied to representative large
problems. One of the most important reasons for this phenomenon is that the
current DRAM technology is not able to provide the data fast enougth to keep
the CPU busy. Although the fundamentals of cache optimizations are quite
simple, current compilers are not able to optimize even simple iterative
methods.
We analyse the memory and cache behavior of iterative methods with extensive
profiling and describe program transformation techniques to improve the cache
performance of two and three dimensional multigrid algorithms. Examples are
provided for both structured and unstructured grid problems.