\section{Memory Access Overhead}
Knowing the cache topology may still not be enough to optimize memory accesses.
Indeed, the previous tests do not take into account the fact that more than one
core may try to access memory at the same time.
The benchmark provided by Servet compares the memory bandwidth obtained when a
single core accesses memory with the bandwidth obtained when a pair of cores
accesses memory concurrently. These bandwidths may be similar, in which case the
memory access overhead is negligible.
Knowing which pairs of cores suffer from concurrent access to memory, autotuned
applications can ensure that the cores within each pair avoid accessing memory
at the same time.
Servet goes even further: it is able to determine the pairs of cores that are
most affected by this overhead. This tells applications which memory accesses
are the most important to optimize. A minimal sketch of such a pairwise
bandwidth measurement is shown below.
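The following sketch illustrates the idea of the benchmark rather than Servet's
actual implementation: it measures the bandwidth seen by one core when it
streams through a large buffer alone, and again while a second core generates
memory traffic at the same time. The core numbers, buffer size and the
pthreads-based timing are assumptions made for illustration only.

\begin{verbatim}
/* Sketch of a pairwise concurrent-bandwidth test (not Servet's code). */
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define BUF_BYTES (64UL * 1024 * 1024)  /* larger than the last-level cache */
#define REPEATS   8

static void pin_to_core(int core)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);
    pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
}

/* Stream through a private buffer; return the achieved bandwidth in GB/s. */
static double stream_bandwidth(int core)
{
    pin_to_core(core);
    char *buf = malloc(BUF_BYTES);
    memset(buf, 1, BUF_BYTES);          /* touch the pages before timing */

    struct timespec t0, t1;
    volatile long sink = 0;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int r = 0; r < REPEATS; r++)
        for (size_t i = 0; i < BUF_BYTES; i += 64)  /* one read per line */
            sink += buf[i];
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) * 1e-9;
    free(buf);
    (void)sink;
    return (double)REPEATS * BUF_BYTES / secs / 1e9;
}

static void *partner_thread(void *arg)
{
    stream_bandwidth(*(int *)arg);      /* only generates memory traffic */
    return NULL;
}

int main(void)
{
    int base = 0, partner = 1;          /* the pair of cores under test */

    double alone = stream_bandwidth(base);

    pthread_t tid;
    pthread_create(&tid, NULL, partner_thread, &partner);
    double concurrent = stream_bandwidth(base);
    pthread_join(tid, NULL);

    printf("core %d alone: %.2f GB/s, with core %d: %.2f GB/s\n",
           base, alone, partner, concurrent);
    return 0;
}
\end{verbatim}

Repeating this measurement for every partner core and ranking the pairs by the
drop from the single-core bandwidth yields exactly the kind of information
shown in the figure below; note that this simple sketch does not synchronize
the two streams precisely, which a real benchmark would have to do.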
\begin{figure}
\centering
\includegraphics[width=8cm]{memory_overhead.png}
\caption{Concurrent memory bandwidth for the pairs containing core 0 on the
Finis Terrae: cores 0, 1, 2 and 3 share the same bus to memory, which is why
the bandwidth is quite low when cores 1, 2 or 3 access memory at the same time
as core 0. The bandwidth for cores 4, 5, 6 and 7 is also somewhat reduced,
which can be explained by the fact that these cores are located in the same
cell as core 0.}
\end{figure}