According to the definition used, arithmetic intensity is measured in operations per byte. This might not be adequate for Haswell processors (and later). Due to the fused multiply-add\footnote{Although called multiply-add, there are 36 slightly different instructions.} extension, two floating-point operations can be performed with a single instruction.

\begin{itemize}
\item The worse results for 4 threads with NUMA-STREAM are not necessarily expected.
\item The better results for triad are possibly due to combined storage in FMA.
\item Striding for arrays.
\end{itemize}

%%% Local Variables:
%%% mode: latex
%%% TeX-master: "../report"
%%% End:
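
As a rough worked example of the remark on fused multiply-add above, consider the STREAM triad kernel $a_i = b_i + s \cdot c_i$. The figures below are only a sketch under assumptions that are not taken from the measurements: double-precision elements of 8 bytes, two loads and one store per iteration, and no additional write-allocate traffic. The symbols $I_{\mathrm{flop}}$ and $I_{\mathrm{instr}}$ are introduced here purely for illustration. Counting floating-point operations (one multiplication and one addition) gives
\[
  I_{\mathrm{flop}} = \frac{2}{3 \cdot 8\,\mathrm{byte}} = \frac{1}{12}\,\mathrm{byte}^{-1} \approx 0.083\ \mathrm{flop/byte},
\]
whereas counting the single FMA instruction that can perform both operations gives
\[
  I_{\mathrm{instr}} = \frac{1}{3 \cdot 8\,\mathrm{byte}} = \frac{1}{24}\,\mathrm{byte}^{-1} \approx 0.042\ \mathrm{instructions/byte}.
\]
Under these assumptions the two counting conventions differ by a factor of two for the same memory traffic, which is why the choice of ``operation'' in the definition matters on FMA-capable processors.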