The best results for each kernel are given in~\prettyref{tbl:res-kernels}.
They were obtained with the binary \verb|roofline_full_manpack|, which has all optimizations and the intrinsics kernel enabled.
It was invoked as \verb|roofline_full_manpack -s 150000000 -r 5|.
With 8-byte doubles, a single array of $150\,000\,000$ elements occupies $150\,000\,000 \cdot 8\,\mathrm{B} \approx 1144.41$\,MiB, which is clearly too large to fit in the cache.

\begin{table}[h!]
  \centering
  \begin{tabular}{lr}
    \toprule
    Kernel & Max. GFLOP/s \\
    \midrule
    simple16 & 0.9919 \\
    fma16 & 0.9891 \\
    simple8 & 123.4004 \\
    simple8fastmath & 8.7187 \\
    fma8 & 21.7866 \\
    fma8manpack & 18.9066 \\
    \bottomrule
  \end{tabular}
  \caption{Best measured performance of each kernel}
  \label{tbl:res-kernels}
\end{table}

%%% Local Variables:
%%% mode: latex
%%% TeX-master: "../report"
%%% End: