\subsection{Theoretical Peak Performance}
The CPU under test was an Intel\textregistered{} Core\texttrademark{} i5-4210U. \prettyref{tbl:spec-4210} lists the relevant specifications of this processor according to \textcite{ark4210}.
\begin{table}[h!]
\centering
\begin{tabular}{ll}
\toprule
Specification & Value \\
\midrule
Instruction Set Extension & SSE4.1/4.2, AVX 2.0 \\
\# of Cores & 2 \\
Processor Base Frequency & 1.7 GHz \\
Max Turbo Frequency & 2.7 GHz \\
Microarchitecture & Haswell \\
\bottomrule
\end{tabular}
\caption{Intel\textregistered{} Core\texttrademark{} i5-4210U processor specifications~\cite{ark4210}}
\label{tbl:spec-4210}
\end{table}
The 4th generation Intel Core processors provide the FMA\footnote{Fused Multiply-Add} and AVX\footnote{Advanced Vector Extensions} extensions~\cite[5-2 Vol.1]{intel2016}. An FMA unit executes ``[...] 256-bit floating-point instructions to perform computation on
256-bit vectors''~\cite[5-28 Vol.1]{intel2016}. A 256-bit vector holds four 64-bit double-precision values, and a fused multiply-add counts as two floating-point operations (one multiplication and one addition). A single FMA unit can therefore perform $2 \cdot 4 = 8$ DP FLOPs per cycle.
No definitive source could be found, but according to \textcite{shimpi2012} the Haswell microarchitecture has 2 FMA units per core, which gives $2 \cdot 8 = 16$ DP FLOPs per cycle per core. Since the processor under test has 2 cores, this amounts to $16 \cdot 2 = 32$ DP FLOPs per cycle in total.
At the maximum turbo frequency of 2.7 GHz, the processor is therefore capable of a theoretical peak performance of $32 \cdot 2.7 = 86.4$ GFLOP/s.
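The derivation above can be condensed into a single expression. The following is only a sketch: it assumes the two FMA units per core reported by \textcite{shimpi2012} and that both cores run at the maximum turbo frequency at the same time. The symbols $P_{\mathrm{peak}}$, $n_{\mathrm{cores}}$, $n_{\mathrm{FMA}}$, $w_{\mathrm{vec}}$ and $f_{\mathrm{max}}$ are introduced here for illustration and are not taken from the cited sources.
\[
  P_{\mathrm{peak}}
  = n_{\mathrm{cores}} \cdot n_{\mathrm{FMA}} \cdot w_{\mathrm{vec}} \cdot 2 \cdot f_{\mathrm{max}}
  = 2 \cdot 2 \cdot 4 \cdot 2 \cdot 2.7\,\mathrm{GHz}
  = 86.4\,\mathrm{GFLOP/s}
\]
Here $w_{\mathrm{vec}} = 256 / 64 = 4$ is the number of double-precision values per 256-bit vector, and the factor $2$ accounts for the multiplication and the addition of one fused multiply-add. Under the same assumptions, inserting the base frequency of 1.7 GHz from \prettyref{tbl:spec-4210} instead of $f_{\mathrm{max}}$ would give $32 \cdot 1.7 = 54.4$ GFLOP/s.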
\printbibliography
%%% Local Variables:
%%% mode: latex
%%% TeX-master: "../report"
%%% End: