diff --git a/reduce/reduce b/reduce/reduce new file mode 100755 index 0000000..0d1319a Binary files /dev/null and b/reduce/reduce differ diff --git a/roofline/report/inputs/discussion.tex b/roofline/report/inputs/discussion.tex index 0a19393..d6ec6fc 100644 --- a/roofline/report/inputs/discussion.tex +++ b/roofline/report/inputs/discussion.tex @@ -1,6 +1,8 @@ According to the definition used the arithmetic intensity is measured by operations per byte. This might not be adequate for Haswell processors (and later). Due to the fused multiply-add\footnote{although called multiply-add there are 36 slightly different instructions} extension two floating point operations can be performed with a single instruction. - worse results for 4 threads @ NUMA-STREAM not necessarily expected +- better results for triad possibly due to combined storage in FMA +- striding for arrays %%% Local Variables: %%% mode: latex diff --git a/roofline/report/inputs/kernels.tex b/roofline/report/inputs/kernels.tex index 09e87b3..c69863b 100644 --- a/roofline/report/inputs/kernels.tex +++ b/roofline/report/inputs/kernels.tex @@ -1,3 +1,115 @@ +Kernels with operational intensity (OI) of $\rfrac{1}{16}$ and $8$ have been implemented. The kernels are introduced in the following sections. + +However, the effective operational intensity of a given kernel in a high-level language (such as C) is not obvious when compiled to processor instructions. Furthermore, due to today's advanced processor architecture, adaptations had to be made to account for special capabilities. This resulted in several different kernels. Not all of them are machine independent with regard to operational intensity. + +All kernels were compiled with \verb|gcc 5.3.1| and different options. The compilation was checked with \verb|objdump -d -M intel-mnemonics|. For a more elaborate analysis of the disassembly on the tester's computer, please refer to the header file \verb|aikern.h| that should come with this report. 
Additionally, \verb|Makefile| provides all information about the compiler options used and tested. + +Good results\footnote{all, including the special FMA kernels, use only expected memory access, doing everything else in registers} were achieved with \verb|-O2 -mavx -mfma|. But \verb|-O2 -mavx -mfma| is a tradeoff between the best possible results and obviously correctly compiled code. In fact, the assembly looks almost handwritten. If even more optimization is wanted, \verb|-O3| can be used. To fully utilize FMA with packed doubles, \verb|-Ofast| or \verb|-Ofast -ffast-math| has to be used. Be aware that more optimization than \verb|-O2 -mavx -mfma| results in disassembly that is very hard to understand. \verb|-ffast-math| can even introduce rounding errors. It is not completely obvious that the highly optimized compiled code still has the intended operational intensity. \verb|-O0| never works out. + +\bigskip +\bigskip +\begin{footnotesize} +\noindent\emph{Remark:} Contrary to popular belief, the roofline model is built atop the notion of operational intensity\footnote{FLOPs against bytes written to DRAM} kernels. The differences to arithmetic intensities are outlined in~\textcite{williams2009}. Depending on the definition used, these two terms are not necessarily interchangeable. The notion of operational intensity in the following sections might be what some would understand by the term arithmetic intensity. +\end{footnotesize} + +\subsection{1/16 $\neq$ 1/16. Or: The Fancy Arithmetics of a Compiler} + +In order to understand why the following kernels are implemented the way they are, an example of a badly behaving $\rfrac{1}{16}$ OI kernel is given in~\prettyref{lst:1-16-simple-dangerous}. The kernel has one FP operation ($*$) and reads 16 bytes (a[i], b[i]) from memory. But in practice this algorithm does not work as expected. There are several ways one could write the same kernel. + +\begin{itemize} +\item Omitting \verb|volatile|. 
This results in the loop being optimized away completely for optimization levels above \verb|-O0|. +\item Using no optimization, i.e., \verb|-O0|. No advanced features of the processor will be used (e.g., FMA requires at least \verb|-O2|). Also, just about everything, even loop variables, is read from and written to the stack. One may assume that this is cached anyway, but that is not guaranteed. +\item Using \verb|volatile| and optimization. When \verb|volatile| is used, gcc reads and writes the variable \verb|tmp| from and to the stack, even with \verb|-O3|. Whether \verb|tmp| is cached or not is hard to predict. It is not improbable, but relying on that assumption can yield wrong results. +\item Using \verb|register|, \verb|volatile| and optimization. Unfortunately \verb|register| just \emph{advises} the compiler to use a register. It does not force the compiler to do so. Seemingly \verb|volatile| overrules \verb|register| in this case: \verb|tmp| is read and written from and to the stack. Again, assuming any caching behaviour is adventurous at best. +\end{itemize} + +In the worst case found (no optimization, no volatile, no register) this results in reads of 16 bytes (a[i],b[i]) plus 8 bytes (i), and writes of 16 bytes (i, tmp assignment). Making no caching assumptions, this results in an effective operational intensity of $\rfrac{1}{40}$ for a nominal $\rfrac{1}{16}$ OI kernel. For more complex kernels the results get even worse. A triad \verb|t=a*b+c| will store easy-to-miss intermediate results on the stack if no special care is taken. + +To prevent this, one could write assembly directly or rely on compiler intrinsics. The kernels in this report, though, consist of plain C code that was hand-tuned until an acceptable compilation was reached. The generated machine code was disassembled and manually checked for hidden memory access. The results are therefore compiler and machine specific, but should be quite generalizable for the most part. 
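+ +The $\rfrac{1}{40}$ figure for the worst case described above can be checked as a short sum. This is only a sketch under the stated assumptions: every value (array element, loop index, \verb|tmp|) occupies 8 bytes and no access is served from cache. +\[ \mathrm{OI}_{\mathrm{eff}} = \frac{1\ \mathrm{FLOP}}{\underbrace{(16 + 8)}_{\mathrm{reads}} + \underbrace{16}_{\mathrm{writes}}\ \mathrm{bytes}} = \frac{1}{40}\ \mathrm{FLOP/byte} \] +Under these assumptions the kernel moves $2.5$ times the 16 bytes that its source code suggests. 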
+ +\bigskip +\begin{lstlisting}[caption={Simple $\rfrac{1}{16}$ kernel with questionable compiled form}, label=lst:1-16-simple-dangerous] +volatile register double tmp = 0.1; +for(size_t i=0; iintel2016 intel2016 intel2016 - intel2 + intelvfmadd132pd shimpi2012 shimpi2012 berstrom bergstrom2 williams2009 + williams2009 + williams2009 + intelvfmadd132sd + intelvfmadd132pd diff --git a/roofline/report/report.blg b/roofline/report/report.blg index 93cbf64..0acc198 100644 --- a/roofline/report/report.blg +++ b/roofline/report/report.blg @@ -1,14 +1,14 @@ [0] Config.pm:318> INFO - This is Biber 1.8 [0] Config.pm:321> INFO - Logfile is 'report.blg' -[71] biber:275> INFO - === Thu Jun 23, 2016, 02:25:38 -[71] Biber.pm:333> INFO - Reading 'report.bcf' -[152] Biber.pm:630> INFO - Found 7 citekeys in bib section 0 -[164] Biber.pm:3053> INFO - Processing section 0 -[187] Biber.pm:3190> INFO - Looking for bibtex format file 'roofline.bib' for section 0 -[188] bibtex.pm:937> INFO - Decoding LaTeX character macros into UTF-8 -[189] bibtex.pm:812> INFO - Found BibTeX data source 'roofline.bib' -[236] Biber.pm:2939> INFO - Overriding locale 'en_US.UTF-8' default tailoring 'variable = shifted' with 'variable = non-ignorable' -[236] Biber.pm:2945> INFO - Sorting 'entry' list 'nty' keys -[236] Biber.pm:2949> INFO - No sort tailoring available for locale 'en_US.UTF-8' -[252] bbl.pm:482> INFO - Writing 'report.bbl' with encoding 'UTF-8' -[254] bbl.pm:555> INFO - Output to report.bbl +[53] biber:275> INFO - === Thu Jun 23, 2016, 19:53:59 +[53] Biber.pm:333> INFO - Reading 'report.bcf' +[116] Biber.pm:630> INFO - Found 8 citekeys in bib section 0 +[127] Biber.pm:3053> INFO - Processing section 0 +[142] Biber.pm:3190> INFO - Looking for bibtex format file 'roofline.bib' for section 0 +[144] bibtex.pm:937> INFO - Decoding LaTeX character macros into UTF-8 +[144] bibtex.pm:812> INFO - Found BibTeX data source 'roofline.bib' +[179] Biber.pm:2939> INFO - Overriding locale 'en_US.UTF-8' 
default tailoring 'variable = shifted' with 'variable = non-ignorable' +[179] Biber.pm:2945> INFO - Sorting 'entry' list 'nty' keys +[179] Biber.pm:2949> INFO - No sort tailoring available for locale 'en_US.UTF-8' +[197] bbl.pm:482> INFO - Writing 'report.bbl' with encoding 'UTF-8' +[198] bbl.pm:555> INFO - Output to report.bbl diff --git a/roofline/report/report.fdb_latexmk b/roofline/report/report.fdb_latexmk index 67f1e53..aa0cc97 100644 --- a/roofline/report/report.fdb_latexmk +++ b/roofline/report/report.fdb_latexmk @@ -1,11 +1,11 @@ # Fdb version 3 -["biber report"] 1466641538 "report.bcf" "report.bbl" "report" 1466642333 - "report.bcf" 1466642333 92144 b16bb4d23ff7f0d4a3e0ee2f3a7b2c36 "" - "roofline.bib" 1466632630 3723 5c74ca6da23b4936d86117884f95cb33 "" +["biber report"] 1466704438 "report.bcf" "report.bbl" "report" 1466709195 + "report.bcf" 1466709195 92382 2683b542d57d2326e3b37a6a44222b52 "" + "roofline.bib" 1466704433 4157 226e47c750579a202f66b6f0e4df67bb "" (generated) "report.bbl" "report.blg" -["pdflatex"] 1466642332 "report.tex" "report.pdf" "report" 1466642333 +["pdflatex"] 1466709193 "report.tex" "report.pdf" "report" 1466709195 "/usr/share/texlive/texmf-dist/fonts/enc/dvips/cm-super/cm-super-t1.enc" 1136849721 2971 def0b6c1f0b107b3b936def894055589 "" "/usr/share/texlive/texmf-dist/fonts/enc/dvips/cm-super/cm-super-ts1.enc" 1136849721 2900 1537cc8184ad1792082cd229ecc269f4 "" "/usr/share/texlive/texmf-dist/fonts/map/fontname/texfonts.map" 1272929888 3287 e6b82fe08f5336d4d5ebc73fb1152e87 "" @@ -20,6 +20,7 @@ "/usr/share/texlive/texmf-dist/fonts/tfm/jknappen/ec/ecrm0900.tfm" 1136768653 3584 d3d8ac8b25ca19c0a40b86a5db1e8ccc "" "/usr/share/texlive/texmf-dist/fonts/tfm/jknappen/ec/ecrm1095.tfm" 1136768653 3584 929cdff2b7a8c11bd4d49fd68cb0ae70 "" "/usr/share/texlive/texmf-dist/fonts/tfm/jknappen/ec/ecrm1440.tfm" 1136768653 3584 3169d30142b88a27d4ab0e3468e963a2 "" + "/usr/share/texlive/texmf-dist/fonts/tfm/jknappen/ec/ecti0900.tfm" 1136768653 3072 
a603fa6d934ebc72197ed1c389943d86 "" "/usr/share/texlive/texmf-dist/fonts/tfm/jknappen/ec/ecti1095.tfm" 1136768653 3072 b73d2778cc3af44970de4de5e032d7f6 "" "/usr/share/texlive/texmf-dist/fonts/tfm/jknappen/ec/ectt1095.tfm" 1136768653 1536 a988bfe554c1f79514bd46d13c3c64ce "" "/usr/share/texlive/texmf-dist/fonts/tfm/jknappen/ec/tcrm1095.tfm" 1136768653 1536 02c06700a42be0f5a28664c7273f82e7 "" @@ -56,6 +57,7 @@ "/usr/share/texlive/texmf-dist/fonts/tfm/public/stmaryrd/stmary9.tfm" 1302307949 848 594c171945930dfc7cc52fb30457c803 "" "/usr/share/texlive/texmf-dist/fonts/type1/public/amsfonts/cm/cmmi10.pfb" 1248133631 36299 5f9df58c2139e7edcf37c8fca4bd384d "" "/usr/share/texlive/texmf-dist/fonts/type1/public/amsfonts/cm/cmr10.pfb" 1248133631 35752 024fb6c41858982481f6968b5fc26508 "" + "/usr/share/texlive/texmf-dist/fonts/type1/public/amsfonts/cm/cmr12.pfb" 1248133631 32722 d7379af29a190c3f453aba36302ff5a9 "" "/usr/share/texlive/texmf-dist/fonts/type1/public/amsfonts/cm/cmr8.pfb" 1248133631 32726 0a1aea6fcd6468ee2cf64d891f5c43c8 "" "/usr/share/texlive/texmf-dist/fonts/type1/public/amsfonts/cm/cmsy10.pfb" 1248133631 32569 5e5ddc8df908dea60932f3c484a54c0d "" "/usr/share/texlive/texmf-dist/fonts/type1/public/cm-super/sfbx1095.pfb" 1215737283 154600 ea54091d31de803b613ba9e80ca51709 "" @@ -68,6 +70,7 @@ "/usr/share/texlive/texmf-dist/fonts/type1/public/cm-super/sfrm0900.pfb" 1215737283 149037 995a6f1e12c1d647b99b1cf55db78699 "" "/usr/share/texlive/texmf-dist/fonts/type1/public/cm-super/sfrm1095.pfb" 1215737283 145929 f25e56369a345c4ff583b067cd87ce8e "" "/usr/share/texlive/texmf-dist/fonts/type1/public/cm-super/sfrm1440.pfb" 1215737283 131078 d96015a2fa5c350129e933ca070b2484 "" + "/usr/share/texlive/texmf-dist/fonts/type1/public/cm-super/sfti0900.pfb" 1215737283 183673 6df73819bb3e1246a6315a4913a2d331 "" "/usr/share/texlive/texmf-dist/fonts/type1/public/cm-super/sfti1095.pfb" 1215737283 196446 8fbbe4b97b83e5182def6d29a44e57fb "" 
"/usr/share/texlive/texmf-dist/fonts/type1/public/cm-super/sftt1095.pfb" 1215737283 169670 48d12e69c9a3b23c81f6d703ccbd4554 "" "/usr/share/texlive/texmf-dist/tex/context/base/supp-pdf.mkii" 1337017135 71627 94eb9990bed73c364d7f53f960cc8c5b "" @@ -158,8 +161,6 @@ "/usr/share/texlive/texmf-dist/tex/latex/listings/listings.cfg" 1394061314 1828 1429ae58d32ff215bffb2acf697ae41a "" "/usr/share/texlive/texmf-dist/tex/latex/listings/listings.sty" 1394061314 80361 048fe35275a1096660ea67eecd2213f4 "" "/usr/share/texlive/texmf-dist/tex/latex/listings/lstlang1.sty" 1394061314 93168 df9863fadbf023e458067a158925eff9 "" - "/usr/share/texlive/texmf-dist/tex/latex/listings/lstlang2.sty" 1394061314 89980 e97cebbc4f0eae4011a8bea389a05d0a "" - "/usr/share/texlive/texmf-dist/tex/latex/listings/lstlang3.sty" 1394061314 86841 4fa558f6bbd8f3d49e175c0dd27ff41a "" "/usr/share/texlive/texmf-dist/tex/latex/listings/lstmisc.sty" 1394061314 77029 dfe676ac1c76cfa220c8107472a1da27 "" "/usr/share/texlive/texmf-dist/tex/latex/logreq/logreq.def" 1284153563 1620 fb1c32b818f2058eca187e5c41dfae77 "" "/usr/share/texlive/texmf-dist/tex/latex/logreq/logreq.sty" 1284153563 6187 b27afc771af565d3a9ff1ca7d16d0d46 "" @@ -195,21 +196,22 @@ "/usr/share/texlive/texmf-dist/web2c/texmf.cnf" 1455657841 31706 2be2b4306fae7fc20493e3b90c2ad04d "" "/usr/share/texlive/texmf-var/web2c/pdftex/pdflatex.fmt" 1457104667 3492982 6abaa3262ef9227a797168d32888676c "" "inputs/introduction.tex" 1466184626 76 eaf0f76fa74815989416f6f6d1c36f8b "" - "inputs/kernels.tex" 1466184646 75 4edfbf753fb138c9886dd119053949bf "" - "inputs/roofline.tex" 1466642331 5522 4541d608767a130965ef6af1061bff79 "" - "report.aux" 1466642333 3974 fbce129a17c9c0f39751b7114db01f4a "" - "report.bbl" 1466641538 6814 69377a156548dd41d6fce56d0861beda "biber report" - "report.out" 1466642333 334 a1cec9b42f1ecf30af112fc058dd7354 "" - "report.run.xml" 1466642333 2317 80d7743117fafc51b1e42b536d793f68 "" - "report.tex" 1466626391 4716 59f1e8b52a6969670880343126dbe52a 
"" - "report.toc" 1466642333 818 cfda5e6b9084ed337791b495536ef0b7 "" - "res/rooftop.png" 1466641296 38798 e83f8157e0a63985f174d5bf3128cc98 "" + "inputs/kernels.tex" 1466709193 10203 9325fb415b03bebae73e25df31a02d19 "" + "inputs/roofline.tex" 1466704311 5532 18ef3b0c3e19883f9e4d68e1bd73b31b "" + "report.aux" 1466709195 6200 ef3b9dffee45c82bc6071014e35f58b8 "" + "report.bbl" 1466704439 7655 4b5f697a70789470cde9f922b6440ee7 "biber report" + "report.out" 1466709195 566 365a3bdfdb786abd7e70ca003f732afb "" + "report.run.xml" 1466709195 2317 80d7743117fafc51b1e42b536d793f68 "" + "report.tex" 1466708164 4496 4af727a449506efbfccac9df327ec9fe "" + "report.toc" 1466709195 1210 9050233c7a77a885db53f60f534c1c7a "" + "res/rooftop-eps-converted-to.pdf" 1466670002 22114 f6f2c1d53d8b6a5f4042e202648c7b36 "" + "res/rooftop.eps" 1466669975 36013 2a6358f72820d80a6e87ee15e92d5669 "" (generated) - "report-blx.bib" "report.out" "report.toc" - "report.log" - "report.run.xml" "report.bcf" - "report.pdf" + "report.run.xml" + "report-blx.bib" + "report.log" "report.aux" + "report.pdf" diff --git a/roofline/report/report.fls b/roofline/report/report.fls index 0745291..dc0ecc7 100644 --- a/roofline/report/report.fls +++ b/roofline/report/report.fls @@ -218,22 +218,6 @@ INPUT /usr/share/texlive/texmf-dist/tex/latex/base/ts1enc.dfu INPUT /usr/share/texlive/texmf-dist/tex/latex/base/ts1enc.dfu INPUT /usr/share/texlive/texmf-dist/tex/latex/listings/lstlang1.sty INPUT /usr/share/texlive/texmf-dist/tex/latex/listings/lstlang1.sty -INPUT /usr/share/texlive/texmf-dist/tex/latex/listings/lstlang2.sty -INPUT /usr/share/texlive/texmf-dist/tex/latex/listings/lstlang2.sty -INPUT /usr/share/texlive/texmf-dist/tex/latex/listings/lstlang3.sty -INPUT /usr/share/texlive/texmf-dist/tex/latex/listings/lstlang3.sty -INPUT /usr/share/texlive/texmf-dist/tex/latex/listings/lstlang1.sty -INPUT /usr/share/texlive/texmf-dist/tex/latex/listings/lstlang1.sty -INPUT 
/usr/share/texlive/texmf-dist/tex/latex/listings/lstlang2.sty -INPUT /usr/share/texlive/texmf-dist/tex/latex/listings/lstlang2.sty -INPUT /usr/share/texlive/texmf-dist/tex/latex/listings/lstlang3.sty -INPUT /usr/share/texlive/texmf-dist/tex/latex/listings/lstlang3.sty -INPUT /usr/share/texlive/texmf-dist/tex/latex/listings/lstlang1.sty -INPUT /usr/share/texlive/texmf-dist/tex/latex/listings/lstlang1.sty -INPUT /usr/share/texlive/texmf-dist/tex/latex/listings/lstlang2.sty -INPUT /usr/share/texlive/texmf-dist/tex/latex/listings/lstlang2.sty -INPUT /usr/share/texlive/texmf-dist/tex/latex/listings/lstlang3.sty -INPUT /usr/share/texlive/texmf-dist/tex/latex/listings/lstlang3.sty INPUT /usr/share/texlive/texmf-dist/tex/latex/listings/lstmisc.sty INPUT /usr/share/texlive/texmf-dist/tex/latex/listings/lstmisc.sty INPUT /usr/share/texlive/texmf-dist/tex/latex/biblatex/lbx/english.lbx @@ -345,15 +329,25 @@ INPUT /usr/share/texlive/texmf-dist/fonts/tfm/public/amsfonts/symbols/msbm10.tfm INPUT /usr/share/texlive/texmf-dist/fonts/tfm/public/amsfonts/symbols/msbm5.tfm INPUT /usr/share/texlive/texmf-dist/fonts/tfm/public/stmaryrd/stmary9.tfm INPUT /usr/share/texlive/texmf-dist/fonts/tfm/public/stmaryrd/stmary5.tfm -INPUT /usr/share/texlive/texmf-dist/tex/latex/listings/lstlang1.sty -INPUT /usr/share/texlive/texmf-dist/tex/latex/listings/lstlang1.sty -INPUT /usr/share/texlive/texmf-dist/tex/latex/listings/lstmisc.sty -INPUT /usr/share/texlive/texmf-dist/tex/latex/listings/lstmisc.sty -INPUT res/rooftop.png -INPUT ./res/rooftop.png -INPUT ./res/rooftop.png +INPUT res/rooftop.eps +INPUT ./res/rooftop.eps +INPUT ./res/rooftop.eps +INPUT ./res/rooftop-eps-converted-to.pdf +INPUT ./res/rooftop-eps-converted-to.pdf +INPUT ./res/rooftop.eps +INPUT ./res/rooftop-eps-converted-to.pdf +INPUT ./res/rooftop-eps-converted-to.pdf +INPUT ./res/rooftop-eps-converted-to.pdf INPUT inputs/kernels.tex INPUT inputs/kernels.tex +INPUT /usr/share/texlive/texmf-dist/fonts/tfm/jknappen/ec/ecti0900.tfm 
+INPUT /usr/share/texlive/texmf-dist/fonts/tfm/public/cm/cmr12.tfm +INPUT /usr/share/texlive/texmf-dist/fonts/tfm/public/cm/cmmi12.tfm +INPUT /usr/share/texlive/texmf-dist/fonts/tfm/public/cm/cmsy10.tfm +INPUT /usr/share/texlive/texmf-dist/fonts/tfm/public/cm/cmex10.tfm +INPUT /usr/share/texlive/texmf-dist/fonts/tfm/public/amsfonts/symbols/msam10.tfm +INPUT /usr/share/texlive/texmf-dist/fonts/tfm/public/amsfonts/symbols/msbm10.tfm +INPUT /usr/share/texlive/texmf-dist/fonts/tfm/public/stmaryrd/stmary10.tfm INPUT /usr/share/texlive/texmf-dist/fonts/tfm/jknappen/ec/eccc1095.tfm INPUT /usr/share/texlive/texmf-dist/fonts/tfm/jknappen/ec/tcti1095.tfm INPUT report.aux @@ -365,6 +359,7 @@ INPUT /usr/share/texlive/texmf-dist/fonts/enc/dvips/cm-super/cm-super-ts1.enc INPUT /usr/share/texlive/texmf-dist/fonts/enc/dvips/cm-super/cm-super-t1.enc INPUT /usr/share/texlive/texmf-dist/fonts/type1/public/amsfonts/cm/cmmi10.pfb INPUT /usr/share/texlive/texmf-dist/fonts/type1/public/amsfonts/cm/cmr10.pfb +INPUT /usr/share/texlive/texmf-dist/fonts/type1/public/amsfonts/cm/cmr12.pfb INPUT /usr/share/texlive/texmf-dist/fonts/type1/public/amsfonts/cm/cmr8.pfb INPUT /usr/share/texlive/texmf-dist/fonts/type1/public/amsfonts/cm/cmsy10.pfb INPUT /usr/share/texlive/texmf-dist/fonts/type1/public/cm-super/sfbx1095.pfb @@ -377,5 +372,6 @@ INPUT /usr/share/texlive/texmf-dist/fonts/type1/public/cm-super/sfrm0800.pfb INPUT /usr/share/texlive/texmf-dist/fonts/type1/public/cm-super/sfrm0900.pfb INPUT /usr/share/texlive/texmf-dist/fonts/type1/public/cm-super/sfrm1095.pfb INPUT /usr/share/texlive/texmf-dist/fonts/type1/public/cm-super/sfrm1440.pfb +INPUT /usr/share/texlive/texmf-dist/fonts/type1/public/cm-super/sfti0900.pfb INPUT /usr/share/texlive/texmf-dist/fonts/type1/public/cm-super/sfti1095.pfb INPUT /usr/share/texlive/texmf-dist/fonts/type1/public/cm-super/sftt1095.pfb diff --git a/roofline/report/report.log b/roofline/report/report.log index 9c6e2da..7f036bc 100644 --- 
a/roofline/report/report.log +++ b/roofline/report/report.log @@ -1,4 +1,4 @@ -This is pdfTeX, Version 3.14159265-2.6-1.40.15 (TeX Live 2014) (preloaded format=pdflatex 2016.3.4) 23 JUN 2016 02:38 +This is pdfTeX, Version 3.14159265-2.6-1.40.15 (TeX Live 2014) (preloaded format=pdflatex 2016.3.4) 23 JUN 2016 21:13 entering extended mode restricted \write18 enabled. %&-line parsing enabled. @@ -1184,30 +1184,6 @@ Package textcomp Info: Setting ptmj sub-encoding to TS1/4 on input line 340. (/usr/share/texlive/texmf-dist/tex/latex/listings/lstlang1.sty File: lstlang1.sty 2014/03/04 1.5c listings language file ) -(/usr/share/texlive/texmf-dist/tex/latex/listings/lstlang2.sty -File: lstlang2.sty 2014/03/04 1.5c listings language file -) -(/usr/share/texlive/texmf-dist/tex/latex/listings/lstlang3.sty -File: lstlang3.sty 2014/03/04 1.5c listings language file -) -(/usr/share/texlive/texmf-dist/tex/latex/listings/lstlang1.sty -File: lstlang1.sty 2014/03/04 1.5c listings language file -) -(/usr/share/texlive/texmf-dist/tex/latex/listings/lstlang2.sty -File: lstlang2.sty 2014/03/04 1.5c listings language file -) -(/usr/share/texlive/texmf-dist/tex/latex/listings/lstlang3.sty -File: lstlang3.sty 2014/03/04 1.5c listings language file -) -(/usr/share/texlive/texmf-dist/tex/latex/listings/lstlang1.sty -File: lstlang1.sty 2014/03/04 1.5c listings language file -) -(/usr/share/texlive/texmf-dist/tex/latex/listings/lstlang2.sty -File: lstlang2.sty 2014/03/04 1.5c listings language file -) -(/usr/share/texlive/texmf-dist/tex/latex/listings/lstlang3.sty -File: lstlang3.sty 2014/03/04 1.5c listings language file -) (/usr/share/texlive/texmf-dist/tex/latex/listings/lstmisc.sty File: lstmisc.sty 2014/03/04 1.5c (Carsten Heinz) ) @@ -1226,26 +1202,26 @@ Package biblatex Warning: 'babel/polyglossia' detected but 'csquotes' missing. (./report.aux) \openout1 = `report.aux'. -LaTeX Font Info: Checking defaults for OML/cmm/m/it on input line 81. -LaTeX Font Info: ... okay on input line 81. 
-LaTeX Font Info: Checking defaults for T1/cmr/m/n on input line 81. -LaTeX Font Info: ... okay on input line 81. -LaTeX Font Info: Checking defaults for OT1/cmr/m/n on input line 81. -LaTeX Font Info: ... okay on input line 81. -LaTeX Font Info: Checking defaults for OMS/cmsy/m/n on input line 81. -LaTeX Font Info: ... okay on input line 81. -LaTeX Font Info: Checking defaults for OMX/cmex/m/n on input line 81. -LaTeX Font Info: ... okay on input line 81. -LaTeX Font Info: Checking defaults for U/cmr/m/n on input line 81. -LaTeX Font Info: ... okay on input line 81. -LaTeX Font Info: Checking defaults for PD1/pdf/m/n on input line 81. -LaTeX Font Info: ... okay on input line 81. -LaTeX Font Info: Checking defaults for TS1/cmr/m/n on input line 81. -LaTeX Font Info: Try loading font information for TS1+cmr on input line 81. +LaTeX Font Info: Checking defaults for OML/cmm/m/it on input line 80. +LaTeX Font Info: ... okay on input line 80. +LaTeX Font Info: Checking defaults for T1/cmr/m/n on input line 80. +LaTeX Font Info: ... okay on input line 80. +LaTeX Font Info: Checking defaults for OT1/cmr/m/n on input line 80. +LaTeX Font Info: ... okay on input line 80. +LaTeX Font Info: Checking defaults for OMS/cmsy/m/n on input line 80. +LaTeX Font Info: ... okay on input line 80. +LaTeX Font Info: Checking defaults for OMX/cmex/m/n on input line 80. +LaTeX Font Info: ... okay on input line 80. +LaTeX Font Info: Checking defaults for U/cmr/m/n on input line 80. +LaTeX Font Info: ... okay on input line 80. +LaTeX Font Info: Checking defaults for PD1/pdf/m/n on input line 80. +LaTeX Font Info: ... okay on input line 80. +LaTeX Font Info: Checking defaults for TS1/cmr/m/n on input line 80. +LaTeX Font Info: Try loading font information for TS1+cmr on input line 80. (/usr/share/texlive/texmf-dist/tex/latex/base/ts1cmr.fd File: ts1cmr.fd 1999/05/25 v2.5h Standard LaTeX font definitions ) -LaTeX Font Info: ... okay on input line 81. +LaTeX Font Info: ... 
okay on input line 80. (/usr/share/texlive/texmf-dist/tex/context/base/supp-pdf.mkii [Loading MPS to PDF converter (version 2006.09.02).] @@ -1276,7 +1252,7 @@ File: epstopdf-sys.cfg 2010/07/13 v1.3 Configuration of (r)epstopdf for TeX Liv e )) \c@lstlisting=\count308 -LaTeX Info: Redefining \microtypecontext on input line 81. +LaTeX Info: Redefining \microtypecontext on input line 80. Package microtype Info: Generating PDF output. Package microtype Info: Character protrusion enabled (level 2). Package microtype Info: Using default protrusion set `alltext'. @@ -1292,7 +1268,7 @@ File: mt-cmr.cfg 2013/05/19 v2.2 microtype config. file: Computer Modern Roman (RS) ) \AtBeginShipoutBox=\box37 -Package hyperref Info: Link coloring ON on input line 81. +Package hyperref Info: Link coloring ON on input line 80. (/usr/share/texlive/texmf-dist/tex/latex/hyperref/nameref.sty Package: nameref 2012/10/27 v2.43 Cross-referencing by name of section @@ -1302,9 +1278,9 @@ Package: gettitlestring 2010/12/03 v1.4 Cleanup title references (HO) ) \c@section@level=\count309 ) -LaTeX Info: Redefining \ref on input line 81. -LaTeX Info: Redefining \pageref on input line 81. -LaTeX Info: Redefining \nameref on input line 81. +LaTeX Info: Redefining \ref on input line 80. +LaTeX Info: Redefining \pageref on input line 80. +LaTeX Info: Redefining \nameref on input line 80. (./report.out) (./report.out) \@outlinefile=\write4 @@ -1316,7 +1292,7 @@ Package lastpage Info: Please have a look at the pageslts package at (lastpage) or (lastpage) http://www.ctan.org/tex-archive/ (lastpage) install/macros/latex/contrib/pageslts.tds.zip -(lastpage) ! on input line 81. +(lastpage) ! on input line 80. Package caption Info: Begin \AtBeginDocument code. Package caption Info: End \AtBeginDocument code. Package biblatex Info: Input encoding 'utf8' detected. @@ -1327,9 +1303,9 @@ Package biblatex Info: Automatic encoding selection. Package biblatex Info: Trying to load bibliographic data... 
Package biblatex Info: ... file 'report.bbl' found. (./report.bbl) -Package biblatex Info: Reference section=0 on input line 81. -Package biblatex Info: Reference segment=0 on input line 81. -LaTeX Font Info: Try loading font information for U+msa on input line 91. +Package biblatex Info: Reference section=0 on input line 80. +Package biblatex Info: Reference segment=0 on input line 80. +LaTeX Font Info: Try loading font information for U+msa on input line 90. (/usr/share/texlive/texmf-dist/tex/latex/amsfonts/umsa.fd File: umsa.fd 2013/01/14 v3.01 AMS symbols A @@ -1337,7 +1313,7 @@ File: umsa.fd 2013/01/14 v3.01 AMS symbols A (/usr/share/texlive/texmf-dist/tex/latex/microtype/mt-msa.cfg File: mt-msa.cfg 2006/02/04 v1.1 microtype config. file: AMS symbols (a) (RS) ) -LaTeX Font Info: Try loading font information for U+msb on input line 91. +LaTeX Font Info: Try loading font information for U+msb on input line 90. (/usr/share/texlive/texmf-dist/tex/latex/amsfonts/umsb.fd File: umsb.fd 2013/01/14 v3.01 AMS symbols B @@ -1345,10 +1321,10 @@ File: umsb.fd 2013/01/14 v3.01 AMS symbols B (/usr/share/texlive/texmf-dist/tex/latex/microtype/mt-msb.cfg File: mt-msb.cfg 2005/06/01 v1.0 microtype config. file: AMS symbols (b) (RS) ) -LaTeX Font Info: Try loading font information for U+stmry on input line 91. +LaTeX Font Info: Try loading font information for U+stmry on input line 90. (/usr/share/texlive/texmf-dist/tex/latex/stmaryrd/Ustmry.fd) -Package tocbasic Info: character protrusion at toc deactivated on input line 96 +Package tocbasic Info: character protrusion at toc deactivated on input line 95 . (./report.toc) \tf@toc=\write5 @@ -1372,68 +1348,89 @@ Package microtype Info: Loading generic settings for font family (microtype) For optimal results, create family-specific settings. (microtype) See the microtype manual for details. 
[2] -(/usr/share/texlive/texmf-dist/tex/latex/listings/lstlang1.sty -File: lstlang1.sty 2014/03/04 1.5c listings language file -) -(/usr/share/texlive/texmf-dist/tex/latex/listings/lstmisc.sty -File: lstmisc.sty 2014/03/04 1.5c (Carsten Heinz) -) - -File: res/rooftop.png Graphic file (type png) - -Package pdftex.def Info: res/rooftop.png used on input line 70. -(pdftex.def) Requested size: 358.50612pt x 270.20964pt. -) -[3] (./inputs/kernels.tex) -Overfull \hbox (19.7725pt too wide) in paragraph at lines 117--117 +Package epstopdf Info: Source file: +(epstopdf) date: 2016-06-23 10:19:35 +(epstopdf) size: 36013 bytes +(epstopdf) Output file: +(epstopdf) date: 2016-06-23 10:20:02 +(epstopdf) size: 22114 bytes +(epstopdf) Command: +(epstopdf) \includegraphics on input line 70. +Package epstopdf Info: Output file is already uptodate. + + +File: res/rooftop-eps-converted-to.pdf Graphic file (type pdf) + + +Package pdftex.def Info: res/rooftop-eps-converted-to.pdf used on input line 70 +. +(pdftex.def) Requested size: 358.50612pt x 270.25478pt. +) [3] (./inputs/kernels.tex [4 <./res/rooftop-eps-converted-to.pdf>] + +Package hyperref Warning: Token not allowed in a PDF string (PDFDocEncoding): +(hyperref) removing `math shift' on input line 15. + + +Package hyperref Warning: Token not allowed in a PDF string (PDFDocEncoding): +(hyperref) removing `\not' on input line 15. + + +Package hyperref Warning: Token not allowed in a PDF string (PDFDocEncoding): +(hyperref) removing `math shift' on input line 15. + +[5] [6]) +Overfull \hbox (19.7725pt too wide) in paragraph at lines 116--116 \T1/cmtt/m/n/10.95 blob / e5aa9ca4a77623ff6f1c2d5daa7995565b944506 / stream . c # L286$[][] \T1/cmr/m/n/10.95 (-20) (vis-ited on 06/20/2016). [] - +[7] AED: lastpage setting LastPage -[4 <./res/rooftop.png>] -Package atveryend Info: Empty hook `BeforeClearDocument' on input line 118. -Package atveryend Info: Empty hook `AfterLastShipout' on input line 118. 
+[8] +Package atveryend Info: Empty hook `BeforeClearDocument' on input line 117. +Package atveryend Info: Empty hook `AfterLastShipout' on input line 117. (./report.aux) -Package atveryend Info: Executing hook `AtVeryEndDocument' on input line 118. -Package atveryend Info: Executing hook `AtEndAfterFileList' on input line 118. +Package atveryend Info: Executing hook `AtVeryEndDocument' on input line 117. +Package atveryend Info: Executing hook `AtEndAfterFileList' on input line 117. Package rerunfilecheck Info: File `report.out' has not changed. -(rerunfilecheck) Checksum: A1CEC9B42F1ECF30AF112FC058DD7354;334. +(rerunfilecheck) Checksum: 365A3BDFDB786ABD7E70CA003F732AFB;566. Package logreq Info: Writing requests to 'report.run.xml'. \openout1 = `report.run.xml'. -Package atveryend Info: Empty hook `AtVeryVeryEnd' on input line 118. +Package atveryend Info: Empty hook `AtVeryVeryEnd' on input line 117. ) Here is how much of TeX's memory you used: - 22318 strings out of 493339 - 351456 string characters out of 6141383 - 896953 words of memory out of 5000000 - 25280 multiletter control sequences out of 15000+600000 - 25899 words of font info for 101 fonts, out of 8000000 for 9000 + 21436 strings out of 493339 + 338721 string characters out of 6141383 + 878402 words of memory out of 5000000 + 24309 multiletter control sequences out of 15000+600000 + 29876 words of font info for 133 fonts, out of 8000000 for 9000 953 hyphenation exceptions out of 8191 - 59i,8n,122p,1066b,1944s stack positions out of 5000i,500n,10000p,200000b,80000s + 48i,8n,76p,1008b,1880s stack positions out of 5000i,500n,10000p,200000b,80000s {/usr/share/texlive/texmf-dist/fonts/enc/dvips/cm-super/cm-super-ts1.enc}{/us r/share/texlive/texmf-dist/fonts/enc/dvips/cm-super/cm-super-t1.enc}< -/usr/share/texlive/texmf-dist/fonts/type1/public/cm-super/sfrm0800.pfb> -Output written on report.pdf (4 pages, 290073 bytes). 
+f-dist/fonts/type1/public/amsfonts/cm/cmr12.pfb>< +/usr/share/texlive/texmf-dist/fonts/type1/public/cm-super/sfrm0600.pfb> +Output written on report.pdf (8 pages, 328260 bytes). PDF statistics: - 192 PDF objects out of 1000 (max. 8388607) - 164 compressed objects within 2 object streams - 31 named destinations out of 1000 (max. 500000) - 22070 words of extra memory for PDF output out of 24883 (max. 10000000) + 353 PDF objects out of 1000 (max. 8388607) + 278 compressed objects within 3 object streams + 81 named destinations out of 1000 (max. 500000) + 26190 words of extra memory for PDF output out of 29859 (max. 10000000) diff --git a/roofline/report/report.pdf b/roofline/report/report.pdf index a007d26..aba6179 100644 Binary files a/roofline/report/report.pdf and b/roofline/report/report.pdf differ diff --git a/roofline/report/report.tex b/roofline/report/report.tex index f0662fc..7688648 100644 --- a/roofline/report/report.tex +++ b/roofline/report/report.tex @@ -46,6 +46,8 @@ \newrefformat{sec}{\hyperref[#1]{Section~\ref*{#1}}} \renewcommand{\arraystretch}{1.2} +\newcommand*\rfrac[2]{{}^{#1}\!/_{#2}}%running fraction with slash - requires math mode + \newcommand\bigforall{\mbox{\Large $\mathsurround0pt\forall$}} \everymath{\displaystyle} @@ -59,7 +61,7 @@ extendedchars=true, % lets you use non-ASCII characters; for 8-bits encodings only, does not work with UTF-8 frame=single, % adds a frame around the code keepspaces=true, % keeps spaces in text, useful for keeping indentation of code (possibly needs columns=flexible) - language=TeX, % the language of the code + language=C, % the language of the code numbers=left, % where to put the line-numbers; possible values are (none, left, right) numbersep=5pt, % how far the line-numbers are from the code numberstyle=\tiny\color{gray}, % the style that is used for the line-numbers @@ -70,12 +72,9 @@ stepnumber=1, % the step between two line-numbers. 
If it's 1, each line will be numbered tabsize=2, % sets default tabsize to 2 spaces title=\lstname, % show the filename of files included with \lstinputlisting; also try caption instead of title - emph=[3]{int:,array,set,of,int,if,then,else,constraint,var,union,endif,function,where,in,div,predicate,let,opt,full,format,def,for,True,False,return,or}, - emphstyle=[3]\color{ForestGreen}, - emph=[2]{length,max,forall,startEmptyBuffer,fix,startEmptyBufferShow,exactly,cumulative,occurs,deopt,sum,,all}, - emphstyle=[2]\color{blue}, - commentstyle=\color{BrickRed}, - stringstyle =\color{red}, + keywordstyle=\color{blue}, + morekeywords={size_t}, + commentstyle=\color{ForestGreen} } \begin{document} diff --git a/roofline/report/report.toc b/roofline/report/report.toc index 32f06ea..31e6e11 100644 --- a/roofline/report/report.toc +++ b/roofline/report/report.toc @@ -13,3 +13,9 @@ \contentsline {subsection}{\numberline {2.3}Graph}{3}{subsection.2.3} \defcounter {refsection}{0}\relax \contentsline {section}{\numberline {3}Kernels}{4}{section.3} +\defcounter {refsection}{0}\relax +\contentsline {subsection}{\numberline {3.1}1/16 $\not =$ 1/16. 
Or: The Fancy Arithmetics of a Compiler}{5}{subsection.3.1} +\defcounter {refsection}{0}\relax +\contentsline {subsection}{\numberline {3.2}The 1/16 OI Kernel}{6}{subsection.3.2} +\defcounter {refsection}{0}\relax +\contentsline {subsection}{\numberline {3.3}The 8 OI Kernel}{6}{subsection.3.3} diff --git a/roofline/report/res/rooftop-eps-converted-to.pdf b/roofline/report/res/rooftop-eps-converted-to.pdf new file mode 100644 index 0000000..09ab90b Binary files /dev/null and b/roofline/report/res/rooftop-eps-converted-to.pdf differ diff --git a/roofline/report/res/rooftop.eps b/roofline/report/res/rooftop.eps new file mode 100644 index 0000000..9830f4b --- /dev/null +++ b/roofline/report/res/rooftop.eps @@ -0,0 +1,2350 @@ +%!PS-Adobe-3.0 EPSF-3.0 +%%Title: /home/armin/dev/hpc/roofline/report/res/rooftop.eps +%%Creator: matplotlib version 1.4.3, http://matplotlib.org/ +%%CreationDate: Thu Jun 23 10:19:35 2016 +%%Orientation: portrait +%%BoundingBox: 13 175 598 616 +%%EndComments +%%BeginProlog +/mpldict 8 dict def +mpldict begin +/m { moveto } bind def +/l { lineto } bind def +/r { rlineto } bind def +/c { curveto } bind def +/cl { closepath } bind def +/box { +m +1 index 0 r +0 exch r +neg 0 r +cl +} bind def +/clipbox { +box +clip +newpath +} bind def +%!PS-Adobe-3.0 Resource-Font +%%Title: Bitstream Vera Sans +%%Copyright: Copyright (c) 2003 by Bitstream, Inc. All Rights Reserved. 
+%%Creator: Converted from TrueType to type 3 by PPR +25 dict begin +/_d{bind def}bind def +/_m{moveto}_d +/_l{lineto}_d +/_cl{closepath eofill}_d +/_c{curveto}_d +/_sc{7 -1 roll{setcachedevice}{pop pop pop pop pop pop}ifelse}_d +/_e{exec}_d +/FontName /BitstreamVeraSans-Roman def +/PaintType 0 def +/FontMatrix[.001 0 0 .001 0 0]def +/FontBBox[-183 -236 1287 928]def +/FontType 3 def +/Encoding [ /space /parenleft /parenright /hyphen /slash /zero /one /two /three /four /six /eight /A /B /F /G /I /L /M /O /P /a /b /c /d /e /f /g /h /i /k /l /m /n /o /p /r /s /t /w /y ] def +/FontInfo 10 dict dup begin +/FamilyName (Bitstream Vera Sans) def +/FullName (Bitstream Vera Sans) def +/Notice (Copyright (c) 2003 by Bitstream, Inc. All Rights Reserved. Bitstream Vera is a trademark of Bitstream, Inc.) def +/Weight (Roman) def +/Version (Release 1.10) def +/ItalicAngle 0.0 def +/isFixedPitch false def +/UnderlinePosition -213 def +/UnderlineThickness 143 def +end readonly def +/CharStrings 41 dict dup begin +/space{318 0 0 0 0 0 _sc +}_d +/parenleft{390 0 86 -131 310 759 _sc +310 759 _m +266 683 234 609 213 536 _c +191 463 181 389 181 314 _c +181 238 191 164 213 91 _c +234 17 266 -56 310 -131 _c +232 -131 _l +183 -54 146 20 122 94 _c +98 168 86 241 86 314 _c +86 386 98 459 122 533 _c +146 607 182 682 232 759 _c +310 759 _l +_cl}_d +/parenright{390 0 80 -131 304 759 _sc +80 759 _m +158 759 _l +206 682 243 607 267 533 _c +291 459 304 386 304 314 _c +304 241 291 168 267 94 _c +243 20 206 -54 158 -131 _c +80 -131 _l +123 -56 155 17 177 91 _c +198 164 209 238 209 314 _c +209 389 198 463 177 536 _c +155 609 123 683 80 759 _c +_cl}_d +/hyphen{361 0 49 234 312 314 _sc +49 314 _m +312 314 _l +312 234 _l +49 234 _l +49 314 _l +_cl}_d +/slash{337 0 0 -92 337 729 _sc +254 729 _m +337 729 _l +83 -92 _l +0 -92 _l +254 729 _l +_cl}_d +/zero{636 0 66 -13 570 742 _sc +318 664 _m +267 664 229 639 203 589 _c +177 539 165 464 165 364 _c +165 264 177 189 203 139 _c +229 89 267 64 318 64 _c +369 64 
407 89 433 139 _c +458 189 471 264 471 364 _c +471 464 458 539 433 589 _c +407 639 369 664 318 664 _c +318 742 _m +399 742 461 709 505 645 _c +548 580 570 486 570 364 _c +570 241 548 147 505 83 _c +461 19 399 -13 318 -13 _c +236 -13 173 19 130 83 _c +87 147 66 241 66 364 _c +66 486 87 580 130 645 _c +173 709 236 742 318 742 _c +_cl}_d +/one{636 0 110 0 544 729 _sc +124 83 _m +285 83 _l +285 639 _l +110 604 _l +110 694 _l +284 729 _l +383 729 _l +383 83 _l +544 83 _l +544 0 _l +124 0 _l +124 83 _l +_cl}_d +/two{{636 0 73 0 536 742 _sc +192 83 _m +536 83 _l +536 0 _l +73 0 _l +73 83 _l +110 121 161 173 226 239 _c +290 304 331 346 348 365 _c +380 400 402 430 414 455 _c +426 479 433 504 433 528 _c +433 566 419 598 392 622 _c +365 646 330 659 286 659 _c +255 659 222 653 188 643 _c +154 632 117 616 78 594 _c +78 694 _l +118 710 155 722 189 730 _c +223 738 255 742 284 742 _c +}_e{359 742 419 723 464 685 _c +509 647 532 597 532 534 _c +532 504 526 475 515 449 _c +504 422 484 390 454 354 _c +446 344 420 317 376 272 _c +332 227 271 164 192 83 _c +_cl}_e}_d +/three{{636 0 76 -13 556 742 _sc +406 393 _m +453 383 490 362 516 330 _c +542 298 556 258 556 212 _c +556 140 531 84 482 45 _c +432 6 362 -13 271 -13 _c +240 -13 208 -10 176 -4 _c +144 1 110 10 76 22 _c +76 117 _l +103 101 133 89 166 81 _c +198 73 232 69 268 69 _c +330 69 377 81 409 105 _c +441 129 458 165 458 212 _c +458 254 443 288 413 312 _c +383 336 341 349 287 349 _c +}_e{202 349 _l +202 430 _l +291 430 _l +339 430 376 439 402 459 _c +428 478 441 506 441 543 _c +441 580 427 609 401 629 _c +374 649 336 659 287 659 _c +260 659 231 656 200 650 _c +169 644 135 635 98 623 _c +98 711 _l +135 721 170 729 203 734 _c +235 739 266 742 296 742 _c +370 742 429 725 473 691 _c +517 657 539 611 539 553 _c +539 513 527 479 504 451 _c +481 423 448 403 406 393 _c +_cl}_e}_d +/four{636 0 49 0 580 729 _sc +378 643 _m +129 254 _l +378 254 _l +378 643 _l +352 729 _m +476 729 _l +476 254 _l +580 254 _l +580 172 _l +476 172 _l +476 0 _l 
+378 0 _l +378 172 _l +49 172 _l +49 267 _l +352 729 _l +_cl}_d +/six{{636 0 70 -13 573 742 _sc +330 404 _m +286 404 251 388 225 358 _c +199 328 186 286 186 234 _c +186 181 199 139 225 109 _c +251 79 286 64 330 64 _c +374 64 409 79 435 109 _c +461 139 474 181 474 234 _c +474 286 461 328 435 358 _c +409 388 374 404 330 404 _c +526 713 _m +526 623 _l +501 635 476 644 451 650 _c +425 656 400 659 376 659 _c +310 659 260 637 226 593 _c +}_e{192 549 172 482 168 394 _c +187 422 211 444 240 459 _c +269 474 301 482 336 482 _c +409 482 467 459 509 415 _c +551 371 573 310 573 234 _c +573 159 550 99 506 54 _c +462 9 403 -13 330 -13 _c +246 -13 181 19 137 83 _c +92 147 70 241 70 364 _c +70 479 97 571 152 639 _c +206 707 280 742 372 742 _c +396 742 421 739 447 735 _c +472 730 498 723 526 713 _c +_cl}_e}_d +/eight{{636 0 68 -13 568 742 _sc +318 346 _m +271 346 234 333 207 308 _c +180 283 167 249 167 205 _c +167 161 180 126 207 101 _c +234 76 271 64 318 64 _c +364 64 401 76 428 102 _c +455 127 469 161 469 205 _c +469 249 455 283 429 308 _c +402 333 365 346 318 346 _c +219 388 _m +177 398 144 418 120 447 _c +96 476 85 511 85 553 _c +85 611 105 657 147 691 _c +188 725 245 742 318 742 _c +}_e{390 742 447 725 489 691 _c +530 657 551 611 551 553 _c +551 511 539 476 515 447 _c +491 418 459 398 417 388 _c +464 377 501 355 528 323 _c +554 291 568 251 568 205 _c +568 134 546 80 503 43 _c +459 5 398 -13 318 -13 _c +237 -13 175 5 132 43 _c +89 80 68 134 68 205 _c +68 251 81 291 108 323 _c +134 355 171 377 219 388 _c +183 544 _m +183 506 194 476 218 455 _c +}_e{242 434 275 424 318 424 _c +360 424 393 434 417 455 _c +441 476 453 506 453 544 _c +453 582 441 611 417 632 _c +393 653 360 664 318 664 _c +275 664 242 653 218 632 _c +194 611 183 582 183 544 _c +_cl}_e}_d +/A{684 0 8 0 676 729 _sc +342 632 _m +208 269 _l +476 269 _l +342 632 _l +286 729 _m +398 729 _l +676 0 _l +573 0 _l +507 187 _l +178 187 _l +112 0 _l +8 0 _l +286 729 _l +_cl}_d +/B{{686 0 98 0 615 729 _sc +197 348 _m +197 81 _l 
+355 81 _l +408 81 447 92 473 114 _c +498 136 511 169 511 215 _c +511 260 498 293 473 315 _c +447 337 408 348 355 348 _c +197 348 _l +197 648 _m +197 428 _l +343 428 _l +391 428 426 437 450 455 _c +474 473 486 500 486 538 _c +486 574 474 602 450 620 _c +426 638 391 648 343 648 _c +197 648 _l +98 729 _m +350 729 _l +}_e{425 729 483 713 524 682 _c +564 650 585 606 585 549 _c +585 504 574 468 553 442 _c +532 416 502 399 462 393 _c +510 382 548 360 575 327 _c +601 294 615 253 615 204 _c +615 138 592 88 548 53 _c +504 17 441 0 360 0 _c +98 0 _l +98 729 _l +_cl}_e}_d +/F{575 0 98 0 517 729 _sc +98 729 _m +517 729 _l +517 646 _l +197 646 _l +197 431 _l +486 431 _l +486 348 _l +197 348 _l +197 0 _l +98 0 _l +98 729 _l +_cl}_d +/G{{775 0 56 -13 693 742 _sc +595 104 _m +595 300 _l +434 300 _l +434 381 _l +693 381 _l +693 68 _l +655 40 613 20 567 7 _c +521 -6 472 -13 420 -13 _c +306 -13 216 20 152 86 _c +88 152 56 245 56 364 _c +56 482 88 575 152 642 _c +216 708 306 742 420 742 _c +467 742 512 736 555 724 _c +598 712 638 695 674 673 _c +674 568 _l +637 598 598 621 557 637 _c +516 653 473 661 428 661 _c +}_e{338 661 271 636 227 586 _c +182 536 160 462 160 364 _c +160 265 182 191 227 141 _c +271 91 338 67 428 67 _c +462 67 493 70 521 76 _c +549 82 573 91 595 104 _c +_cl}_e}_d +/I{295 0 98 0 197 729 _sc +98 729 _m +197 729 _l +197 0 _l +98 0 _l +98 729 _l +_cl}_d +/L{557 0 98 0 552 729 _sc +98 729 _m +197 729 _l +197 83 _l +552 83 _l +552 0 _l +98 0 _l +98 729 _l +_cl}_d +/M{863 0 98 0 765 729 _sc +98 729 _m +245 729 _l +431 233 _l +618 729 _l +765 729 _l +765 0 _l +669 0 _l +669 640 _l +481 140 _l +382 140 _l +194 640 _l +194 0 _l +98 0 _l +98 729 _l +_cl}_d +/O{787 0 56 -13 731 742 _sc +394 662 _m +322 662 265 635 223 582 _c +181 528 160 456 160 364 _c +160 272 181 199 223 146 _c +265 92 322 66 394 66 _c +465 66 522 92 564 146 _c +606 199 627 272 627 364 _c +627 456 606 528 564 582 _c +522 635 465 662 394 662 _c +394 742 _m +496 742 577 707 639 639 _c +700 571 731 479 731 364 
_c +731 248 700 157 639 89 _c +577 21 496 -13 394 -13 _c +291 -13 209 21 148 89 _c +86 157 56 248 56 364 _c +56 479 86 571 148 639 _c +209 707 291 742 394 742 _c +_cl}_d +/P{603 0 98 0 569 729 _sc +197 648 _m +197 374 _l +321 374 _l +367 374 402 385 427 409 _c +452 433 465 467 465 511 _c +465 555 452 588 427 612 _c +402 636 367 648 321 648 _c +197 648 _l +98 729 _m +321 729 _l +402 729 464 710 506 673 _c +548 636 569 582 569 511 _c +569 439 548 384 506 348 _c +464 311 402 293 321 293 _c +197 293 _l +197 0 _l +98 0 _l +98 729 _l +_cl}_d +/a{{613 0 60 -13 522 560 _sc +343 275 _m +270 275 220 266 192 250 _c +164 233 150 205 150 165 _c +150 133 160 107 181 89 _c +202 70 231 61 267 61 _c +317 61 357 78 387 114 _c +417 149 432 196 432 255 _c +432 275 _l +343 275 _l +522 312 _m +522 0 _l +432 0 _l +432 83 _l +411 49 385 25 355 10 _c +325 -5 287 -13 243 -13 _c +187 -13 142 2 109 33 _c +76 64 60 106 60 159 _c +}_e{60 220 80 266 122 298 _c +163 329 224 345 306 345 _c +432 345 _l +432 354 _l +432 395 418 427 391 450 _c +364 472 326 484 277 484 _c +245 484 215 480 185 472 _c +155 464 127 453 100 439 _c +100 522 _l +132 534 164 544 195 550 _c +226 556 256 560 286 560 _c +365 560 424 539 463 498 _c +502 457 522 395 522 312 _c +_cl}_e}_d +/b{{635 0 91 -13 580 760 _sc +487 273 _m +487 339 473 390 446 428 _c +418 466 381 485 334 485 _c +286 485 249 466 222 428 _c +194 390 181 339 181 273 _c +181 207 194 155 222 117 _c +249 79 286 61 334 61 _c +381 61 418 79 446 117 _c +473 155 487 207 487 273 _c +181 464 _m +199 496 223 520 252 536 _c +281 552 316 560 356 560 _c +422 560 476 533 518 481 _c +559 428 580 359 580 273 _c +}_e{580 187 559 117 518 65 _c +476 13 422 -13 356 -13 _c +316 -13 281 -5 252 10 _c +223 25 199 49 181 82 _c +181 0 _l +91 0 _l +91 760 _l +181 760 _l +181 464 _l +_cl}_e}_d +/c{{550 0 55 -13 488 560 _sc +488 526 _m +488 442 _l +462 456 437 466 411 473 _c +385 480 360 484 334 484 _c +276 484 230 465 198 428 _c +166 391 150 339 150 273 _c +150 206 166 154 198 117 _c 
+230 80 276 62 334 62 _c +360 62 385 65 411 72 _c +437 79 462 90 488 104 _c +488 21 _l +462 9 436 0 410 -5 _c +383 -10 354 -13 324 -13 _c +242 -13 176 12 128 64 _c +}_e{79 115 55 185 55 273 _c +55 362 79 432 128 483 _c +177 534 244 560 330 560 _c +358 560 385 557 411 551 _c +437 545 463 537 488 526 _c +_cl}_e}_d +/d{{635 0 55 -13 544 760 _sc +454 464 _m +454 760 _l +544 760 _l +544 0 _l +454 0 _l +454 82 _l +435 49 411 25 382 10 _c +353 -5 319 -13 279 -13 _c +213 -13 159 13 117 65 _c +75 117 55 187 55 273 _c +55 359 75 428 117 481 _c +159 533 213 560 279 560 _c +319 560 353 552 382 536 _c +411 520 435 496 454 464 _c +148 273 _m +148 207 161 155 188 117 _c +215 79 253 61 301 61 _c +}_e{348 61 385 79 413 117 _c +440 155 454 207 454 273 _c +454 339 440 390 413 428 _c +385 466 348 485 301 485 _c +253 485 215 466 188 428 _c +161 390 148 339 148 273 _c +_cl}_e}_d +/e{{615 0 55 -13 562 560 _sc +562 296 _m +562 252 _l +149 252 _l +153 190 171 142 205 110 _c +238 78 284 62 344 62 _c +378 62 412 66 444 74 _c +476 82 509 95 541 113 _c +541 28 _l +509 14 476 3 442 -3 _c +408 -9 373 -13 339 -13 _c +251 -13 182 12 131 62 _c +80 112 55 181 55 268 _c +55 357 79 428 127 481 _c +175 533 241 560 323 560 _c +397 560 455 536 498 489 _c +}_e{540 441 562 377 562 296 _c +472 322 _m +471 371 457 410 431 440 _c +404 469 368 484 324 484 _c +274 484 234 469 204 441 _c +174 413 156 373 152 322 _c +472 322 _l +_cl}_e}_d +/f{352 0 23 0 371 760 _sc +371 760 _m +371 685 _l +285 685 _l +253 685 230 678 218 665 _c +205 652 199 629 199 595 _c +199 547 _l +347 547 _l +347 477 _l +199 477 _l +199 0 _l +109 0 _l +109 477 _l +23 477 _l +23 547 _l +109 547 _l +109 585 _l +109 645 123 690 151 718 _c +179 746 224 760 286 760 _c +371 760 _l +_cl}_d +/g{{635 0 55 -207 544 560 _sc +454 280 _m +454 344 440 395 414 431 _c +387 467 349 485 301 485 _c +253 485 215 467 188 431 _c +161 395 148 344 148 280 _c +148 215 161 165 188 129 _c +215 93 253 75 301 75 _c +349 75 387 93 414 129 _c +440 165 454 215 454 280 _c 
+544 68 _m +544 -24 523 -93 482 -139 _c +440 -184 377 -207 292 -207 _c +260 -207 231 -204 203 -200 _c +175 -195 147 -188 121 -178 _c +}_e{121 -91 _l +147 -105 173 -115 199 -122 _c +225 -129 251 -133 278 -133 _c +336 -133 380 -117 410 -87 _c +439 -56 454 -10 454 52 _c +454 96 _l +435 64 411 40 382 24 _c +353 8 319 0 279 0 _c +211 0 157 25 116 76 _c +75 127 55 195 55 280 _c +55 364 75 432 116 483 _c +157 534 211 560 279 560 _c +319 560 353 552 382 536 _c +411 520 435 496 454 464 _c +454 547 _l +544 547 _l +}_e{544 68 _l +_cl}_e}_d +/h{634 0 91 0 549 760 _sc +549 330 _m +549 0 _l +459 0 _l +459 327 _l +459 379 448 417 428 443 _c +408 469 378 482 338 482 _c +289 482 251 466 223 435 _c +195 404 181 362 181 309 _c +181 0 _l +91 0 _l +91 760 _l +181 760 _l +181 462 _l +202 494 227 519 257 535 _c +286 551 320 560 358 560 _c +420 560 468 540 500 501 _c +532 462 549 405 549 330 _c +_cl}_d +/i{278 0 94 0 184 760 _sc +94 547 _m +184 547 _l +184 0 _l +94 0 _l +94 547 _l +94 760 _m +184 760 _l +184 646 _l +94 646 _l +94 760 _l +_cl}_d +/k{579 0 91 0 576 760 _sc +91 760 _m +181 760 _l +181 311 _l +449 547 _l +564 547 _l +274 291 _l +576 0 _l +459 0 _l +181 267 _l +181 0 _l +91 0 _l +91 760 _l +_cl}_d +/l{278 0 94 0 184 760 _sc +94 760 _m +184 760 _l +184 0 _l +94 0 _l +94 760 _l +_cl}_d +/m{{974 0 91 0 889 560 _sc +520 442 _m +542 482 569 511 600 531 _c +631 550 668 560 711 560 _c +767 560 811 540 842 500 _c +873 460 889 403 889 330 _c +889 0 _l +799 0 _l +799 327 _l +799 379 789 418 771 444 _c +752 469 724 482 686 482 _c +639 482 602 466 575 435 _c +548 404 535 362 535 309 _c +535 0 _l +445 0 _l +445 327 _l +445 379 435 418 417 444 _c +398 469 369 482 331 482 _c +}_e{285 482 248 466 221 435 _c +194 404 181 362 181 309 _c +181 0 _l +91 0 _l +91 547 _l +181 547 _l +181 462 _l +201 495 226 520 255 536 _c +283 552 317 560 357 560 _c +397 560 430 550 458 530 _c +486 510 506 480 520 442 _c +_cl}_e}_d +/n{634 0 91 0 549 560 _sc +549 330 _m +549 0 _l +459 0 _l +459 327 _l +459 379 448 
417 428 443 _c +408 469 378 482 338 482 _c +289 482 251 466 223 435 _c +195 404 181 362 181 309 _c +181 0 _l +91 0 _l +91 547 _l +181 547 _l +181 462 _l +202 494 227 519 257 535 _c +286 551 320 560 358 560 _c +420 560 468 540 500 501 _c +532 462 549 405 549 330 _c +_cl}_d +/o{612 0 55 -13 557 560 _sc +306 484 _m +258 484 220 465 192 427 _c +164 389 150 338 150 273 _c +150 207 163 156 191 118 _c +219 80 257 62 306 62 _c +354 62 392 80 420 118 _c +448 156 462 207 462 273 _c +462 337 448 389 420 427 _c +392 465 354 484 306 484 _c +306 560 _m +384 560 445 534 490 484 _c +534 433 557 363 557 273 _c +557 183 534 113 490 63 _c +445 12 384 -13 306 -13 _c +227 -13 165 12 121 63 _c +77 113 55 183 55 273 _c +55 363 77 433 121 484 _c +165 534 227 560 306 560 _c +_cl}_d +/p{{635 0 91 -207 580 560 _sc +181 82 _m +181 -207 _l +91 -207 _l +91 547 _l +181 547 _l +181 464 _l +199 496 223 520 252 536 _c +281 552 316 560 356 560 _c +422 560 476 533 518 481 _c +559 428 580 359 580 273 _c +580 187 559 117 518 65 _c +476 13 422 -13 356 -13 _c +316 -13 281 -5 252 10 _c +223 25 199 49 181 82 _c +487 273 _m +487 339 473 390 446 428 _c +418 466 381 485 334 485 _c +}_e{286 485 249 466 222 428 _c +194 390 181 339 181 273 _c +181 207 194 155 222 117 _c +249 79 286 61 334 61 _c +381 61 418 79 446 117 _c +473 155 487 207 487 273 _c +_cl}_e}_d +/r{411 0 91 0 411 560 _sc +411 463 _m +401 469 390 473 378 476 _c +366 478 353 480 339 480 _c +288 480 249 463 222 430 _c +194 397 181 350 181 288 _c +181 0 _l +91 0 _l +91 547 _l +181 547 _l +181 462 _l +199 495 224 520 254 536 _c +284 552 321 560 365 560 _c +371 560 378 559 386 559 _c +393 558 401 557 411 555 _c +411 463 _l +_cl}_d +/s{{521 0 54 -13 472 560 _sc +443 531 _m +443 446 _l +417 458 391 468 364 475 _c +336 481 308 485 279 485 _c +234 485 200 478 178 464 _c +156 450 145 430 145 403 _c +145 382 153 366 169 354 _c +185 342 217 330 265 320 _c +296 313 _l +360 299 405 279 432 255 _c +458 230 472 195 472 151 _c +472 100 452 60 412 31 _c +372 1 316 
-13 246 -13 _c +216 -13 186 -10 154 -5 _c +}_e{122 0 89 8 54 20 _c +54 113 _l +87 95 120 82 152 74 _c +184 65 216 61 248 61 _c +290 61 323 68 346 82 _c +368 96 380 117 380 144 _c +380 168 371 187 355 200 _c +339 213 303 226 247 238 _c +216 245 _l +160 257 119 275 95 299 _c +70 323 58 356 58 399 _c +58 450 76 490 112 518 _c +148 546 200 560 268 560 _c +301 560 332 557 362 552 _c +391 547 418 540 443 531 _c +}_e{_cl}_e}_d +/t{392 0 27 0 368 702 _sc +183 702 _m +183 547 _l +368 547 _l +368 477 _l +183 477 _l +183 180 _l +183 135 189 106 201 94 _c +213 81 238 75 276 75 _c +368 75 _l +368 0 _l +276 0 _l +206 0 158 13 132 39 _c +106 65 93 112 93 180 _c +93 477 _l +27 477 _l +27 547 _l +93 547 _l +93 702 _l +183 702 _l +_cl}_d +/w{818 0 42 0 776 547 _sc +42 547 _m +132 547 _l +244 120 _l +356 547 _l +462 547 _l +574 120 _l +686 547 _l +776 547 _l +633 0 _l +527 0 _l +409 448 _l +291 0 _l +185 0 _l +42 547 _l +_cl}_d +/y{592 0 30 -207 562 547 _sc +322 -50 _m +296 -114 271 -157 247 -177 _c +223 -197 191 -207 151 -207 _c +79 -207 _l +79 -132 _l +132 -132 _l +156 -132 175 -126 189 -114 _c +203 -102 218 -75 235 -31 _c +251 9 _l +30 547 _l +125 547 _l +296 119 _l +467 547 _l +562 547 _l +322 -50 _l +_cl}_d +end readonly def + +/BuildGlyph + {exch begin + CharStrings exch + 2 copy known not{pop /.notdef}if + true 3 1 roll get exec + end}_d + +/BuildChar { + 1 index /Encoding get exch get + 1 index /BuildGlyph get exec +}_d + +FontName currentdict end definefont pop +end +%%EndProlog +mpldict begin +13.5 175.5 translate +585 441 0 0 clipbox +gsave +0 0 m +585 0 l +585 441 l +0 441 l +cl +1.000 setgray +fill +grestore +gsave +73.125 44.1 m +526.5 44.1 l +526.5 396.9 l +73.125 396.9 l +cl +0.898 setgray +fill +grestore +0.500 setlinewidth +1 setlinejoin +2 setlinecap +[] 0 setdash +1.000 setgray +gsave +453.4 352.8 73.12 44.1 clipbox +73.125 44.1 m +73.125 396.9 l +stroke +grestore +0 setlinecap +0.333 setgray +gsave +/o { +gsave +newpath +translate +0.5 setlinewidth +1 setlinejoin 
+0 setlinecap +0 0 m +0 -4 l +gsave +0.333 setgray +fill +grestore +stroke +grestore +} bind def +73.125 44.1 o +grestore +gsave +/o { +gsave +newpath +translate +0.5 setlinewidth +1 setlinejoin +0 setlinecap +0 0 m +0 4 l +gsave +0.333 setgray +fill +grestore +stroke +grestore +} bind def +73.125 396.9 o +grestore +/BitstreamVeraSans-Roman findfont +10.000 scalefont +setfont +gsave +62.945312 28.506250 translate +0.000000 rotate +0.000000 0.000000 m /one glyphshow +6.362305 0.000000 m /slash glyphshow +9.731445 0.000000 m /three glyphshow +16.093750 0.000000 m /two glyphshow +grestore +2 setlinecap +1.000 setgray +gsave +453.4 352.8 73.12 44.1 clipbox +155.557 44.1 m +155.557 396.9 l +stroke +grestore +0 setlinecap +0.333 setgray +gsave +/o { +gsave +newpath +translate +0.5 setlinewidth +1 setlinejoin +0 setlinecap +0 0 m +0 -4 l +gsave +0.333 setgray +fill +grestore +stroke +grestore +} bind def +155.557 44.1 o +grestore +gsave +/o { +gsave +newpath +translate +0.5 setlinewidth +1 setlinejoin +0 setlinecap +0 0 m +0 4 l +gsave +0.333 setgray +fill +grestore +stroke +grestore +} bind def +155.557 396.9 o +grestore +gsave +148.392756 28.506250 translate +0.000000 rotate +0.000000 0.000000 m /one glyphshow +6.362305 0.000000 m /slash glyphshow +9.731445 0.000000 m /eight glyphshow +grestore +2 setlinecap +1.000 setgray +gsave +453.4 352.8 73.12 44.1 clipbox +237.989 44.1 m +237.989 396.9 l +stroke +grestore +0 setlinecap +0.333 setgray +gsave +/o { +gsave +newpath +translate +0.5 setlinewidth +1 setlinejoin +0 setlinecap +0 0 m +0 -4 l +gsave +0.333 setgray +fill +grestore +stroke +grestore +} bind def +237.989 44.1 o +grestore +gsave +/o { +gsave +newpath +translate +0.5 setlinewidth +1 setlinejoin +0 setlinecap +0 0 m +0 4 l +gsave +0.333 setgray +fill +grestore +stroke +grestore +} bind def +237.989 396.9 o +grestore +gsave +230.988636 28.506250 translate +0.000000 rotate +0.000000 0.000000 m /one glyphshow +6.362305 0.000000 m /slash glyphshow +9.731445 0.000000 
m /two glyphshow +grestore +2 setlinecap +1.000 setgray +gsave +453.4 352.8 73.12 44.1 clipbox +320.42 44.1 m +320.42 396.9 l +stroke +grestore +0 setlinecap +0.333 setgray +gsave +/o { +gsave +newpath +translate +0.5 setlinewidth +1 setlinejoin +0 setlinecap +0 0 m +0 -4 l +gsave +0.333 setgray +fill +grestore +stroke +grestore +} bind def +320.42 44.1 o +grestore +gsave +/o { +gsave +newpath +translate +0.5 setlinewidth +1 setlinejoin +0 setlinecap +0 0 m +0 4 l +gsave +0.333 setgray +fill +grestore +stroke +grestore +} bind def +320.42 396.9 o +grestore +gsave +318.107955 28.506250 translate +0.000000 rotate +0.000000 0.000000 m /two glyphshow +grestore +2 setlinecap +1.000 setgray +gsave +453.4 352.8 73.12 44.1 clipbox +402.852 44.1 m +402.852 396.9 l +stroke +grestore +0 setlinecap +0.333 setgray +gsave +/o { +gsave +newpath +translate +0.5 setlinewidth +1 setlinejoin +0 setlinecap +0 0 m +0 -4 l +gsave +0.333 setgray +fill +grestore +stroke +grestore +} bind def +402.852 44.1 o +grestore +gsave +/o { +gsave +newpath +translate +0.5 setlinewidth +1 setlinejoin +0 setlinecap +0 0 m +0 4 l +gsave +0.333 setgray +fill +grestore +stroke +grestore +} bind def +402.852 396.9 o +grestore +gsave +400.352273 28.506250 translate +0.000000 rotate +0.000000 0.000000 m /eight glyphshow +grestore +2 setlinecap +1.000 setgray +gsave +453.4 352.8 73.12 44.1 clipbox +485.284 44.1 m +485.284 396.9 l +stroke +grestore +0 setlinecap +0.333 setgray +gsave +/o { +gsave +newpath +translate +0.5 setlinewidth +1 setlinejoin +0 setlinecap +0 0 m +0 -4 l +gsave +0.333 setgray +fill +grestore +stroke +grestore +} bind def +485.284 44.1 o +grestore +gsave +/o { +gsave +newpath +translate +0.5 setlinewidth +1 setlinejoin +0 setlinecap +0 0 m +0 4 l +gsave +0.333 setgray +fill +grestore +stroke +grestore +} bind def +485.284 396.9 o +grestore +gsave +479.807528 28.506250 translate +0.000000 rotate +0.000000 0.000000 m /three glyphshow +6.362305 0.000000 m /two glyphshow +grestore 
+/BitstreamVeraSans-Roman findfont +12.000 scalefont +setfont +gsave +203.593750 12.303125 translate +0.000000 rotate +0.000000 0.000000 m /O glyphshow +9.445312 0.000000 m /p glyphshow +17.062500 0.000000 m /e glyphshow +24.445312 0.000000 m /r glyphshow +29.378906 0.000000 m /a glyphshow +36.732422 0.000000 m /t glyphshow +41.437500 0.000000 m /i glyphshow +44.771484 0.000000 m /o glyphshow +52.113281 0.000000 m /n glyphshow +59.718750 0.000000 m /a glyphshow +67.072266 0.000000 m /l glyphshow +70.406250 0.000000 m /space glyphshow +74.220703 0.000000 m /I glyphshow +77.759766 0.000000 m /t glyphshow +82.464844 0.000000 m /e glyphshow +89.847656 0.000000 m /n glyphshow +97.453125 0.000000 m /s glyphshow +103.705078 0.000000 m /i glyphshow +107.039062 0.000000 m /t glyphshow +111.744141 0.000000 m /y glyphshow +118.845703 0.000000 m /space glyphshow +122.660156 0.000000 m /parenleft glyphshow +127.341797 0.000000 m /F glyphshow +134.244141 0.000000 m /L glyphshow +140.554688 0.000000 m /O glyphshow +150.000000 0.000000 m /P glyphshow +157.236328 0.000000 m /slash glyphshow +161.279297 0.000000 m /B glyphshow +169.511719 0.000000 m /y glyphshow +176.613281 0.000000 m /t glyphshow +181.318359 0.000000 m /e glyphshow +188.701172 0.000000 m /parenright glyphshow +grestore +2 setlinecap +1.000 setgray +gsave +453.4 352.8 73.12 44.1 clipbox +73.125 44.1 m +526.5 44.1 l +stroke +grestore +0 setlinecap +0.333 setgray +gsave +/o { +gsave +newpath +translate +0.5 setlinewidth +1 setlinejoin +0 setlinecap +0 0 m +-4 0 l +gsave +0.333 setgray +fill +grestore +stroke +grestore +} bind def +73.125 44.1 o +grestore +gsave +/o { +gsave +newpath +translate +0.5 setlinewidth +1 setlinejoin +0 setlinecap +0 0 m +4 0 l +gsave +0.333 setgray +fill +grestore +stroke +grestore +} bind def +526.5 44.1 o +grestore +/BitstreamVeraSans-Roman findfont +10.000 scalefont +setfont +gsave +60.078125 41.342187 translate +0.000000 rotate +0.000000 0.000000 m /zero glyphshow +grestore +2 setlinecap 
+1.000 setgray +gsave +453.4 352.8 73.12 44.1 clipbox +73.125 83.3 m +526.5 83.3 l +stroke +grestore +0 setlinecap +0.333 setgray +gsave +/o { +gsave +newpath +translate +0.5 setlinewidth +1 setlinejoin +0 setlinecap +0 0 m +-4 0 l +gsave +0.333 setgray +fill +grestore +stroke +grestore +} bind def +73.125 83.3 o +grestore +gsave +/o { +gsave +newpath +translate +0.5 setlinewidth +1 setlinejoin +0 setlinecap +0 0 m +4 0 l +gsave +0.333 setgray +fill +grestore +stroke +grestore +} bind def +526.5 83.3 o +grestore +gsave +60.078125 80.542187 translate +0.000000 rotate +0.000000 0.000000 m /zero glyphshow +grestore +2 setlinecap +1.000 setgray +gsave +453.4 352.8 73.12 44.1 clipbox +73.125 122.5 m +526.5 122.5 l +stroke +grestore +0 setlinecap +0.333 setgray +gsave +/o { +gsave +newpath +translate +0.5 setlinewidth +1 setlinejoin +0 setlinecap +0 0 m +-4 0 l +gsave +0.333 setgray +fill +grestore +stroke +grestore +} bind def +73.125 122.5 o +grestore +gsave +/o { +gsave +newpath +translate +0.5 setlinewidth +1 setlinejoin +0 setlinecap +0 0 m +4 0 l +gsave +0.333 setgray +fill +grestore +stroke +grestore +} bind def +526.5 122.5 o +grestore +gsave +60.781250 119.742187 translate +0.000000 rotate +0.000000 0.000000 m /one glyphshow +grestore +2 setlinecap +1.000 setgray +gsave +453.4 352.8 73.12 44.1 clipbox +73.125 161.7 m +526.5 161.7 l +stroke +grestore +0 setlinecap +0.333 setgray +gsave +/o { +gsave +newpath +translate +0.5 setlinewidth +1 setlinejoin +0 setlinecap +0 0 m +-4 0 l +gsave +0.333 setgray +fill +grestore +stroke +grestore +} bind def +73.125 161.7 o +grestore +gsave +/o { +gsave +newpath +translate +0.5 setlinewidth +1 setlinejoin +0 setlinecap +0 0 m +4 0 l +gsave +0.333 setgray +fill +grestore +stroke +grestore +} bind def +526.5 161.7 o +grestore +gsave +60.500000 158.942187 translate +0.000000 rotate +0.000000 0.000000 m /two glyphshow +grestore +2 setlinecap +1.000 setgray +gsave +453.4 352.8 73.12 44.1 clipbox +73.125 200.9 m +526.5 200.9 l 
+stroke +grestore +0 setlinecap +0.333 setgray +gsave +/o { +gsave +newpath +translate +0.5 setlinewidth +1 setlinejoin +0 setlinecap +0 0 m +-4 0 l +gsave +0.333 setgray +fill +grestore +stroke +grestore +} bind def +73.125 200.9 o +grestore +gsave +/o { +gsave +newpath +translate +0.5 setlinewidth +1 setlinejoin +0 setlinecap +0 0 m +4 0 l +gsave +0.333 setgray +fill +grestore +stroke +grestore +} bind def +526.5 200.9 o +grestore +gsave +59.812500 198.142187 translate +0.000000 rotate +0.000000 0.000000 m /four glyphshow +grestore +2 setlinecap +1.000 setgray +gsave +453.4 352.8 73.12 44.1 clipbox +73.125 240.1 m +526.5 240.1 l +stroke +grestore +0 setlinecap +0.333 setgray +gsave +/o { +gsave +newpath +translate +0.5 setlinewidth +1 setlinejoin +0 setlinecap +0 0 m +-4 0 l +gsave +0.333 setgray +fill +grestore +stroke +grestore +} bind def +73.125 240.1 o +grestore +gsave +/o { +gsave +newpath +translate +0.5 setlinewidth +1 setlinejoin +0 setlinecap +0 0 m +4 0 l +gsave +0.333 setgray +fill +grestore +stroke +grestore +} bind def +526.5 240.1 o +grestore +gsave +60.125000 237.342187 translate +0.000000 rotate +0.000000 0.000000 m /eight glyphshow +grestore +2 setlinecap +1.000 setgray +gsave +453.4 352.8 73.12 44.1 clipbox +73.125 279.3 m +526.5 279.3 l +stroke +grestore +0 setlinecap +0.333 setgray +gsave +/o { +gsave +newpath +translate +0.5 setlinewidth +1 setlinejoin +0 setlinecap +0 0 m +-4 0 l +gsave +0.333 setgray +fill +grestore +stroke +grestore +} bind def +73.125 279.3 o +grestore +gsave +/o { +gsave +newpath +translate +0.5 setlinewidth +1 setlinejoin +0 setlinecap +0 0 m +4 0 l +gsave +0.333 setgray +fill +grestore +stroke +grestore +} bind def +526.5 279.3 o +grestore +gsave +54.125000 276.542187 translate +0.000000 rotate +0.000000 0.000000 m /one glyphshow +6.362305 0.000000 m /six glyphshow +grestore +2 setlinecap +1.000 setgray +gsave +453.4 352.8 73.12 44.1 clipbox +73.125 318.5 m +526.5 318.5 l +stroke +grestore +0 setlinecap +0.333 setgray 
+gsave +/o { +gsave +newpath +translate +0.5 setlinewidth +1 setlinejoin +0 setlinecap +0 0 m +-4 0 l +gsave +0.333 setgray +fill +grestore +stroke +grestore +} bind def +73.125 318.5 o +grestore +gsave +/o { +gsave +newpath +translate +0.5 setlinewidth +1 setlinejoin +0 setlinecap +0 0 m +4 0 l +gsave +0.333 setgray +fill +grestore +stroke +grestore +} bind def +526.5 318.5 o +grestore +gsave +54.171875 315.742188 translate +0.000000 rotate +0.000000 0.000000 m /three glyphshow +6.362305 0.000000 m /two glyphshow +grestore +2 setlinecap +1.000 setgray +gsave +453.4 352.8 73.12 44.1 clipbox +73.125 357.7 m +526.5 357.7 l +stroke +grestore +0 setlinecap +0.333 setgray +gsave +/o { +gsave +newpath +translate +0.5 setlinewidth +1 setlinejoin +0 setlinecap +0 0 m +-4 0 l +gsave +0.333 setgray +fill +grestore +stroke +grestore +} bind def +73.125 357.7 o +grestore +gsave +/o { +gsave +newpath +translate +0.5 setlinewidth +1 setlinejoin +0 setlinecap +0 0 m +4 0 l +gsave +0.333 setgray +fill +grestore +stroke +grestore +} bind def +526.5 357.7 o +grestore +gsave +53.671875 354.942188 translate +0.000000 rotate +0.000000 0.000000 m /six glyphshow +6.362305 0.000000 m /four glyphshow +grestore +2 setlinecap +1.000 setgray +gsave +453.4 352.8 73.12 44.1 clipbox +73.125 396.9 m +526.5 396.9 l +stroke +grestore +0 setlinecap +0.333 setgray +gsave +/o { +gsave +newpath +translate +0.5 setlinewidth +1 setlinejoin +0 setlinecap +0 0 m +-4 0 l +gsave +0.333 setgray +fill +grestore +stroke +grestore +} bind def +73.125 396.9 o +grestore +gsave +/o { +gsave +newpath +translate +0.5 setlinewidth +1 setlinejoin +0 setlinecap +0 0 m +4 0 l +gsave +0.333 setgray +fill +grestore +stroke +grestore +} bind def +526.5 396.9 o +grestore +gsave +47.812500 394.142187 translate +0.000000 rotate +0.000000 0.000000 m /one glyphshow +6.362305 0.000000 m /two glyphshow +12.724609 0.000000 m /eight glyphshow +grestore +gsave +/o { +gsave +newpath +translate +0.5 setlinewidth +1 setlinejoin +0 
setlinecap +0 0 m +-2 0 l +gsave +0.333 setgray +fill +grestore +stroke +grestore +} bind def +73.125 44.1 o +grestore +gsave +/o { +gsave +newpath +translate +0.5 setlinewidth +1 setlinejoin +0 setlinecap +0 0 m +2 0 l +gsave +0.333 setgray +fill +grestore +stroke +grestore +} bind def +526.5 44.1 o +grestore +gsave +/o { +gsave +newpath +translate +0.5 setlinewidth +1 setlinejoin +0 setlinecap +0 0 m +-2 0 l +gsave +0.333 setgray +fill +grestore +stroke +grestore +} bind def +73.125 83.3 o +grestore +gsave +/o { +gsave +newpath +translate +0.5 setlinewidth +1 setlinejoin +0 setlinecap +0 0 m +2 0 l +gsave +0.333 setgray +fill +grestore +stroke +grestore +} bind def +526.5 83.3 o +grestore +gsave +/o { +gsave +newpath +translate +0.5 setlinewidth +1 setlinejoin +0 setlinecap +0 0 m +-2 0 l +gsave +0.333 setgray +fill +grestore +stroke +grestore +} bind def +73.125 122.5 o +grestore +gsave +/o { +gsave +newpath +translate +0.5 setlinewidth +1 setlinejoin +0 setlinecap +0 0 m +2 0 l +gsave +0.333 setgray +fill +grestore +stroke +grestore +} bind def +526.5 122.5 o +grestore +gsave +/o { +gsave +newpath +translate +0.5 setlinewidth +1 setlinejoin +0 setlinecap +0 0 m +-2 0 l +gsave +0.333 setgray +fill +grestore +stroke +grestore +} bind def +73.125 161.7 o +grestore +gsave +/o { +gsave +newpath +translate +0.5 setlinewidth +1 setlinejoin +0 setlinecap +0 0 m +2 0 l +gsave +0.333 setgray +fill +grestore +stroke +grestore +} bind def +526.5 161.7 o +grestore +gsave +/o { +gsave +newpath +translate +0.5 setlinewidth +1 setlinejoin +0 setlinecap +0 0 m +-2 0 l +gsave +0.333 setgray +fill +grestore +stroke +grestore +} bind def +73.125 200.9 o +grestore +gsave +/o { +gsave +newpath +translate +0.5 setlinewidth +1 setlinejoin +0 setlinecap +0 0 m +2 0 l +gsave +0.333 setgray +fill +grestore +stroke +grestore +} bind def +526.5 200.9 o +grestore +gsave +/o { +gsave +newpath +translate +0.5 setlinewidth +1 setlinejoin +0 setlinecap +0 0 m +-2 0 l +gsave +0.333 setgray +fill 
+grestore +stroke +grestore +} bind def +73.125 240.1 o +grestore +gsave +/o { +gsave +newpath +translate +0.5 setlinewidth +1 setlinejoin +0 setlinecap +0 0 m +2 0 l +gsave +0.333 setgray +fill +grestore +stroke +grestore +} bind def +526.5 240.1 o +grestore +gsave +/o { +gsave +newpath +translate +0.5 setlinewidth +1 setlinejoin +0 setlinecap +0 0 m +-2 0 l +gsave +0.333 setgray +fill +grestore +stroke +grestore +} bind def +73.125 279.3 o +grestore +gsave +/o { +gsave +newpath +translate +0.5 setlinewidth +1 setlinejoin +0 setlinecap +0 0 m +2 0 l +gsave +0.333 setgray +fill +grestore +stroke +grestore +} bind def +526.5 279.3 o +grestore +gsave +/o { +gsave +newpath +translate +0.5 setlinewidth +1 setlinejoin +0 setlinecap +0 0 m +-2 0 l +gsave +0.333 setgray +fill +grestore +stroke +grestore +} bind def +73.125 318.5 o +grestore +gsave +/o { +gsave +newpath +translate +0.5 setlinewidth +1 setlinejoin +0 setlinecap +0 0 m +2 0 l +gsave +0.333 setgray +fill +grestore +stroke +grestore +} bind def +526.5 318.5 o +grestore +gsave +/o { +gsave +newpath +translate +0.5 setlinewidth +1 setlinejoin +0 setlinecap +0 0 m +-2 0 l +gsave +0.333 setgray +fill +grestore +stroke +grestore +} bind def +73.125 357.7 o +grestore +gsave +/o { +gsave +newpath +translate +0.5 setlinewidth +1 setlinejoin +0 setlinecap +0 0 m +2 0 l +gsave +0.333 setgray +fill +grestore +stroke +grestore +} bind def +526.5 357.7 o +grestore +gsave +/o { +gsave +newpath +translate +0.5 setlinewidth +1 setlinejoin +0 setlinecap +0 0 m +-2 0 l +gsave +0.333 setgray +fill +grestore +stroke +grestore +} bind def +73.125 396.9 o +grestore +gsave +/o { +gsave +newpath +translate +0.5 setlinewidth +1 setlinejoin +0 setlinecap +0 0 m +2 0 l +gsave +0.333 setgray +fill +grestore +stroke +grestore +} bind def +526.5 396.9 o +grestore +/BitstreamVeraSans-Roman findfont +12.000 scalefont +setfont +gsave +40.312500 163.453125 translate +90.000000 rotate +0.000000 0.000000 m /A glyphshow +7.958984 0.000000 m /t 
glyphshow +12.664062 0.000000 m /t glyphshow +17.369141 0.000000 m /a glyphshow +24.722656 0.000000 m /i glyphshow +28.056641 0.000000 m /n glyphshow +35.662109 0.000000 m /a glyphshow +43.015625 0.000000 m /b glyphshow +50.632812 0.000000 m /l glyphshow +53.966797 0.000000 m /e glyphshow +61.349609 0.000000 m /space glyphshow +65.164062 0.000000 m /G glyphshow +74.462891 0.000000 m /F glyphshow +81.365234 0.000000 m /L glyphshow +87.675781 0.000000 m /O glyphshow +97.121094 0.000000 m /P glyphshow +104.357422 0.000000 m /slash glyphshow +108.400391 0.000000 m /s glyphshow +grestore +1.000 setlinewidth +2 setlinecap +0.886 0.290 0.200 setrgbcolor +gsave +453.4 352.8 73.12 44.1 clipbox +402.852 374.672 m +444.068 374.672 l +485.284 374.672 l +526.5 374.672 l +stroke +grestore +0.204 0.541 0.741 setrgbcolor +gsave +453.4 352.8 73.12 44.1 clipbox +73.125 60.0149 m +114.341 99.2149 l +155.557 138.415 l +196.773 177.615 l +237.989 216.815 l +279.205 256.015 l +320.42 295.215 l +361.636 334.415 l +402.852 373.615 l +stroke +grestore +0 setlinejoin +1.000 setgray +gsave +73.125 396.9 m +526.5 396.9 l +stroke +grestore +gsave +73.125 44.1 m +73.125 396.9 l +stroke +grestore +gsave +526.5 44.1 m +526.5 396.9 l +stroke +grestore +gsave +73.125 44.1 m +526.5 44.1 l +stroke +grestore +0.500 setlinewidth +0 setlinecap +gsave +79.125 352.05 m +316.325 352.05 l +316.325 390.9 l +79.125 390.9 l +cl +gsave +0.898 setgray +fill +grestore +stroke +grestore +1.000 setlinewidth +1 setlinejoin +2 setlinecap +0.886 0.290 0.200 setrgbcolor +gsave +87.525 381.175 m +104.325 381.175 l +stroke +grestore +0.000 setgray +gsave +117.525000 376.975000 translate +0.000000 rotate +0.000000 0.000000 m /P glyphshow +6.861328 0.000000 m /e glyphshow +14.244141 0.000000 m /a glyphshow +21.597656 0.000000 m /k glyphshow +28.546875 0.000000 m /space glyphshow +32.361328 0.000000 m /F glyphshow +39.263672 0.000000 m /l glyphshow +42.597656 0.000000 m /o glyphshow +49.939453 0.000000 m /a glyphshow 
+57.292969 0.000000 m /t glyphshow +61.998047 0.000000 m /i glyphshow +65.332031 0.000000 m /n glyphshow +72.937500 0.000000 m /g glyphshow +80.554688 0.000000 m /hyphen glyphshow +84.884766 0.000000 m /P glyphshow +91.746094 0.000000 m /o glyphshow +99.087891 0.000000 m /i glyphshow +102.421875 0.000000 m /n glyphshow +110.027344 0.000000 m /t glyphshow +114.732422 0.000000 m /space glyphshow +118.546875 0.000000 m /P glyphshow +125.408203 0.000000 m /e glyphshow +132.791016 0.000000 m /r glyphshow +137.724609 0.000000 m /f glyphshow +141.949219 0.000000 m /o glyphshow +149.291016 0.000000 m /r glyphshow +153.974609 0.000000 m /m glyphshow +165.664062 0.000000 m /a glyphshow +173.017578 0.000000 m /n glyphshow +180.623047 0.000000 m /c glyphshow +187.220703 0.000000 m /e glyphshow +grestore +0.204 0.541 0.741 setrgbcolor +gsave +87.525 363.55 m +104.325 363.55 l +stroke +grestore +0.000 setgray +gsave +117.525000 359.350000 translate +0.000000 rotate +0.000000 0.000000 m /P glyphshow +6.861328 0.000000 m /e glyphshow +14.244141 0.000000 m /a glyphshow +21.597656 0.000000 m /k glyphshow +28.546875 0.000000 m /space glyphshow +32.361328 0.000000 m /M glyphshow +42.714844 0.000000 m /e glyphshow +50.097656 0.000000 m /m glyphshow +61.787109 0.000000 m /o glyphshow +69.128906 0.000000 m /r glyphshow +74.062500 0.000000 m /y glyphshow +81.164062 0.000000 m /space glyphshow +84.978516 0.000000 m /B glyphshow +93.210938 0.000000 m /a glyphshow +100.564453 0.000000 m /n glyphshow +108.169922 0.000000 m /d glyphshow +115.787109 0.000000 m /w glyphshow +125.601562 0.000000 m /i glyphshow +128.935547 0.000000 m /d glyphshow +136.552734 0.000000 m /t glyphshow +141.257812 0.000000 m /h glyphshow +grestore + +end +showpage diff --git a/roofline/report/res/rooftop.png b/roofline/report/res/rooftop.png deleted file mode 100644 index c68d1b0..0000000 Binary files a/roofline/report/res/rooftop.png and /dev/null differ diff --git a/roofline/report/roofline.bib 
b/roofline/report/roofline.bib index 6842fda..3b640b4 100644 --- a/roofline/report/roofline.bib +++ b/roofline/report/roofline.bib @@ -42,10 +42,20 @@ Timestamp = {2016.06.20} } -@Online{intel2, - Title = {Intel Intrinsics Guide}, +@Online{intelvfmadd132pd, + Title = {Intel Intrinsics Guide: vfmadd132pd}, Author = {{Intel}}, - Url = {https://software.intel.com/sites/landingpage/IntrinsicsGuide/#techs=AVX2,FMA&text=madd&expand=2365}, + Url = {https://software.intel.com/sites/landingpage/IntrinsicsGuide/#techs=AVX2,FMA&text=vfmadd132pd&expand=2365}, + Urldate = {2016-06-19}, + + Owner = {armin}, + Timestamp = {2016.06.22} +} + +@Online{intelvfmadd132sd, + Title = {Intel Intrinsics Guide: vfmadd132sd}, + Author = {{Intel}}, + Url = {https://software.intel.com/sites/landingpage/IntrinsicsGuide/#techs=AVX2,FMA&text=vfmadd132sd&expand=2365,2403}, Urldate = {2016-06-19}, Owner = {armin}, diff --git a/roofline/src/Makefile b/roofline/src/Makefile index ac89864..8c2b285 100644 --- a/roofline/src/Makefile +++ b/roofline/src/Makefile @@ -1,7 +1,8 @@ -all: roofline roofline_avx roofline_o3avx roofline_o3 roofline_avxfma +all: roofline roofline_avx roofline_o3avx roofline_o3 roofline_avxfma roofline_avxfmafast +# Roofline Binary roofline: roofline.c aikern.a - gcc -Wall -Wextra -O3 -std=c99 -fopenmp $^ -o $@ + gcc -Wall -Wextra -std=c99 -fopenmp $^ -o $@ roofline_avx: roofline.c aikern_avx.a gcc -Wall -Wextra -O3 -std=c99 -fopenmp $^ -o $@ @@ -15,6 +16,10 @@ roofline_o3: roofline.c aikern_o3.a roofline_avxfma: roofline.c aikern_avxfma.a gcc -Wall -Wextra -O3 -std=c99 -fopenmp $^ -o $@ +roofline_avxfmafast: roofline.c aikern_avxfmafast.a + gcc -Wall -Wextra -O3 -std=c99 -fopenmp $^ -o $@ + +# Static Libraries aikern.a: aikern.c aikern.h gcc -c -o aikern.o aikern.c ar rcs aikern.a aikern.o @@ -36,8 +41,17 @@ aikern_avxfma.a: aikern.c aikern.h gcc -O2 -mavx -mfma -c -o aikern_avxfma.o aikern.c ar rcs aikern_avxfma.a aikern_avxfma.o +aikern_avxfmafast.a: aikern.c aikern.h + gcc -O2 
-mavx -mfma -Ofast -c -o aikern_avxfmafast.o aikern.c + ar rcs aikern_avxfmafast.a aikern_avxfmafast.o + +aikern_avxfmafastmath.a: aikern.c aikern.h + gcc -O2 -mavx -mfma -Ofast -ffast-math -c -o aikern_avxfmafastmath.o aikern.c + ar rcs aikern_avxfmafastmath.a aikern_avxfmafastmath.o + + clean: - rm -f roofline roofline_avx roofline_o3avx roofline_o3 roofline_avxfma + rm -f roofline roofline_avx roofline_o3avx roofline_o3 roofline_avxfma roofline_avxfmafast rm -f *.o rm -f *.a rm -f *.so diff --git a/roofline/src/aikern.c b/roofline/src/aikern.c index e66d7f0..1ff84cb 100644 --- a/roofline/src/aikern.c +++ b/roofline/src/aikern.c @@ -23,6 +23,8 @@ void kernel_1_16_fuseaware(double* a, double* b, double* c, size_t size) vmovsd xmm1,QWORD PTR [rdx+rax*8] # 1 read vfmadd132sd xmm0,xmm1,QWORD PTR [rsi+rax*8] # 2 FLOPs + 1 read vmovsd QWORD PTR [rdi+rax*8],xmm0 # 1 write + + Uses packed doubles with -Ofast. */ #pragma omp parallel for @@ -36,6 +38,26 @@ void kernel_1_16_fuseaware(double* a, double* b, double* c, size_t size) } } +#define REP0(X) +#define REP1(X) X +#define REP2(X) REP1(X) REP1(X) +#define REP3(X) REP2(X) REP1(X) +#define REP4(X) REP3(X) REP1(X) +#define REP5(X) REP4(X) REP1(X) +#define REP6(X) REP5(X) REP1(X) +#define REP7(X) REP6(X) REP1(X) +#define REP8(X) REP7(X) REP1(X) +#define REP9(X) REP8(X) REP1(X) + +#define REP10(X) REP9(X) REP1(X) +#define REP20(X) REP10(X) REP10(X) +#define REP30(X) REP20(X) REP10(X) +#define REP40(X) REP30(X) REP10(X) +#define REP50(X) REP40(X) REP10(X) +#define REP60(X) REP50(X) REP10(X) + +#define REP100(X) REP50(X) REP50(X) + void kernel_8_1_simple(double* a, double* b, double* c, size_t size) { /* === Warning === @@ -49,7 +71,7 @@ void kernel_8_1_simple(double* a, double* b, double* c, size_t size) vmovsd xmm1,QWORD PTR [rdi] # 1 read vmulsd xmm0,xmm1,xmm1 # 1 FLOP+register shuffling - vmulsd xmm0,xmm0,xmm1 # 15x 1 FLOP+register shuffling + vmulsd xmm0,xmm0,xmm1 # 127x 1 FLOP+register shuffling # [...] 
vmovsd QWORD PTR [rdi-0x8],xmm0 # 1 write */ @@ -57,16 +79,14 @@ void kernel_8_1_simple(double* a, double* b, double* c, size_t size) #pragma omp parallel for for(size_t i=0; i AI = 8 + COMM: 1 read+1 write = 16 byte + COMP: 128 FLOPs + -> AI = 128/16 = 8 */ - a[i] = a[i] * a[i] * a[i] * - a[i] * a[i] * a[i] * - a[i] * a[i] * a[i] * - a[i] * a[i] * a[i] * - a[i] * a[i] * a[i] * - a[i] * a[i]; + a[i] = REP100(a[i]*) + REP20(a[i]*) + REP8(a[i]*) + REP1(a[i]); } } @@ -76,28 +96,25 @@ void kernel_8_1_fuseaware(double* a, double* b, double* c, size_t size) With FMA (and -O2): vmovsd xmm0,QWORD PTR [rdi] # 1 read - vfmadd132sd xmm0,xmm0,xmm0 # 8x 2 FLOPs+register shuffling + vfmadd132sd xmm0,xmm0,xmm0 # 64 x 2 FLOPs+register shuffling vmovsd QWORD PTR [rdi-0x8],xmm0 # 1 write + + Uses packed doubles with -Ofast. */ #pragma omp parallel for for(size_t i=0; i AI = 8 */ - a[i] = a[i] * a[i] + a[i]; - a[i] = a[i] * a[i] + a[i]; - a[i] = a[i] * a[i] + a[i]; - a[i] = a[i] * a[i] + a[i]; - a[i] = a[i] * a[i] + a[i]; - a[i] = a[i] * a[i] + a[i]; - a[i] = a[i] * a[i] + a[i]; - a[i] = a[i] * a[i] + a[i]; + REP60(a[i] = a[i] * a[i] + a[i];) + REP4(a[i] = a[i] * a[i] + a[i];) } } + /* === FAILED KERNELS === */ /* @@ -127,7 +144,7 @@ void kernel_1_16_simple_dangerous(double* a, double* b, double* c, size_t size) // volatile to prevent compiler from optimizing this away // register to advise compiler to put this in register - volatile register double tmp = 0.1; + double tmp = 0.1; #pragma omp parallel for for(size_t i=0; i