Description of the benchmarks performed; plots and analysis still missing

Johannes 2016-06-24 21:42:13 +02:00
parent 8a40fbf7d6
commit 3fc9e93a03
7 changed files with 45 additions and 33 deletions


@ -1,10 +1,3 @@
/*
* bin_reduce.c
*
* Created on: 16 Jun 2016
* Author: johannes
*/
#include <stdio.h>
#include <string.h>
#include <mpi.h>


@ -1,10 +1,3 @@
/*
* binom_reduce.c
*
* Created on: 18 Jun 2016
* Author: johannes
*/
#include <mpi.h>
#include <string.h>
#include <stdio.h>


@ -1,10 +1,3 @@
/*
* fib_reduce.c
*
* Created on: 18 Jun 2016
* Author: johannes
*/
#include <stdio.h>
#include <string.h>
#include <mpi.h>


@ -1,18 +1,10 @@
/*
============================================================================
Name : hpc_mpi.c
Author :
Version :
Copyright : Your copyright notice
Description : Hello MPI World in C
============================================================================
*/
#include <stdio.h>
#include <string.h>
#include <mpi.h>
#include <unistd.h>
#include <stdlib.h>
#include <time.h>
#include <getopt.h>
#include "bin_reduce.h"
#include "binom_reduce.h"
#include "fib_reduce.h"

BIN
reduce/report/nodeplot.pdf Executable file

Binary file not shown.

BIN
reduce/report/report.pdf Executable file

Binary file not shown.


@ -106,7 +106,7 @@ As a baseline for the comparison serves a given implementation of the MPI standa
A binomial tree has a non-fixed degree where each tree $B_i$ has exactly $i$ subtrees of size $B_0$ to $B_{i-1}$.
The number of nodes in such a tree is equal to $2^i$ and the depth is $i$.
\item[Fibonacci Tree]
The Fibonacci tree uses a fixed degree of $2$ where a tree of size $F_i$ has one subtree of size $T_{i-1}$ and one of $T_{i-2}$.
The Fibonacci tree uses a fixed degree of $2$ where a tree of size $F_i$ has one subtree of size $F_{i-1}$ and one of $F_{i-2}$.
Therefore the number of nodes in this kind of tree is $fib(i+3)-1$, using the Fibonacci function $fib(x) = fib(x-1)+fib(x-2)$, and its depth is also $i$.
\item[Binary Tree]
The binary tree used for reduction is a common complete binary tree where a tree $T_i$ has two subtrees $T_{i-1}$.
@ -285,7 +285,11 @@ There is again a comparison between a tree $T_2$ and $T_3$ which is shown in \pr
\begin{figure}
\begin{center}
\begin{tikzpicture}
\begin{tikzpicture}[
auto,
level 1/.style={sibling distance=40mm},
level 2/.style={sibling distance=20mm},
level 3/.style={sibling distance=10mm}]
\begin{scope}
\node [circle,draw]{$0$}
child { node [circle,draw]{$1$}
@ -295,7 +299,7 @@ child { node [circle,draw]{$4$}
child { node [circle,draw]{$5$}}
child { node [circle,draw]{$6$}}};
\end{scope}
\begin{scope}[shift={(5,0)}]
\begin{scope}[shift={(7,0)}]
\node [circle,draw]{$0$}
child { node [circle,draw]{$1$}
child { node [circle,draw]{$2$}
@ -328,6 +332,43 @@ As a result the number of rounds for this algorithm is the size of the tree plus
\section{Results}
\label{sec:results}
Before comparing the runtime of our algorithms, we tested their correctness to a reasonable extent.
As already stated in the project description, this can be done easily by comparing each result to that of the MPI\_Reduce function.
The comparison can be run for all implementations at once to test them quickly.
With this method we tried various combinations of array sizes, process counts, and operations, and the result was always correct.
After these correctness tests we benchmarked our implementations.
For this we used 36 nodes of the jupiter system with 16 processes each.
We ran two kinds of tests, explained in the following subsections, to compare the runtime of our implementations with that of the MPI\_Reduce function.
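The correctness check described above can be sketched as an element-wise comparison of two result buffers.
This is only a minimal illustration, not the code used in the project: the helper name \texttt{first\_mismatch} is hypothetical, the data is assumed to be of type \texttt{int}, and the reference buffer is assumed to have been filled by MPI\_Reduce on the root process.

```c
#include <stddef.h>

/*
 * Hypothetical helper: compares the result of a custom tree reduction
 * with a reference result (assumed to come from MPI_Reduce on the root).
 * Returns the index of the first differing element, or -1 if all
 * `count` elements agree.
 */
long first_mismatch(const int *result, const int *reference, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        if (result[i] != reference[i]) {
            return (long)i;
        }
    }
    return -1;
}
```

Calling such a helper once per implementation on the root process would give a pass or fail answer for every tested combination of array size, process count, and operation.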
\subsection{Process Count}
For the first benchmark we varied the number of processes to check the scaling of all methods.
The size of the array on each process for this test is $1000$.
This is a rather low value, but since tree reduction is not optimal for large amounts of data, such a value makes sense.
The number of nodes was increased from one up to all 36 available nodes.
On each node all 16 processes were used in all tests.
Therefore the total process count ranged from 16 to 576.
For all our tests in this project we used a repetition count of 30, which allowed us to run a large number of different inputs in a reasonable amount of time.
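The parameters of this benchmark can be summarised in a short sketch.
The constants are taken from the text; the function names \texttt{total\_processes} and \texttt{average\_runtime} are hypothetical, and averaging the repetitions is an assumption based on the figure caption rather than a description of the actual measurement code.

```c
#include <stddef.h>

#define PROCS_PER_NODE 16 /* processes per node, as stated in the text */
#define MAX_NODES      36 /* nodes available on the system */
#define REPETITIONS    30 /* repetitions per configuration */

/* Total number of processes when `nodes` nodes are fully occupied. */
int total_processes(int nodes)
{
    return nodes * PROCS_PER_NODE;
}

/* Arithmetic mean of `count` runtime samples; returns 0.0 for no samples. */
double average_runtime(const double *samples, size_t count)
{
    double sum = 0.0;
    for (size_t i = 0; i < count; i++) {
        sum += samples[i];
    }
    return count > 0 ? sum / (double)count : 0.0;
}
```

With these definitions, \texttt{total\_processes(1)} gives 16 and \texttt{total\_processes(MAX\_NODES)} gives 576, matching the range stated above.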
\begin{figure}
\begin{adjustbox}{center}
\includegraphics[width=0.8\linewidth]{nodeplot}
\end{adjustbox}
\caption{Average runtimes on 1 to 36 nodes with 16 processes each.}
\label{fig:roofline}
\end{figure}
\subsection{Array Size}
Our second benchmark used a fixed number of processes while the size of the local arrays increased.
This shows how the different implementations perform for small arrays or even a single number.
It also shows how they perform with a large amount of data on each process.
The number of nodes was fixed at 36 for the complete test.
The size of the local arrays was increased by a factor of 10 in each iteration, starting at 1 and going up to 1000000.
The number of repetitions is 30, the same as in the previous test.
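The sequence of array sizes used in this test can be generated as follows.
This is a sketch only: the function name \texttt{array\_sizes} is hypothetical, and it assumes that the maximum size is far below the largest value representable in \texttt{size\_t}, so that multiplying by 10 cannot overflow.

```c
#include <stddef.h>

/*
 * Fills `sizes` with 1, 10, 100, ... up to and including `max_size`
 * (at most `capacity` entries) and returns the number of entries written.
 * Assumes max_size is small enough that s * 10 does not overflow.
 */
size_t array_sizes(size_t max_size, size_t *sizes, size_t capacity)
{
    size_t n = 0;
    for (size_t s = 1; s <= max_size && n < capacity; s *= 10) {
        sizes[n++] = s;
    }
    return n;
}
```

For a maximum of 1000000 this yields seven sizes, so together with 30 repetitions each implementation is measured 210 times in this test.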
\FloatBarrier
\section{Analysis}