Compare revisions

Commits on Source (6)
DAUMAS Jolan
BLAYE Thibault
\ No newline at end of file
Activities/TD2/poisson.jpg

168 KiB

"x"
6.05643767072
3.15295274290587
4.54469261700601
4.94929390744156
4.6064024847115
3.84081322586277
6.26728299615841
3.85801144238035
7.02763557081556
3.90592885142137
5.95730448133523
7.42996808905166
1.91670894833149
3.58181684977394
3.80001799540951
4.95392559710511
3.57362061787589
0.0153168686428362
0.339299607136721
5.98017001859529
3.54004210888229
1.32803734903
3.74212396636057
5.8220120487589
6.84279019189745
3.3542963025907
3.61409592584661
1.35525537220783
4.69014603224691
3.04000768605982
4.68317518486183
5.05725600584323
5.55265528295488
3.08661043688918
4.75743268494696
1.42448698138999
2.82331148743076
2.72363860873522
0.378688525080052
4.05418391033838
4.30899790030038
3.458414052177
5.13724485354928
2.90994275938514
1.94757843337106
4.64922703883308
2.78291023571999
6.16615189258188
3.35283069607998
4.98347182510331
---
title: "Measurement and Analysis of Ping-Pong Time"
author: "Jean-Marc Vincent, Lucas Mello Schnorr"
date: "January 2017"
output:
  pdf_document:
    number_sections: yes
  html_document: default
geometry: margin=1.5in, top=0.5in, bottom=0.5in
subtitle: An RStudio Demonstration and Statistical Analysis Draft
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
This document is a demonstration of RStudio. It has an introduction, followed by a step-by-step recipe for conducting an analysis following the principles of literate programming.
# Introduction
An experimental protocol is the description of the experiment that has been conducted. It explains how the measurements have been implemented. For this RStudio demonstration, we have measured the ping-pong time. It corresponds to the time taken by a message of a given size to go from one machine to another and back to the original machine. The following C-based code represents how the ping-pong time is measured, for a message of a given `size` in an environment with two processes (the `sender` and the `receiver`). We can see that the measured time is the one observed by the sender.
```{c eval=FALSE}
if (sender) {
  atime = get_time();
  send_message(size);
  btime = get_time();
  recv_message(size);
  ctime = get_time();
  // Print (btime - atime) + (ctime - btime)
} else {
  recv_message(size);
  send_message(size);
}
```
Network measurements can be affected by a number of external factors. The most easily observed ones are the operating system complexity, its network layer, and also the wired or wireless hardware. All these factors increase the measurement variability, which should be quantified and taken into account during the analysis. As a consequence, the experiment has been replicated many times (by injecting the code above in a for loop).
The computer program generates a textual file in the CSV (Comma-Separated Values) format. Each line of this CSV file contains one measurement, as the output of the Print command above.
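As a rough illustration of the replication and CSV output described above, the sketch below times a placeholder operation several times and writes one measurement per line. The function `fake_ping_pong`, the repetition count `n_rep`, and the file name are assumptions made for this illustration; they are not part of the original measurement program.

```r
# Hypothetical stand-in for one ping-pong exchange (an assumption, not the real program).
fake_ping_pong <- function(size) {
  invisible(sum(raw(size) == as.raw(0)))
}

n_rep <- 20                       # number of replications (arbitrary choice)
elapsed <- numeric(n_rep)
for (i in seq_len(n_rep)) {
  start <- proc.time()[["elapsed"]]
  fake_ping_pong(size = 1)
  elapsed[i] <- proc.time()[["elapsed"]] - start
}

# One measurement per line, preceded by a single header line.
out_file <- file.path(tempdir(), "pp-sketch.csv")
write.csv(data.frame(time = elapsed), out_file, row.names = FALSE)
length(readLines(out_file))       # header line plus n_rep measurement lines
```

The placeholder is far too fast for the timer resolution, so most values will probably be zero; the point is only the shape of the loop and of the resulting file.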
# Synthetic Ping-Pong Measurements (Demonstration)
We use synthetic measurements created as detailed below for this demonstration. For this reason, skip this section if you are working with the given real measurements.
## Create data
To create the synthetic ping-pong measurements, the code below generates 50 measurements drawn from a normal distribution with mean 4 and standard deviation 1.5. The resulting values are kept in the `times` vector, which is written as a CSV file with the name `pp-synthetic.csv`.
```{r}
set.seed(42);
times <- rnorm(n = 50, mean = 4, sd = 1.5);
write.csv(times, "pp-synthetic.csv", row.names = FALSE);
times
```
The `set.seed` call is here only to keep the synthetic analysis fully reproducible.
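To make the role of `set.seed` concrete, the small sketch below draws from `rnorm` twice with the same seed and once more without resetting it. The variable names `first`, `second` and `third` are only illustrative.

```r
set.seed(42)
first <- rnorm(n = 5, mean = 4, sd = 1.5)

set.seed(42)                                # reset the generator to the same state
second <- rnorm(n = 5, mean = 4, sd = 1.5)
identical(first, second)                    # same seed, same draws

third <- rnorm(n = 5, mean = 4, sd = 1.5)   # the generator state has advanced
identical(first, third)
```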
## Read data
To read the synthetic ping-pong measurements, the code below reads the file `pp-synthetic.csv` and creates the data frame `df`. This data frame has a single column called `time` which contains all measurements that have been previously generated. The call to `head(df)` lists the first rows of the data frame.
```{r}
df <- read.csv("pp-synthetic.csv", header=TRUE, col.names=c("time"))
head(df);
```
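A quick way to check that writing and reading went as expected is a round trip through a temporary file, sketched below. The names `csv_path`, `generated` and `check` are assumptions for this illustration, chosen so that the `df` data frame used in the rest of the document is left untouched.

```r
csv_path <- file.path(tempdir(), "pp-roundtrip.csv")

set.seed(42)
generated <- rnorm(n = 50, mean = 4, sd = 1.5)
write.csv(generated, csv_path, row.names = FALSE)

readLines(csv_path, n = 3)        # header line written by write.csv, then values
check <- read.csv(csv_path, header = TRUE, col.names = c("time"))
str(check)                        # a single numeric column named time
nrow(check) == length(generated)
all.equal(check$time, generated)  # values survive the round trip up to rounding
```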
## Overview of all the measurements
The best way to get an overview is to plot the data (see below). Assuming the true mean is unknown, compute and discuss the following metrics:
- Estimate the ping-pong measurement (mean, median)
- Quantify the variability (sd)
```{r, fig.width=6, fig.height=3.5}
plot(df$time, ylab="Time (seconds)", xlab="Measurement Number");
```
## Statistical Summary
A summary of a set of values can be easily obtained in R with the `summary` function. Here `df$time` is a vector with the measurements; it represents the column `time` of the data frame `df`. Calling `summary` on this vector gives:
```{r}
summary(df$time);
```
We can see that the estimated mean is `3.946`, while the median is `3.849`. The other values are the minimum, the first quartile, the third quartile and the maximum.
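The quartiles reported by `summary` can also be computed one by one, which may help in seeing what each number means. The sketch below regenerates the synthetic values into a vector `x` (a stand-in for `df$time`, so that the example is self-contained) and uses `quantile` with its default settings.

```r
set.seed(42)
x <- rnorm(n = 50, mean = 4, sd = 1.5)    # stand-in for df$time

q <- quantile(x, probs = c(0.25, 0.50, 0.75))
q                                         # first quartile, median, third quartile
c(min(x), max(x))                         # the two extremes

all.equal(unname(q["50%"]), median(x))          # the 50% quantile is the median
all.equal(unname(q["75%"] - q["25%"]), IQR(x))  # interquartile range
```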
## Boxplot
A much nicer way to represent the metrics of the summary above is a boxplot. It takes all the values and creates a visual representation. Can you explain what each element of the boxplot represents? How does it help you to better understand the data?
```{r, fig.width=2, fig.height=3.5}
boxplot(df$time, ylab="Time (seconds)")
```
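As a hint for the questions above, `boxplot.stats` returns the numbers that a boxplot draws. The sketch below applies it to a tiny hand-made vector (an assumption chosen so that one value is clearly far from the others), not to the measurements themselves.

```r
x <- c(1, 2, 3, 4, 100)      # small illustrative vector with one extreme value

bs <- boxplot.stats(x)
bs$stats   # lower whisker, lower hinge, median, upper hinge, upper whisker
bs$out     # values beyond the whiskers, drawn as individual points
```

With the default settings, the whiskers stop at the most extreme values lying within 1.5 times the box height from the box, which is why the value 100 is reported separately.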
## Histogram
A histogram counts how many times the observed value falls into different bins. For time measurements, a bin is an interval of values, for instance (2,3). Whenever a measurement falls into a bin, the bin's counter is incremented. Below we can see the histogram for the synthetic ping-pong measurements. We ask for about 10 bins through the `breaks` parameter; R treats a single number as a suggestion and picks round bin boundaries, so the actual number of bins may differ.
```{r, fig.width=6}
hist(df$time, breaks=10, xlab="Time (seconds)", main="Histogram of Ping-Pong")
```
The problem with histograms is that changing the number of bins can radically change the perception given by the plot. Let's, for instance, ask for only 2 bins:
```{r, fig.width=6}
hist(df$time, breaks=2, xlab="Time (seconds)", main="Histogram of Ping-Pong")
```
We can see that two bins hardly represent reality.
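To see which bins R actually uses, `hist` can be called with `plot = FALSE`, which returns the bin boundaries and counts without drawing anything. The sketch below regenerates the synthetic values into `x` (a stand-in for `df$time`); the names `h10` and `h2` are only illustrative.

```r
set.seed(42)
x <- rnorm(n = 50, mean = 4, sd = 1.5)   # stand-in for df$time

h10 <- hist(x, breaks = 10, plot = FALSE)
h2  <- hist(x, breaks = 2,  plot = FALSE)

h10$breaks            # bin boundaries chosen by R
length(h10$counts)    # number of bins actually used
length(h2$counts)
sum(h10$counts) == length(x)   # every measurement falls into exactly one bin
```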
## Variability
As we previously mentioned, ping-pong measurements suffer external influences that might affect estimations. As a consequence, we need to evaluate the variability. A common measure of variability is the standard deviation, calculated like this:
```{r}
sd(df$time)
```
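For reference, `sd` computes the sample standard deviation, which divides by `n - 1` rather than `n`. The sketch below redoes the calculation by hand on a small made-up vector (the values are arbitrary) and compares it with `sd`.

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9)   # arbitrary illustrative values
n <- length(x)

manual_sd <- sqrt(sum((x - mean(x))^2) / (n - 1))
manual_sd
all.equal(manual_sd, sd(x))
```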
## Wrap up
We have synthetically generated ping-pong measurements by using the `rnorm` function of R. We asked for a true mean of 4, with a standard deviation of 1.5. Pretending not to know these values and using only statistics, we can estimate the mean and standard deviation as:
```{r}
mean(df$time)
sd(df$time)
```
These estimates are slightly different from the true values.
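One possible way to judge how far an estimated mean may be from the true one is a confidence interval. The sketch below is an optional extension, not part of the original analysis: it assumes the measurements are independent and roughly normal, and computes a 95% interval by hand before comparing it with the one returned by `t.test`.

```r
set.seed(42)
x <- rnorm(n = 50, mean = 4, sd = 1.5)   # stand-in for df$time
n <- length(x)

std_err    <- sd(x) / sqrt(n)                    # standard error of the mean
half_width <- qt(0.975, df = n - 1) * std_err    # 95% level, Student t quantile
ci <- c(mean(x) - half_width, mean(x) + half_width)
ci

all.equal(ci, as.numeric(t.test(x)$conf.int))    # same interval from t.test
```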
# Real Ping-Pong Measurements (TD)
The measurements detailed in the introduction have been obtained for a number of different message sizes. We provide five of them here, with the following names:
```{bash}
ls data/PP_size*.csv
```
The files correspond to message sizes from 1 to 5 bytes. You should repeat all the steps of the previous section using at least one of these files. Write your statistical interpretation of the measurements. Here's how you can read one of them.
```{r}
df <- read.csv("data/PP_size_1.csv");
head(df);
```
# Iteration Duration of a Geophysics Parallel Application
Geophysics parallel applications are generally organized in timesteps. At each timestep, the mathematical model is calculated taking into account the time interval. As the simulation evolves, one can measure how much time it takes to calculate the mathematical model for each iteration. Here we provide a data set that contains the duration of each iteration in a simulation composed of a little less than 500 iterations. Here's the code to read that data set:
```{r}
df <- read.csv("data/IterDuration.csv")
head(df);
```
```{r, fig.width=6, fig.height=3.5}
plot(df$duration, ylab="Time (seconds)", xlab="Measurement Number")
```
---
title: "Measurement and Analysis of Ping-Pong Time"
author: "Jean-Marc Vincent, Lucas Mello Schnorr"
date: "January 2017"
output:
  html_document: default
  pdf_document:
    number_sections: yes
geometry: margin=1.5in, top=0.5in, bottom=0.5in
subtitle: An RStudio Demonstration and Statistical Analysis Draft
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
This document is a demonstration of RStudio. It has an introduction, followed by a step-by-step recipe for conducting an analysis following the principles of literate programming.
# Introduction
An experimental protocol is the description of the experiment that has been conducted. It explains how the measurements have been implemented. For this RStudio demonstration, we have measured the ping-pong time. It corresponds to the time taken by a message of a given size to go from one machine to another and back to the original machine. The following C-based code represents how the ping-pong time is measured, for a message of a given `size` in an environment with two processes (the `sender` and the `receiver`). We can see that the measured time is the one observed by the sender.
```{c eval=FALSE}
if (sender) {
  atime = get_time();
  send_message(size);
  btime = get_time();
  recv_message(size);
  ctime = get_time();
  // Print (btime - atime) + (ctime - btime)
} else {
  recv_message(size);
  send_message(size);
}
```
Network measurements can be affected by a number of external factors. The most easily observed ones are the operating system complexity, its network layer, and also the wired or wireless hardware. All these factors increase the measurement variability, which should be quantified and taken into account during the analysis. As a consequence, the experiment has been replicated many times (by injecting the code above in a for loop).
The computer program generates a textual file in the CSV (Comma-Separated Values) format. Each line of this CSV file contains one measurement, as the output of the Print command above.
# Synthetic Ping-Pong Measurements (Demonstration)
We use synthetic measurements created as detailed below for this demonstration. For this reason, skip this section if you are working with the given real measurements.
## Create data
To create the synthetic ping-pong measurements, the code below generates 50 measurements drawn from a normal distribution with mean 4 and standard deviation 1.5. The resulting values are kept in the `times` vector, which is written as a CSV file with the name `pp-synthetic.csv`.
```{r}
set.seed(42);
times <- rnorm(n = 50, mean = 4, sd = 1.5);
write.csv(times, "pp-synthetic.csv", row.names = FALSE);
times
```
The `set.seed` call is here only to keep the synthetic analysis fully reproducible.
## Read data
To read the synthetic ping-pong measurements, the code below reads the file `pp-synthetic.csv` and creates the data frame `df`. This data frame has a single column called `time` which contains all measurements that have been previously generated. The call to `head(df)` lists the first rows of the data frame.
```{r}
df <- read.csv("pp-synthetic.csv", header=TRUE, col.names=c("time"))
head(df);
```
## Overview of all the measurements
The best way to get an overview is to plot the data (see below). Assuming the true mean is unknown, compute and discuss the following metrics:
- Estimate the ping-pong measurement (mean, median)
- Quantify the variability (sd)
```{r, fig.width=6, fig.height=3.5}
plot(df$time, ylab="Time (seconds)", xlab="Measurement Number");
```
## Statistical Summary
A summary of a set of values can be easily obtained in R with the `summary` function. Here `df$time` is a vector with the measurements; it represents the column `time` of the data frame `df`. Calling `summary` on this vector gives:
```{r}
summary(df$time);
```
We can see that the estimated mean is `3.946`, while the median is `3.849`. The other values are the minimum, the first quartile, the third quartile and the maximum.
## Boxplot
A much nicer way to represent the metrics of the summary above is a boxplot. It takes all the values and creates a visual representation. Can you explain what each element of the boxplot represents? How does it help you to better understand the data?
```{r, fig.width=2, fig.height=3.5}
boxplot(df$time, ylab="Time (seconds)")
```
## Histogram
A histogram counts how many times the observed value falls into different bins. For time measurements, a bin is an interval of values, for instance (2,3). Whenever a measurement falls into a bin, the bin's counter is incremented. Below we can see the histogram for the synthetic ping-pong measurements. We ask for about 10 bins through the `breaks` parameter; R treats a single number as a suggestion and picks round bin boundaries, so the actual number of bins may differ.
```{r, fig.width=6}
hist(df$time, breaks=10, xlab="Time (seconds)", main="Histogram of Ping-Pong")
```
The problem with histograms is that changing the number of bins can radically change the perception given by the plot. Let's, for instance, ask for only 2 bins:
```{r, fig.width=6}
hist(df$time, breaks=2, xlab="Time (seconds)", main="Histogram of Ping-Pong")
```
We can see that two bins hardly represent reality.
## Variability
As we previously mentioned, ping-pong measurements suffer external influences that might affect estimations. As a consequence, we need to evaluate the variability. A common measure of variability is the standard deviation, calculated like this:
```{r}
sd(df$time)
```
## Wrap up
We have synthetically generated ping-pong measurements by using the `rnorm` function of R. We asked for a true mean of 4, with a standard deviation of 1.5. Pretending not to know these values and using only statistics, we can estimate the mean and standard deviation as:
```{r}
mean(df$time)
sd(df$time)
```
These estimates are slightly different from the true values.
# Real Ping-Pong Measurements (TD)
The measurements detailed in the introduction have been obtained for a number of different message sizes. We provide five of them here, with the following names:
```{bash}
ls data/PP_size*.csv
```
The files correspond to message sizes from 1 to 5 bytes. You should repeat all the steps of the previous section using at least one of these files. Write your statistical interpretation of the measurements. Here's how you can read one of them.
```{r}
df <- read.csv("data/PP_all.csv");
head(df);
```
# Iteration Duration of a Geophysics Parallel Application
Geophysics parallel applications are generally organized in timesteps. At each timestep, the mathematical model is calculated taking into account the time interval. As the simulation evolves, one can measure how much time it takes to calculate the mathematical model for each iteration. Here we provide a data set that contains the duration of each iteration in a simulation composed of a little less than 500 iterations. Here's the code to read that data set:
```{r}
df <- read.csv("data/IterDuration.csv")
head(df);
```
```{r, fig.width=6, fig.height=3.5}
plot(df$duration, ylab="Time (seconds)", xlab="Measurement Number")
```
```{r}
df1 <- read.csv("data/PP_size_1.csv")
df2 <- read.csv("data/PP_size_2.csv")
df3 <- read.csv("data/PP_size_3.csv")
df4 <- read.csv("data/PP_size_4.csv")
df5 <- read.csv("data/PP_size_5.csv")
PP_all <- rbind(df1,df2,df3,df4,df5)
boxplot(PP_all)
```
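Stacking the five files with `rbind` puts all measurements into one column, so the message size each row came from is no longer visible. If the goal is to compare sizes, one option is to record the size in an extra column before stacking and then draw one box per size. The sketch below uses made-up stand-in data frames instead of the real files (assuming, as for `PP_size_1.csv`, a single `time` column); the names `frames`, `pp_by_size` and `bp` are hypothetical.

```r
set.seed(1)
sizes <- 1:5

# Stand-in for reading data/PP_size_<s>.csv: placeholder values, not real measurements.
frames <- lapply(sizes, function(s) {
  d <- data.frame(time = rexp(30, rate = 1e4))
  d$size <- s                      # remember which message size the rows belong to
  d
})
pp_by_size <- do.call(rbind, frames)
table(pp_by_size$size)             # number of measurements per size

# plot = FALSE only computes the statistics; drop it to draw one box per size.
bp <- boxplot(time ~ size, data = pp_by_size, plot = FALSE)
bp$names
bp$stats                           # one column of five boxplot statistics per size
```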
\documentclass[]{article}
\usepackage{lmodern}
\usepackage{amssymb,amsmath}
\usepackage{ifxetex,ifluatex}
\usepackage{fixltx2e} % provides \textsubscript
\ifnum 0\ifxetex 1\fi\ifluatex 1\fi=0 % if pdftex
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\else % if luatex or xelatex
\ifxetex
\usepackage{mathspec}
\else
\usepackage{fontspec}
\fi
\defaultfontfeatures{Ligatures=TeX,Scale=MatchLowercase}
\fi
% use upquote if available, for straight quotes in verbatim environments
\IfFileExists{upquote.sty}{\usepackage{upquote}}{}
% use microtype if available
\IfFileExists{microtype.sty}{%
\usepackage{microtype}
\UseMicrotypeSet[protrusion]{basicmath} % disable protrusion for tt fonts
}{}
\usepackage[margin=1.5in, top=0.5in, bottom=0.5in]{geometry}
\usepackage{hyperref}
\hypersetup{unicode=true,
pdftitle={Measurement and Analysis of Ping-Pong Time},
pdfauthor={Jean-Marc Vincent, Lucas Mello Schnorr},
pdfborder={0 0 0},
breaklinks=true}
\urlstyle{same} % don't use monospace font for urls
\usepackage{color}
\usepackage{fancyvrb}
\newcommand{\VerbBar}{|}
\newcommand{\VERB}{\Verb[commandchars=\\\{\}]}
\DefineVerbatimEnvironment{Highlighting}{Verbatim}{commandchars=\\\{\}}
% Add ',fontsize=\small' for more characters per line
\usepackage{framed}
\definecolor{shadecolor}{RGB}{248,248,248}
\newenvironment{Shaded}{\begin{snugshade}}{\end{snugshade}}
\newcommand{\KeywordTok}[1]{\textcolor[rgb]{0.13,0.29,0.53}{\textbf{#1}}}
\newcommand{\DataTypeTok}[1]{\textcolor[rgb]{0.13,0.29,0.53}{#1}}
\newcommand{\DecValTok}[1]{\textcolor[rgb]{0.00,0.00,0.81}{#1}}
\newcommand{\BaseNTok}[1]{\textcolor[rgb]{0.00,0.00,0.81}{#1}}
\newcommand{\FloatTok}[1]{\textcolor[rgb]{0.00,0.00,0.81}{#1}}
\newcommand{\ConstantTok}[1]{\textcolor[rgb]{0.00,0.00,0.00}{#1}}
\newcommand{\CharTok}[1]{\textcolor[rgb]{0.31,0.60,0.02}{#1}}
\newcommand{\SpecialCharTok}[1]{\textcolor[rgb]{0.00,0.00,0.00}{#1}}
\newcommand{\StringTok}[1]{\textcolor[rgb]{0.31,0.60,0.02}{#1}}
\newcommand{\VerbatimStringTok}[1]{\textcolor[rgb]{0.31,0.60,0.02}{#1}}
\newcommand{\SpecialStringTok}[1]{\textcolor[rgb]{0.31,0.60,0.02}{#1}}
\newcommand{\ImportTok}[1]{#1}
\newcommand{\CommentTok}[1]{\textcolor[rgb]{0.56,0.35,0.01}{\textit{#1}}}
\newcommand{\DocumentationTok}[1]{\textcolor[rgb]{0.56,0.35,0.01}{\textbf{\textit{#1}}}}
\newcommand{\AnnotationTok}[1]{\textcolor[rgb]{0.56,0.35,0.01}{\textbf{\textit{#1}}}}
\newcommand{\CommentVarTok}[1]{\textcolor[rgb]{0.56,0.35,0.01}{\textbf{\textit{#1}}}}
\newcommand{\OtherTok}[1]{\textcolor[rgb]{0.56,0.35,0.01}{#1}}
\newcommand{\FunctionTok}[1]{\textcolor[rgb]{0.00,0.00,0.00}{#1}}
\newcommand{\VariableTok}[1]{\textcolor[rgb]{0.00,0.00,0.00}{#1}}
\newcommand{\ControlFlowTok}[1]{\textcolor[rgb]{0.13,0.29,0.53}{\textbf{#1}}}
\newcommand{\OperatorTok}[1]{\textcolor[rgb]{0.81,0.36,0.00}{\textbf{#1}}}
\newcommand{\BuiltInTok}[1]{#1}
\newcommand{\ExtensionTok}[1]{#1}
\newcommand{\PreprocessorTok}[1]{\textcolor[rgb]{0.56,0.35,0.01}{\textit{#1}}}
\newcommand{\AttributeTok}[1]{\textcolor[rgb]{0.77,0.63,0.00}{#1}}
\newcommand{\RegionMarkerTok}[1]{#1}
\newcommand{\InformationTok}[1]{\textcolor[rgb]{0.56,0.35,0.01}{\textbf{\textit{#1}}}}
\newcommand{\WarningTok}[1]{\textcolor[rgb]{0.56,0.35,0.01}{\textbf{\textit{#1}}}}
\newcommand{\AlertTok}[1]{\textcolor[rgb]{0.94,0.16,0.16}{#1}}
\newcommand{\ErrorTok}[1]{\textcolor[rgb]{0.64,0.00,0.00}{\textbf{#1}}}
\newcommand{\NormalTok}[1]{#1}
\usepackage{graphicx,grffile}
\makeatletter
\def\maxwidth{\ifdim\Gin@nat@width>\linewidth\linewidth\else\Gin@nat@width\fi}
\def\maxheight{\ifdim\Gin@nat@height>\textheight\textheight\else\Gin@nat@height\fi}
\makeatother
% Scale images if necessary, so that they will not overflow the page
% margins by default, and it is still possible to overwrite the defaults
% using explicit options in \includegraphics[width, height, ...]{}
\setkeys{Gin}{width=\maxwidth,height=\maxheight,keepaspectratio}
\IfFileExists{parskip.sty}{%
\usepackage{parskip}
}{% else
\setlength{\parindent}{0pt}
\setlength{\parskip}{6pt plus 2pt minus 1pt}
}
\setlength{\emergencystretch}{3em} % prevent overfull lines
\providecommand{\tightlist}{%
\setlength{\itemsep}{0pt}\setlength{\parskip}{0pt}}
\setcounter{secnumdepth}{5}
% Redefines (sub)paragraphs to behave more like sections
\ifx\paragraph\undefined\else
\let\oldparagraph\paragraph
\renewcommand{\paragraph}[1]{\oldparagraph{#1}\mbox{}}
\fi
\ifx\subparagraph\undefined\else
\let\oldsubparagraph\subparagraph
\renewcommand{\subparagraph}[1]{\oldsubparagraph{#1}\mbox{}}
\fi
%%% Use protect on footnotes to avoid problems with footnotes in titles
\let\rmarkdownfootnote\footnote%
\def\footnote{\protect\rmarkdownfootnote}
%%% Change title format to be more compact
\usepackage{titling}
% Create subtitle command for use in maketitle
\newcommand{\subtitle}[1]{
\posttitle{
\begin{center}\large#1\end{center}
}
}
\setlength{\droptitle}{-2em}
\title{Measurement and Analysis of Ping-Pong Time}
\pretitle{\vspace{\droptitle}\centering\huge}
\posttitle{\par}
\subtitle{An RStudio Demonstration and Statistical Analysis Draft}
\author{Jean-Marc Vincent, Lucas Mello Schnorr}
\preauthor{\centering\large\emph}
\postauthor{\par}
\predate{\centering\large\emph}
\postdate{\par}
\date{January 2017}
\begin{document}
\maketitle
This document is a demonstration of RStudio. It has an introduction,
followed by a step-by-step recipe for conducting an analysis
following the principles of literate programming.
\section{Introduction}\label{introduction}
An experimental protocol is the description of the experiment that has
been conducted. It explains how the measurements have been implemented.
For this RStudio demonstration, we have measured the ping-pong time. It
corresponds to the time taken by a message of a given size to go from
one machine to another and back to the original machine. The following
C-based code represents how the ping-pong time is measured, for a given
\texttt{size} message in an environment with two processes (the
\texttt{sender} and the \texttt{receiver}). We can see that the measured
time is the one observed by the sender.
\begin{Shaded}
\begin{Highlighting}[]
\ControlFlowTok{if}\NormalTok{ (sender) \{}
\NormalTok{ atime = get_time();}
\NormalTok{ send_message(size);}
\NormalTok{ btime = get_time();}
\NormalTok{ recv_message(size);}
\NormalTok{ ctime = get_time();}
\CommentTok{//Print (btime - atime) + (ctime - btime)}
\NormalTok{\}}\ControlFlowTok{else}\NormalTok{\{}
\NormalTok{ recv_message(size);}
\NormalTok{ send_message(size);}
\NormalTok{\}}
\end{Highlighting}
\end{Shaded}
Network measurements can be affected by a number of external factors.
The most easily observed ones are the operating system complexity, its
network layer, and also the wired or wireless hardware. All these
factors increase the measurement variability, which should be quantified
and taken into account during the analysis. As a consequence, the
experiment has been replicated many times (by injecting the code above
in a for loop). The computer program generates a textual file in the CSV
(Comma-Separated Values) format. Each line of this CSV file contains one
measurement, as the output of the Print command above.
\section{Synthetic Ping-Pong Measurements
(Demonstration)}\label{synthetic-ping-pong-measurements-demonstration}
We use synthetic measurements created as detailed below for this
demonstration. For this reason, skip this section if you are working
with the given real measurements.
\subsection{Create data}\label{create-data}
To create the synthetic ping-pong measurements, the code below
generates 50 measurements drawn from a normal distribution with mean 4
and standard deviation 1.5. The resulting values are kept in the
\texttt{times} vector, which is written as a CSV file with the name
\texttt{pp-synthetic.csv}.
\begin{Shaded}
\begin{Highlighting}[]
\KeywordTok{set.seed}\NormalTok{(}\DecValTok{42}\NormalTok{);}
\NormalTok{times <-}\StringTok{ }\KeywordTok{rnorm}\NormalTok{(}\DataTypeTok{n =} \DecValTok{50}\NormalTok{, }\DataTypeTok{mean =} \DecValTok{4}\NormalTok{, }\DataTypeTok{sd =} \FloatTok{1.5}\NormalTok{);}
\KeywordTok{write.csv}\NormalTok{(times, }\StringTok{"pp-synthetic.csv"}\NormalTok{, }\DataTypeTok{row.names =} \OtherTok{FALSE}\NormalTok{);}
\NormalTok{times}
\end{Highlighting}
\end{Shaded}
\begin{verbatim}
## [1] 6.05643767 3.15295274 4.54469262 4.94929391 4.60640248 3.84081323
## [7] 6.26728300 3.85801144 7.02763557 3.90592885 5.95730448 7.42996809
## [13] 1.91670895 3.58181685 3.80001800 4.95392560 3.57362062 0.01531687
## [19] 0.33929961 5.98017002 3.54004211 1.32803735 3.74212397 5.82201205
## [25] 6.84279019 3.35429630 3.61409593 1.35525537 4.69014603 3.04000769
## [31] 4.68317518 5.05725601 5.55265528 3.08661044 4.75743268 1.42448698
## [37] 2.82331149 2.72363861 0.37868853 4.05418391 4.30899790 3.45841405
## [43] 5.13724485 2.90994276 1.94757843 4.64922704 2.78291024 6.16615189
## [49] 3.35283070 4.98347183
\end{verbatim}
The \texttt{set.seed} call is here only to keep the synthetic analysis
fully reproducible.
\subsection{Read data}\label{read-data}
To read the synthetic ping-pong measurements, the code below reads the
file \texttt{pp-synthetic.csv} and creates the data frame \texttt{df}.
This data frame has a single column called \texttt{time} which contains
all measurements that have been previously generated. The call to
\texttt{head(df)} lists the first rows of the data frame.
\begin{Shaded}
\begin{Highlighting}[]
\NormalTok{df <-}\StringTok{ }\KeywordTok{read.csv}\NormalTok{(}\StringTok{"pp-synthetic.csv"}\NormalTok{, }\DataTypeTok{header=}\OtherTok{TRUE}\NormalTok{, }\DataTypeTok{col.names=}\KeywordTok{c}\NormalTok{(}\StringTok{"time"}\NormalTok{))}
\KeywordTok{head}\NormalTok{(df);}
\end{Highlighting}
\end{Shaded}
\begin{verbatim}
## time
## 1 6.056438
## 2 3.152953
## 3 4.544693
## 4 4.949294
## 5 4.606402
## 6 3.840813
\end{verbatim}
\subsection{Overview of all the
measurements}\label{overview-of-all-the-measurements}
The best way to get an overview is to plot the data (see below).
Assuming the true mean is unknown, compute and discuss the following
metrics:
\begin{itemize}
\tightlist
\item
Estimate the ping-pong measurement (mean, median)
\item
Quantify the variability (sd)
\end{itemize}
\begin{Shaded}
\begin{Highlighting}[]
\KeywordTok{plot}\NormalTok{(df}\OperatorTok{$}\NormalTok{time, }\DataTypeTok{ylab=}\StringTok{"Time (seconds)"}\NormalTok{, }\DataTypeTok{xlab=}\StringTok{"Measurement Number"}\NormalTok{);}
\end{Highlighting}
\end{Shaded}
\includegraphics{TD3_files/figure-latex/unnamed-chunk-4-1.pdf}
\subsection{Statistical Summary}\label{statistical-summary}
A summary of a set of values can be easily obtained in R with the
\texttt{summary} function. Here \texttt{df\$time} is a vector with the
measurements; it represents the column \texttt{time} of the data frame
\texttt{df}. Calling \texttt{summary} on this vector gives:
\begin{Shaded}
\begin{Highlighting}[]
\KeywordTok{summary}\NormalTok{(df}\OperatorTok{$}\NormalTok{time);}
\end{Highlighting}
\end{Shaded}
\begin{verbatim}
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.01532 3.05166 3.84941 3.94649 4.97609 7.42997
\end{verbatim}
We can see that the estimated mean is \texttt{3.946}, while the median
is \texttt{3.849}. The other values are the minimum, the first quartile,
the third quartile and the maximum.
\subsection{Boxplot}\label{boxplot}
A much nicer way to represent the metrics of the summary above is a
boxplot. It takes all the values and creates a visual representation.
Can you explain what each element of the boxplot represents? How does it
help you to better understand the data?
\begin{Shaded}
\begin{Highlighting}[]
\KeywordTok{boxplot}\NormalTok{(df}\OperatorTok{$}\NormalTok{time, }\DataTypeTok{ylab=}\StringTok{"Time (seconds)"}\NormalTok{)}
\end{Highlighting}
\end{Shaded}
\includegraphics{TD3_files/figure-latex/unnamed-chunk-6-1.pdf}
\subsection{Histogram}\label{histogram}
A histogram counts how many times the observed value falls into
different bins. For time measurements, a bin is an interval of values,
for instance (2,3). Whenever a measurement falls into a bin, the bin's
counter is incremented. Below we can see the histogram for the synthetic
ping-pong measurements. We ask for about 10 bins through the
\texttt{breaks} parameter; R treats a single number as a suggestion and
picks round bin boundaries, so the actual number of bins may differ.
\begin{Shaded}
\begin{Highlighting}[]
\KeywordTok{hist}\NormalTok{(df}\OperatorTok{$}\NormalTok{time, }\DataTypeTok{breaks=}\DecValTok{10}\NormalTok{, }\DataTypeTok{xlab=}\StringTok{"Time (seconds)"}\NormalTok{, }\DataTypeTok{main=}\StringTok{"Histogram of Ping-Pong"}\NormalTok{)}
\end{Highlighting}
\end{Shaded}
\includegraphics{TD3_files/figure-latex/unnamed-chunk-7-1.pdf} The
problem with histograms is that changing the number of bins can
radically change the perception given by the plot. Let's, for instance,
ask for only 2 bins:
\begin{Shaded}
\begin{Highlighting}[]
\KeywordTok{hist}\NormalTok{(df}\OperatorTok{$}\NormalTok{time, }\DataTypeTok{breaks=}\DecValTok{2}\NormalTok{, }\DataTypeTok{xlab=}\StringTok{"Time (seconds)"}\NormalTok{, }\DataTypeTok{main=}\StringTok{"Histogram of Ping-Pong"}\NormalTok{)}
\end{Highlighting}
\end{Shaded}
\includegraphics{TD3_files/figure-latex/unnamed-chunk-8-1.pdf} We can
see that two bins hardly represent reality.
\subsection{Variability}\label{variability}
As we previously mentioned, ping-pong measurements suffer external
influences that might affect estimations. As a consequence, we need to
evaluate the variability. A common measure of variability is the
standard deviation, calculated like this:
\begin{Shaded}
\begin{Highlighting}[]
\KeywordTok{sd}\NormalTok{(df}\OperatorTok{$}\NormalTok{time)}
\end{Highlighting}
\end{Shaded}
\begin{verbatim}
## [1] 1.727215
\end{verbatim}
\subsection{Wrap up}\label{wrap-up}
We have synthetically generated ping-pong measurements by using the
\texttt{rnorm} function of R. We asked for a true mean of 4, with a
standard deviation of 1.5. Pretending not to know these values and using
only statistics, we can estimate the mean and standard deviation as:
\begin{Shaded}
\begin{Highlighting}[]
\KeywordTok{mean}\NormalTok{(df}\OperatorTok{$}\NormalTok{time)}
\end{Highlighting}
\end{Shaded}
\begin{verbatim}
## [1] 3.946492
\end{verbatim}
\begin{Shaded}
\begin{Highlighting}[]
\KeywordTok{sd}\NormalTok{(df}\OperatorTok{$}\NormalTok{time)}
\end{Highlighting}
\end{Shaded}
\begin{verbatim}
## [1] 1.727215
\end{verbatim}
These estimates are slightly different from the true values.
\section{Real Ping-Pong Measurements
(TD)}\label{real-ping-pong-measurements-td}
The measurements detailed in the introduction have been obtained for a
number of different message sizes. We provide five of them here, with
the following names:
\begin{Shaded}
\begin{Highlighting}[]
\FunctionTok{ls}\NormalTok{ data/PP_size*.csv}
\end{Highlighting}
\end{Shaded}
\begin{verbatim}
## data/PP_size_1.csv
## data/PP_size_2.csv
## data/PP_size_3.csv
## data/PP_size_4.csv
## data/PP_size_5.csv
\end{verbatim}
The files correspond to message sizes from 1 to 5 bytes. You should
repeat all the steps of the previous section using at least one of these
files. Write your statistical interpretation of the measurements. Here's
how you can read one of them.
\begin{Shaded}
\begin{Highlighting}[]
\NormalTok{df <-}\StringTok{ }\KeywordTok{read.csv}\NormalTok{(}\StringTok{"data/PP_size_1.csv"}\NormalTok{);}
\KeywordTok{head}\NormalTok{(df);}
\end{Highlighting}
\end{Shaded}
\begin{verbatim}
## time
## 1 0.000133434
## 2 0.000099994
## 3 0.000052050
## 4 0.000038092
## 5 0.000030759
## 6 0.000030473
\end{verbatim}
\section{Iteration Duration of a Geophysics Parallel
Application}\label{iteration-duration-of-a-geophysics-parallel-application}
Geophysics parallel applications are generally organized in timesteps.
At each timestep, the mathematical model is calculated taking into
account the time interval. As the simulation evolves, one can measure
how much time it takes to calculate the mathematical model for each
iteration. Here we provide a data set that contains the duration of each
iteration in a simulation composed of a little less than 500 iterations.
Here's the code to read that data set:
\begin{Shaded}
\begin{Highlighting}[]
\NormalTok{df <-}\StringTok{ }\KeywordTok{read.csv}\NormalTok{(}\StringTok{"data/IterDuration.csv"}\NormalTok{)}
\KeywordTok{head}\NormalTok{(df);}
\end{Highlighting}
\end{Shaded}
\begin{verbatim}
## duration
## 1 3.539277
## 2 3.539381
## 3 3.539294
## 4 3.539549
## 5 3.550483
## 6 3.586875
\end{verbatim}
\begin{Shaded}
\begin{Highlighting}[]
\KeywordTok{plot}\NormalTok{(df}\OperatorTok{$}\NormalTok{duration, }\DataTypeTok{ylab=}\StringTok{"Time (seconds)"}\NormalTok{, }\DataTypeTok{xlab=}\StringTok{"Measurement Number"}\NormalTok{)}
\end{Highlighting}
\end{Shaded}
\includegraphics{TD3_files/figure-latex/unnamed-chunk-14-1.pdf}
\end{document}