We focus on one-sample tests.
# Tests on a Gaussian sample
Consider a random sample $(X_1, \ldots, X_n)$ of a distribution with mean $\mu$ and variance $\sigma^2$.
* The empirical mean is $\bar X = \frac{X_1 + \ldots + X_n}{n}$
* The empirical variance is $S^2 = \frac{n}{n-1} \left(\frac{X^2_1 +\ldots + X^2_n}{n} - \bar X^2\right)$.
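As a quick check, here is a minimal `R` sketch (on simulated data, chosen only for illustration) showing that the built-in `var` computes exactly this bias-corrected variance:

```R
set.seed(1)
X <- rnorm(20, mean = 1, sd = 2)       # simulated sample, for illustration
mean(X)                                # empirical mean
n <- length(X)
(n/(n-1)) * (mean(X^2) - mean(X)^2)    # empirical variance, from the formula above
var(X)                                 # same value: var() uses the n-1 denominator
```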
## Test of the mean
A first test is on the mean of the sample.
**Example**. For an adult, the logarithm of the D-dimer concentration, denoted by $X$, is modeled by a normal random variable with mean $\mu$ and standard deviation $\sigma$. The variable $X$ is an indicator for the risk of thrombosis: it is considered that for healthy individuals, $\mu$ is −1, whereas for individuals at risk $\mu$ is 0.
The influence of olive oil on thrombosis risk must be evaluated.
A group of 13 patients, previously considered as being at risk, had an olive oil enriched diet. After the diet, their value of $X$ was measured, and this gave an empirical mean of −0.15.
The doctor would like to decide if the olive oil diet has improved the D-dimer concentration.
<div style="background-color:yellow;border-style: solid ;border-color: black;border-width: 2px;padding:1%">
The **test on the mean** of the sample compares the hypothesis $H_0: \mu=\mu_0$
with a two-sided hypothesis $H_1: \mu\neq \mu_0$ or a one-sided hypothesis $H_1: \mu\geq \mu_0$ or $H_1: \mu\leq \mu_0$.
When the **variance $\sigma^2$ is known**, the test statistic is
$$T = \sqrt{n} \left(\frac{\bar X - \mu_0}{\sigma}\right)$$
When $H_0$ is true, the statistic $T$ follows a $\mathcal{N}(0,1)$ distribution.
</div>
The decision rule depends on $H_1$. It consists in computing the bounds at which we reject $H_0$. The bounds also depend on the risk of the test (the type I risk).
<div style="background-color:yellow;border-style: solid ;border-color: black;border-width: 2px;padding:1%">
The **test on the mean** of the sample compares the hypothesis $H_0: \mu=\mu_0$ with one of the three alternatives:
* For $H_1: \mu\neq \mu_0$: we reject $H_0$ when $T$ takes too small or too large values. At a risk of $\alpha=5$\%, the two bounds are
```R
alpha <- 0.05
qnorm(alpha/2, 0, 1)
qnorm(1 - alpha/2, 0, 1)
```
We reject $H_0$ when $T < -1.959964$ or when $T > 1.959964$.
* For $H_1: \mu\geq \mu_0$: we reject $H_0$ when $T$ takes too large values. At a risk of $\alpha=5$\%, the bound is
```R
alpha <- 0.05
qnorm(alpha, 0, 1, lower.tail=FALSE)
```
We reject $H_0$ when $T > 1.644854$.
* For $H_1: \mu\leq \mu_0$: we reject $H_0$ when $T$ takes too small values. At a risk of $\alpha=5$\%, the bound is
```R
alpha <- 0.05
qnorm(alpha, 0, 1)
```
We reject $H_0$ when $T < -1.644854$.
</div>
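Equivalently, the decision can be based on a p-value: compute the probability, under $H_0$, of a value of the statistic at least as extreme as the observed one, and reject $H_0$ when this probability is below $\alpha$. A sketch for the one-sided alternative $H_1: \mu\leq \mu_0$, using the value of $T$ anticipated from the example below:

```R
T <- -1.803   # observed value of the statistic (computed in the example below)
pnorm(T)      # left-tail p-value; reject H0 when it is smaller than alpha
```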
**Back to the example**. We assume that the sample of 13 patients is a Gaussian sample. The standard deviation $\sigma$ is assumed to be known and equal to $0.3$.
We want to test
$H_0: \mu = 0$ versus $H_1: \mu < 0$ (an improvement lowers the mean). The test statistic is
$$ T = \sqrt{13} \left(\frac{\bar X - 0}{0.3}\right)$$
According to the null hypothesis $H_0$, $T$ follows the normal distribution $\mathcal{N}(0,1)$. The hypothesis $H_0$ is rejected when $T$ takes low values. At risk 5%, the bound is
```R
qnorm(0.05, 0, 1)
```
The decision rule is: **reject $H_0$** if $T < -1.6449$.
For $\bar X= -0.15$, the test statistic takes the value
```R
n <- 13
Xbar <- -0.15
sig <- 0.3
mu0 <- 0
t <- sqrt(n) * (Xbar - mu0) / sig
t
```
<div style="background-color:MediumSeaGreen;border-style: solid ;border-color: black;border-width: 2px;padding:1%">
**Decision and interpretation**: at risk 5\%, the hypothesis $H_0$ is rejected. The decision is that there has been a significant improvement.
</div>
The previous case assumes that the standard deviation $\sigma$ is known. This is usually not the case in practice. The adaptation of the test of the mean to an unknown variance is the following.
<div style="background-color:yellow;border-style: solid ;border-color: black;border-width: 2px;padding:1%">
The **test on the mean** of the sample compares the hypothesis $H_0: \mu=\mu_0$
versus a two-sided hypothesis $H_1: \mu\neq \mu_0$ or a one-sided hypothesis $H_1: \mu\geq \mu_0$ or $H_1: \mu\leq \mu_0$.
When the **variance $\sigma^2$ is unknown**, the test statistic is
$$T = \sqrt{n} \left(\frac{\bar X - \mu_0}{S}\right)$$
When $H_0$ is true, the statistic $T$ follows a Student distribution with $n-1$ degrees of freedom.
</div>
The decision rules are as before, but the bounds are computed from the Student distribution instead of the normal distribution.
<div style="background-color:yellow;border-style: solid ;border-color: black;border-width: 2px;padding:1%">
* For $H_1: \mu\neq \mu_0$: we reject $H_0$ when $T$ takes too small or too large values. At a risk of $\alpha=5$\%, the two bounds are
```R
alpha <- 0.05
n <- 13
qt(alpha/2, n-1)
qt(1 - alpha/2, n-1)
```
We reject $H_0$ when $T$ is outside the two bounds, that is when $T < -2.178813$ or $T > 2.178813$.
* For $H_1: \mu\geq \mu_0$: we reject $H_0$ when $T$ takes too large values. At a risk of $\alpha=5$\%, the bound is
```R
alpha <- 0.05
n <- 13
qt(alpha, n-1, lower.tail=FALSE)
```
We reject $H_0$ when $T$ is larger than the bound.
* For $H_1: \mu\leq \mu_0$: we reject $H_0$ when $T$ takes too small values. At a risk of $\alpha=5$\%, the bound is
```R
alpha <- 0.05
n <- 13
qt(alpha, n-1)
```
We reject $H_0$ when $T$ is lower than the bound.
</div>
**Back to the example**. We now assume that the standard deviation $\sigma$ is unknown and estimated by $S = 0.3$.
We want to test
$H_0: \mu = 0$ versus $H_1: \mu < 0$. The test statistic is
$$ T = \sqrt{13} \left(\frac{\bar X - 0}{0.3}\right)$$
According to the null hypothesis $H_0$, $T$ follows a Student distribution with $12$ degrees of freedom. The hypothesis $H_0$ is rejected when $T$ takes low values. At risk 5%, the bound is
```R
qt(0.05, 12)
```
The decision rule is: **reject $H_0$** if $T < -1.7823$.
For $\bar X= -0.15$, the test statistic takes the value
```R
n <- 13
Xbar <- -0.15
s <- 0.3
mu0 <- 0
t <- sqrt(n) * (Xbar - mu0) / s
t
```
<div style="background-color:MediumSeaGreen;border-style: solid ;border-color: black;border-width: 2px;padding:1%">
**Decision and interpretation**: at risk 5\%, the hypothesis $H_0$ is rejected. The decision is that there has been a significant improvement.
</div>
The previous example uses the estimates of the mean and the standard deviation, and the sample size. In practice, all the values of the sample are usually available. In that case, the user can estimate the mean and the standard deviation and apply the previous instructions, or directly use the function `t.test`.
<div style="background-color:yellow;border-style: solid ;border-color: black;border-width: 2px;padding:1%">
**R code for the test on a mean**.
The mean of a sample can be tested using the function `t.test`.
`t.test(X,mu,alternative)`
</div>
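A minimal self-contained sketch of the call, on simulated data (the sample `X` below is invented for illustration):

```R
set.seed(42)
X <- rnorm(13, mean = -0.15, sd = 0.3)    # simulated sample, for illustration
t.test(X, mu = 0, alternative = "less")   # one-sided Student test of H0: mu = 0
```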
The function computes the test statistic of Student's t-test comparing `mean(X)` to `mu`, and the corresponding p-value according to the `alternative`.
The null hypothesis $H_0$ is "the mean is equal to `mu`".
The alternative is one of `two.sided` (default), `less`, `greater`; they are understood as:
* `two.sided`: the mean is not equal to `mu`,
* `less`: the mean is less than `mu`,
* `greater`: the mean is greater than `mu`.
**Example** To test whether the mean age in the `LenzI` sample is equal to 60, we run the following code
```R
LenzI <- readRDS("data/LenzI.rds")
A <- LenzI$age
t.test(A, mu = 60)
```
The two hypotheses of this t-test are $H_0: \mu=60$ and $H_1: \mu \neq 60$.
The output reads as follows:
* `t` is the value of the test statistic,
* `df` is the number of degrees of freedom,
* `p-value` is the p-value of the test,
* `alternative hypothesis` recalls the alternative being tested,
* `95 percent confidence interval` gives a confidence interval for the mean,
* the last value `61.1401` is the estimate of the mean of the sample.
<div style="background-color:MediumSeaGreen;border-style: solid ;border-color: black;border-width: 2px;padding:1%">
**Interpretation** of the `t.test` output: the `p-value` is `0.1339`. Therefore, at a risk of 5\%, we do not reject $H_0$. The mean age is not significantly different from 60 years.
</div>
We can also apply a one-sided test by changing the `alternative`. To test the alternative $H_1: \mu \geq 60$, run
```R
A <- LenzI$age
t.test(A, mu = 60, alternative = "greater")
```
<div style="background-color:MediumSeaGreen;border-style: solid ;border-color: black;border-width: 2px;padding:1%">
**Interpretation** of the `t.test` output: the `p-value` is `0.06695`. Therefore, at a risk of 5\%, we do not reject $H_0$. The mean age is not significantly greater than 60 years.
</div>
**Remark**: even though the empirical mean (`61.1401`) is greater than 60 years, the difference is not significant, and we cannot conclude, at a risk of 5\%, that the mean is larger than 60. Several reasons could be involved: the variability in the sample is too large (the standard error of the empirical mean is large), or the size of the sample is not large enough (the empirical mean is not estimated with enough precision).
## Test of the standard deviation or the variance
One can also test the value of the standard deviation or the variance of a Gaussian sample.
<div style="background-color:yellow;border-style: solid ;border-color: black;border-width: 2px;padding:1%">
The test on the variance of the sample compares the hypothesis $H_0: \sigma^2=\sigma_0^2$
versus a two-sided hypothesis $H_1: \sigma^2\neq \sigma^2_0$ or a one-sided hypothesis $H_1: \sigma^2\geq \sigma^2_0$ or $H_1: \sigma^2\leq \sigma^2_0$.
The test statistic is
$$T = (n-1) \left(\frac{S^2}{\sigma_0^2}\right)$$
When $H_0$ is true, the statistic $T$ follows a chi-square distribution with $n-1$ degrees of freedom $\chi^2(n-1)$.
</div>
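No `R` instruction is given here, but the test is easily computed by hand. A minimal sketch of the two-sided version, assuming a sample `X` and a tested value `sigma0` (both invented for illustration):

```R
set.seed(9)
X <- rnorm(13, mean = 0, sd = 0.3)   # simulated Gaussian sample, for illustration
sigma0 <- 0.3                        # tested value of the standard deviation
n <- length(X)
T <- (n - 1) * var(X) / sigma0^2     # test statistic
alpha <- 0.05
qchisq(alpha/2, n - 1)               # lower rejection bound
qchisq(1 - alpha/2, n - 1)           # upper rejection bound
```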
## Test of the mean for a large sample
Finally, a test of the mean exists for large samples, and the assumption that the sample is Gaussian is not needed, thanks to the Central Limit Theorem.
<div style="background-color:yellow;border-style: solid ;border-color: black;border-width: 2px;padding:1%">
The test on the mean of a large sample compares the hypothesis $H_0: \mu=\mu_0$
versus a two-sided hypothesis $H_1: \mu\neq \mu_0$ or a one-sided hypothesis $H_1: \mu\geq \mu_0$ or $H_1: \mu\leq \mu_0$.
The test statistic is
$$T = \sqrt{n} \left(\frac{\bar X - \mu_0}{S}\right)$$
When $H_0$ is true, the statistic $T$ follows a normal distribution $\mathcal{N}(0,1)$.
</div>
With `R`, the test of the mean for a large sample can be applied with the function `t.test` (as above), since the normal distribution is very close to a Student distribution with a large number of degrees of freedom.
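For instance, a sketch on a simulated non-Gaussian sample (invented for illustration), showing that the normal-based p-value and the one returned by `t.test` nearly coincide for a large $n$:

```R
set.seed(7)
X <- rexp(200, rate = 1)                      # non-Gaussian sample, large n
T <- sqrt(length(X)) * (mean(X) - 1) / sd(X)  # statistic for H0: mu = 1
2 * pnorm(abs(T), lower.tail = FALSE)         # two-sided normal p-value
t.test(X, mu = 1)$p.value                     # nearly identical for large n
```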
# Test of a proportion
The previous tests are applied to Gaussian samples. When the variable of interest is binary, the test bears on a proportion.
**Example** For a certain disease, there exists a treatment that cures 70% of the cases. A laboratory proposes a new treatment claiming that it is better than the previous one. Out of 100 patients having received the new treatment, 74 of them have been cured. The expert would like to decide whether the new treatment should be authorized.

<div style="background-color:yellow;border-style: solid ;border-color: black;border-width: 2px;padding:1%">
The test on the proportion of a binary sample compares the hypothesis $H_0: p=p_0$
versus a two-sided hypothesis $H_1: p\neq p_0$ or a one-sided hypothesis $H_1: p\leq p_0$ or $H_1: p\geq p_0$.
</div>
**Back to the example** The hypotheses we want to test are $H_0: p=0.7$ versus $H_1: p\geq 0.7$.

<div style="background-color:yellow;border-style: solid ;border-color: black;border-width: 2px;padding:1%">
A value of a proportion can be tested using the function `prop.test`.
`prop.test(x,n,p,alternative)`
</div>
The null hypothesis $H_0$ is: “the proportion of `x` out of `n` is equal to `p`”.
The alternative is one of `two.sided` (default), `less`, `greater`; they are understood as:
* `two.sided`: the proportion `x/n` is not equal to `p`,
* `less`: the proportion `x/n` is less than `p`,
* `greater`: the proportion `x/n` is greater than `p`.
**Back to the example** The one-sided test is applied by running
```R
prop.test(x = 74, n = 100, p = 0.7, alternative = "greater")
```
<div style="background-color:MediumSeaGreen;border-style: solid ;border-color: black;border-width: 2px;padding:1%">
**Interpretation** The p-value is `0.2225`. At risk 5\%, we do not reject $H_0$. The new treatment is not significantly better than the standard treatment. It should not be authorized.
</div>
Note that when the whole binary sample $X$ is available (and not only the count of successes), the instruction is `prop.test(sum(X), length(X), p, alternative)`.
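A sketch of that instruction on a simulated binary sample (invented for illustration), equivalent to the call above:

```R
set.seed(3)
X <- rbinom(100, size = 1, prob = 0.74)  # simulated binary sample, for illustration
prop.test(sum(X), length(X), p = 0.7, alternative = "greater")
```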

# Goodness-of-fit tests
A goodness-of-fit test answers the question: could the sample have been drawn at random from a particular distribution?
## Chi-squared test
For a discrete variable, the goodness-of-fit is measured by a distance between the relative frequencies of the variable, and the probabilities of the target distribution.
<div style="background-color:yellow;border-style: solid ;border-color: black;border-width: 2px;padding:1%">
For a *discrete variable*, the **chi-squared test** compares the null hypothesis $H_0$: “the observed frequencies fit the theoretical probabilities”.
The alternative is “the observed frequencies do not fit the theoretical probabilities”.
Under $H_0$, the distance follows a chi-squared distribution. The parameter `df` of that chi-squared distribution is the number of different values minus 1, minus the number of estimated parameters, if there are any.
</div>
Under the alternative, the distance should be large, so that the p-value is computed as the right-tail probability of the chi-squared distribution at the distance.
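As an illustration of this computation, a sketch using the frequencies of the example below (no parameter is estimated, so `df` is the number of values minus 1):

```R
obs <- c(1600, 4900, 3500)          # observed frequencies (from the example below)
p0 <- c(0.16, 0.48, 0.36)           # theoretical probabilities
expd <- sum(obs) * p0               # expected frequencies
D <- sum((obs - expd)^2 / expd)     # chi-squared distance
df <- length(obs) - 1               # no parameter estimated
pchisq(D, df, lower.tail = FALSE)   # right-tail p-value (about 0.088)
```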
<div style="background-color:yellow;border-style: solid ;border-color: black;border-width: 2px;padding:1%">
A goodness-of-fit for a discrete variable can be tested using the function `chisq.test`, if no parameters have been estimated. If `X` is the sample and `p` is the vector of theoretical probabilities, the result is obtained by:
`chisq.test(table(X), p = p)`
</div>
In that command,
* `table(X)` computes the observed frequencies of the values of `X`,
* `p` is the vector of theoretical probabilities.

If some frequencies are too small, a warning message may be issued. If one parameter has been estimated, the degrees of freedom must be adjusted accordingly.
**Back to the example** The frequency table of the three genotypes AA, Aa, aa is (1600, 4900, 3500). The theoretical probabilities are (0.16, 0.48, 0.36). The chi-squared test is applied by running
```R
chisq.test(c(1600, 4900, 3500), p = c(0.16, 0.48, 0.36))
```
The outputs are
* `data` the observed data
* `X-squared` the test statistic (the chi-squared distance),
* `df` the degrees of freedom,
* `p-value` the p-value
<div style="background-color:MediumSeaGreen;border-style: solid ;border-color: black;border-width: 2px;padding:1%">
The p-value is `0.08799`. At risk 5\%, we do not reject $H_0$. The theoretical probabilities are acceptable.
</div>
## Kolmogorov-Smirnov test
For a *continuous variable*, the goodness-of-fit is measured by a distance between the empirical cumulative distribution function (ecdf) of the variable, and the cdf of the target distribution.
<div style="background-color:yellow;border-style: solid ;border-color: black;border-width: 2px;padding:1%">
For a *continuous variable*, the **Kolmogorov-Smirnov test** compares the null hypothesis $H_0$: “the empirical distribution of the data fits the theoretical distribution” and $H_1$: “the empirical distribution does not fit the theoretical distribution”.
</div>
<div style="background-color:yellow;border-style: solid ;border-color: black;border-width: 2px;padding:1%">
If `X` is the sample, `dist` the distribution, and `param` the parameters of that distribution, the result is obtained by:
`ks.test(X, dist, param, alternative)`
</div>
The answer is “the fit is good” if the p-value is large (above the risk). The variable `X` should not have ties (equal values). If some values are equal, a warning message is issued, indicating that the p-value is not quite as precise. This does not affect the validity of the result.
The null hypothesis $H_0$ is: “the distribution of the sample is the theoretical cdf”.
The alternative is one of `two.sided` (default), `less`, `greater`; they are understood as:
* `two.sided`: the ecdf of the sample is different from the theoretical cdf,
* `less`: the ecdf of the sample is below the theoretical cdf (the values of the sample are larger than those of the theoretical distribution),
* `greater`: the ecdf of the sample is above the theoretical cdf (the values of the sample are smaller than those of the theoretical distribution).
**Example** The hypoxy level is given in the data set `HY`. Let us plot the ecdf of `Level` and the cdf of a normal distribution.
```R
HY <- read.table("data/hypoxy.csv", header=TRUE, dec=",")
L <- HY$Level
plot(ecdf(L))
curve(pnorm(x, mean(L), sd(L)), col="red", add=TRUE)
```
The red curve is the one of a normal distribution with parameters $\mu=1.2$ and $\sigma=1$. The ecdf (black curve) is quite far from the theoretical cdf. The `Level` variable is probably not normally distributed.
To test whether `Level` follows a normal distribution with parameters $\mu=1.2$ and $\sigma=1$, run
```R
ks.test(L, "pnorm", 1.2, 1)
```
The outputs are
* `data` the observed data
* `D` the test statistic (the maximal distance between the ecdf and the theoretical cdf),
* `p-value` the p-value
<div style="background-color:MediumSeaGreen;border-style: solid ;border-color: black;border-width: 2px;padding:1%">
The p-value is `2.501e-06`. At risk 5\%, we reject $H_0$. The sample `Level` does not follow a normal distribution $\mathcal{N}(1.2,1)$.
</div>
The histogram of the variable `Level` reveals a right-skewed distribution, close to a log-normal distribution. Let us log-transform the data
```R
LL <- log(L)
```
and plot the ecdf (black curve) of the log-Level and the cdf (red curve) of a normal distribution with parameters $\log(1.2)\approx 0.2$ and 1.
```R
plot(ecdf(LL))
curve(pnorm(x, mean(LL), sd(LL)), col="red", add=TRUE)
```
The two curves are quite close. Let us test whether the empirical cdf of `LL` is that of a normal distribution with parameters $\log(1.2)\approx 0.2$ and 1
```R
ks.test(LL, "pnorm", 0.2, 1)
```
<div style="background-color:MediumSeaGreen;border-style: solid ;border-color: black;border-width: 2px;padding:1%">
The p-value is still very small. At risk 5\%, we reject $H_0$. The sample `log-Level` does not follow a normal distribution $\mathcal{N}(0.2,1)$.
</div>
Remark: the difference on the left of the two curves (ecdf and cdf) is large enough to reject the null hypothesis.
## Normality test
Testing whether a variable is normally distributed is different from testing whether a particular normal distribution with given parameters fits the variable.
<div style="background-color:yellow;border-style: solid ;border-color: black;border-width: 2px;padding:1%">
The normality of a *continuous variable* is tested with the **Shapiro-Wilk test**.
The null hypothesis $H_0$ is: “the variable is normally distributed”. The alternative is “the variable is not normally distributed”.
</div>
<div style="background-color:MediumSeaGreen;border-style: solid ;border-color: black;border-width: 2px;padding:1%">
If `X` is the sample, the result is obtained by:
`shapiro.test(X)`
</div>
**Back to the example** Test of the normality of the log-level of hypoxy:
```R
shapiro.test(LL)
```
The outputs are
* `data` the observed data,
* `W` the test statistic,
* `p-value` the p-value
<div style="background-color:MediumSeaGreen;border-style: solid ;border-color: black;border-width: 2px;padding:1%">
The p-value is `1.936e-06`. At risk 5\%, we reject $H_0$. The sample `log-Level` is not normally distributed.
</div>
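As a sanity check (a sketch on simulated data), the test does not reject normality for a truly Gaussian sample:

```R
set.seed(5)
shapiro.test(rnorm(100))   # large p-value expected: normality is not rejected
```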