Commit 0e97d5cf authored by Nathan Rebiscoul

Finish bat exercise and correct img pb

parent 9325fbec
......@@ -2,6 +2,10 @@
#+AUTHOR: Nathan REBISCOUL
#+OPTIONS: tex:t
#+OPTIONS: tex:verbatim
#+PROPERTY: session *R*
#+PROPERTY: cache yes
#+PROPERTY: exports both
#+PROPERTY: tangle yes
* Presentation
Our goal is to implement the presentation and classical test presented
......@@ -39,7 +43,7 @@ The following objects are masked from ‘package:base’:
- *Brain mass and weight*
We can expect that the *brain mass* depends on the *size of the bat*.
#+begin_src R :results output :session *R* :exports both
#+begin_src R :results output graphics :file img/fig1.png :session *R* :exports both
phyto=myData[(myData$Diet==1),]
ggplot(phyto,aes(x=BOW,y=BRW)) + geom_point() +
ggtitle("Total brain weight as function of body mass") +
......@@ -47,8 +51,7 @@ ggplot(phyto,aes(x=BOW,y=BRW)) + geom_point() +
#+end_src
#+RESULTS:
:
: `geom_smooth()` using formula 'y ~ x'
[[file:img/fig1.png]]
- *Simple regression model to explain total brain weight as a function*
......@@ -114,11 +117,12 @@ The additional information that we have here are explained variance
of the law use of the test. The residual sum of squares is equal
to 4253838. It represents the noise that the model cannot explain.
#+begin_src R :results output :session *R* :exports both
#+begin_src R :results output graphics :file img/fig2.png :session *R* :exports both
plot(reg1$fitted.values, reg1$residuals, xlab="Predicted", ylab="Residuals")
#+end_src
#+RESULTS:
[[file:img/fig2.png]]
We can see here two points that are far from the others. The first one is
predicted between 3500 and 4000 and seems to have a large residual
......@@ -127,9 +131,13 @@ prediction dramatically higher than the other points (near
10000). Its residual is relatively high but not so disturbing
compared to the other points.
#+begin_src R :results output :session *R* :exports both
#+begin_src R :results output graphics :file img/fig3.png :session *R* :exports both
plot(reg1,4)
#+end_src
#+RESULTS:
[[file:img/fig3.png]]
This graph shows us that the seventh sample in the dataset is a significant
outlier. If we look at our dataset this outlier is the point with
a predicted brain weight of 10000.
......@@ -138,6 +146,10 @@ a predicted brain weight of 10000.
#+begin_src R :results output :session *R* :exports both
which(phyto$BRW>8000)
#+end_src
#+RESULTS:
: [1] 7
We can see here that the seventh sample is the only sample with a
brain weight (BRW) higher than 8000.
We will redo the analysis without this individual. For that we will
......@@ -196,7 +208,7 @@ We can see here that the residual sum of squares is far smaller than
before, so the noise has greatly diminished.
#+begin_src R :results output :session *R* :exports both
#+begin_src R :results output graphics :file img/fig4.png :session *R* :exports both
par(mfcol=c(2,2))
plot(reg1,2)
plot(reg1,3)
......@@ -205,6 +217,7 @@ plot(reg2,3)
#+end_src
#+RESULTS:
[[file:img/fig4.png]]
The two graphs on the left are obtained with *reg1* and the two graphs
......@@ -238,7 +251,7 @@ explanatory variables : *AUD*, *MOB*, and *HIP*.
- We are initially interested in the correlations of the variables two
by two.
#+begin_src R :results output :session *R* :exports both
#+begin_src R :results output graphics :file img/fig5.png :session *R* :exports both
library(corrplot)
phytoNum=phyto[,c(4:8)]
mat.cor=cor(phytoNum)
......@@ -246,6 +259,7 @@ corrplot(mat.cor, type="upper")
#+end_src
#+RESULTS:
[[file:img/fig5.png]]
- Pearson tests
......@@ -364,13 +378,14 @@ The multiple regression model corresponding to this analysis is $BRW =
We can also see that the p-value is low, so it seems that there is a correlation between
BRW and this sum.
#+begin_src R :results output :session *R* :exports both
#+begin_src R :results output graphics :file img/fig6.png :session *R* :exports both
par(mfcol=c(2,1))
plot(regm,2)
plot(regm,3)
#+end_src
#+RESULTS:
[[file:img/fig6.png]]
We can see on the top graph that points below the -1 quantile don't
follow the line. Points above the 1 quantile also don't follow the
......@@ -435,3 +450,67 @@ Coefficients:
(Intercept) HIP MOB AUD
-1003.95 44.35 -29.24 52.82
#+end_example
According to the R documentation, the step function is used to "choose a model by AIC
in a stepwise algorithm". AIC is the Akaike Information Criterion; it
is used to estimate the relative quality of statistical models: the
lower the AIC score, the better the model. We can
see here that the algorithm tries models with more and more variables. In
the first step it works with no variable. In the second step it tries
HIP... (I wonder why it does not also try the other
variables alone). The model with the three variables seems to be
the best one because it has the lowest AIC score. Despite our
previous analysis, it seems that the MOB variable may be of interest in the
explanation of BRW.
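The following block is a minimal sketch of forward selection with
=step=, added to make the trace easier to read. It runs on the built-in
=mtcars= data rather than on the bat dataset, and the variable names
(=null_model=, =sel=) as well as the candidate terms are assumptions
chosen only for illustration. With =trace = 1=, =step= prints at each
step a table with the AIC of every candidate addition, and =add1=
reproduces that comparison by hand for the first step. This suggests
that every candidate is evaluated at each step, even though only the
best one is kept.

#+begin_src R :results output :session *R* :exports both
# Hypothetical illustration on the built-in mtcars data (not the bat dataset).
# Start from the intercept-only model.
null_model <- lm(mpg ~ 1, data = mtcars)

# Forward selection: the scope formula lists the terms that may be added.
sel <- step(null_model, scope = ~ wt + hp + qsec,
            direction = "forward", trace = 1)

# Same comparison as the first step of the trace, done by hand:
# one row per candidate variable, with the AIC of the resulting model.
add1(null_model, scope = ~ wt + hp + qsec)

# Model kept at the end of the procedure.
formula(sel)
#+end_src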
* Link between volume of the auditory part and diet
We will study the link between the volume of the auditory part and diet.
- Graphs comparison
#+begin_src R :results output graphics :file img/fig7.png :session *R* :exports both
myData$Diet_F = as.factor(myData$Diet)
with(myData,plot(AUD~Diet))
#+end_src
#+RESULTS:
[[file:img/fig7.png]]
#+begin_src R :results output graphics :file img/fig8.png :session *R* :exports both
with(myData, plot(AUD~Diet_F))
#+end_src
#+RESULTS:
[[file:img/fig8.png]]
The boxplot is the better graph to look at because it is easier to
read: the median and the quartiles are directly visible, as well as
the range of the data and the outliers. For Diets 2 and 4, however,
the first graph may be clearer because there are only a few
points.
- Regression analysis and anova
#+begin_src R :results output :session *R* :exports both
# Named reg_diet so that the fitted model does not mask the lm() function
reg_diet = lm(AUD~Diet_F, data=myData)
anova(reg_diet)
#+end_src
#+RESULTS:
:
: Analysis of Variance Table
:
: Response: AUD
: Df Sum Sq Mean Sq F value Pr(>F)
: Diet_F 3 66.07 22.023 0.9293 0.4323
: Residuals 59 1398.26 23.699
The sum of squares explained by diet (66.07) is very small compared to
the residual sum of squares (1398.26). This means that almost none of
the variation of AUD is explained by diet.
The p-value is large (0.4323), so we cannot reject the H0 hypothesis
that the mean AUD is the same for all diets. In other words, the data
show no significant effect of diet on AUD.
I don't know whether this is surprising, as I am not a biologist. It is
also not clear that diet would influence only the volume: there may be
a difference in density, and having a "bigger brain" does not
necessarily mean having more ability.
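As a check on the reading of the table, the block below recomputes the
share of variance explained by diet, the F statistic and the p-value
from the printed sums of squares. It is only a sketch: the numbers are
copied by hand from the ANOVA table above, and the variable names are
introduced here for illustration. The explained share comes out at
roughly 4.5% of the total variation of AUD.

#+begin_src R :results output :session *R* :exports both
# Values copied from the ANOVA table above.
ss_diet <- 66.07     # sum of squares explained by Diet_F
ss_res  <- 1398.26   # residual sum of squares
df_diet <- 3
df_res  <- 59

# Share of the total variation of AUD explained by diet (R squared).
r2 <- ss_diet / (ss_diet + ss_res)

# F statistic: ratio of the two mean squares.
f_value <- (ss_diet / df_diet) / (ss_res / df_res)

# Upper-tail probability of the F distribution with (3, 59) degrees of freedom.
p_value <- pf(f_value, df_diet, df_res, lower.tail = FALSE)

print(c(r2 = r2, F = f_value, p = p_value))
#+end_src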