Commit ab9214e8 authored by Laurence Viry

multidim notebook

parent 42c3522b
......@@ -17,68 +17,119 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Introduction - Multivariate analysis\n",
" In many applications we observe **p** variables on **n** individuals (<FONT color=\"#B40404\">both p and n can be large</FONT>).<br\\>\n",
" <br\\>\n",
"Databases are becoming ever larger, both in the number of individuals and in the number of variables measured on them. Studying each variable, and each pair of variables, with classical descriptive statistics is indispensable but insufficient.<br\\>\n",
"<br\\>\n",
"**Multidimensional exploratory** methods make it possible:\n",
"* to take into account *the simultaneous variations* of a large number of variables,\n",
"* to synthesize and/or simplify *the underlying structures*.\n",
"\n",
"How can we perform a multiple factor analysis that handles several groups of continuous and/or categorical variables and/or contingency tables? And how can we improve the graphs it produces?<br\\>\n",
"<br\\>\n",
"[Tutorial F. Husson](http://math.agrocampus-ouest.fr/infoglueDeliverLive/membres/Francois.Husson/Rcorner)\n",
"\n",
"1. Are there groups of variables? If so, use Multiple Factor Analysis (function <FONT color=\"#B40404\">MFA</FONT> in FactoMineR).<br\\>\n",
"<br\\>\n",
"2. What type of data do we have?\n",
" * Contingency table -> Correspondence Analysis (<FONT color=\"#B40404\">AFC</FONT>, CA in English)\n",
" * Several contingency tables -> Multiple Factor Analysis for Contingency Tables (<FONT color=\"#B40404\">AFMTC</FONT>)\n",
" * Table \"individuals - variables\" -> Principal Component Analysis (<FONT color=\"#B40404\">PCA</FONT>), Multiple Factor Analysis (<FONT color=\"#B40404\">MFA</FONT>).<br\\>\n",
"<br\\>\n",
"3. What are the active elements, i.e. the elements that take part in the construction of the axes?<br\\>\n",
"<br\\>\n",
"4. What are the supplementary elements? They do not take part in the construction of the axes but help with the interpretation.<br\\>\n",
"<br\\>\n",
"5. What is the nature of the active variables?\n",
" * **Quantitative variables**: Principal Component Analysis (<FONT color=\"#B40404\">PCA</FONT>)\n",
" * **Qualitative variables**: Multiple Correspondence Analysis (<FONT color=\"#B40404\">MCA</FONT>)\n",
" * **Mixed variables**: Factor Analysis of Mixed Data (<FONT color=\"#B40404\">AFDM</FONT>, FAMD in English)\n",
"<br\\>\n",
"(*Whatever the method, the supplementary variables can be of either type, quantitative or qualitative.*)<br\\>\n",
"<br\\>\n",
"6. Should the quantitative variables be **scaled**, i.e. reduced to unit variance?<br\\>\n",
"<br\\>\n",
"7. Are there any **missing data**? How should they be handled?<br\\>\n",
"<br\\>\n",
"8. The *steps of the analysis*<br\\>\n",
" * Run the factor analysis.<br\\>\n",
"<br\\>\n",
" * Describe the factorial axes by the initial active variables (<FONT color=\"#B40404\">dimdesc</FONT>)<br\\>\n",
"<br\\>\n",
" * It may be useful to apply a clustering method to identify groups of individuals (<FONT color=\"#B40404\">HCPC</FONT>)<br\\>\n",
"<br\\>\n",
"<img src=\"../../figures/MultiFactorielAnalysis.jpg\" width=\"80%\" height=\"80%\">"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To learn more, see [F. Husson's video](https://www.youtube.com/watch?v=UrS00sOpeec) (in French).\n",
"\n",
"# Principal component analysis \n",
"In this course, we only show how to analyze tables of quantitative variables with **principal component analysis** (PCA), and how to run the method with **FactoMineR** in **R**.\n",
"\n",
"## Introduction\n",
"The aim of PCA is to summarize a data table of individuals × variables in which all the variables are quantitative.\n",
"\n",
"PCA makes it possible to study the similarities between individuals with respect to a set of variables, and to identify profiles of individuals.\n",
"\n",
"It also summarizes the linear relationships between variables, through their correlation coefficients.\n",
"\n",
"The two studies are linked: individuals or groups of individuals can be characterized by the variables, and the relationships between variables can be illustrated by characteristic individuals."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Data - practicalities\n",
"### Which kinds of data\n",
"Principal component analysis, also known as <FONT color=\"#B40404\">PCA</FONT>, applies to data tables where **rows** are individuals and **columns** are quantitative variables.<br\\>\n",
"\n",
"We observe **p variables** <FONT color=\"#013ADF\">$X^{1},X^{2},\\ldots,X^{p}$</FONT> on **n individuals**.\n",
"\n",
"### Data table\n",
" We denote by <FONT color=\"#013ADF\">$x^{j}_{i}$</FONT> the value of variable <FONT color=\"#013ADF\">$X^{j}$</FONT> observed on the <FONT color=\"#013ADF\">$i$-th</FONT> individual.\n",
" \n",
" <table style=\"width:60%\">\n",
" <tr>\n",
" <th>\n",
" $$\\begin{aligned}\n",
"X= \\left[\\begin{array}{ccc}\n",
"x_{1}^{1} & \\dots & x_{1}^{p}\\\\\n",
"\\vdots & \\ddots & \\vdots\\\\\n",
"x_{n}^{1} & \\dots & x_{n}^{p}\n",
"\\end{array}\\right] & \\quad n \\quad \\mbox{individuals} \\nonumber \\\\\n",
" p \\quad \\mbox{variables} \\nonumber \\end{aligned}$$\n",
" </th>\n",
" <th>\n",
"$$\\bar{x}^{j}=\\frac{1}{n}\\sum_{i=1}^{n} x^{j}_{i}$$\n",
"$$\\sigma^{j}=\\sqrt{\\frac{1}{n}\\sum_{i=1}^{n} (x^{j}_{i}-\\bar{x}^{j})^2}$$\n",
"</th>\n",
" </tr>\n",
"</table> \n",
"\n",
"\n",
"The data table can be analyzed through its **rows** (individuals) or through its **columns** (variables).<br\\>\n",
"\n",
"<FONT color=\"#013ADF\">$X^{j}$</FONT>$\\,= \\, (x^{j}_{1},\\ldots,x^{j}_{n}) \\quad \\mbox{variable} \\quad j, \\quad \\mbox{in} \\quad \\mathbb{R}^{n}$<br\\>\n",
"<FONT color=\"#013ADF\">$X_{i}$</FONT>$\\, = \\, (x^{1}_{i},\\ldots,x^{p}_{i}) \\quad\\mbox{individual} \\quad i, \\quad \\mbox{in} \\quad \\mathbb{R}^{p}$\n",
"<br\\>\n",
"<br\\>\n",
"<figure>\n",
" <img src=\"../../figures/TwoClouds.jpg\" width=\"40%\" height=\"20%\">\n",
" <figcaption><br><em>Exploratory Multivariate Data Analysis</em> (<a href=\"https://www.fun-mooc.fr/courses/course-v1:agrocampusouest+40001S04EN+session04/info\">MOOC Agrocampus Ouest</a>)\n",
" </figcaption>\n",
" </figure>"
]
},
{
......@@ -87,39 +138,268 @@
"source": [
"# Studying individuals and variables\n",
"## Studying individuals\n",
"* When can we say that two individuals are similar with respect to all the variables or a group of variables?\n",
"* If there are many individuals, is it possible to categorize them? <br\\>\n",
"<br\\>\n",
"⇒ groups of individuals, i.e. a partition of the individuals\n",
"\n",
"## Studying variables\n",
"* The correlation matrix gives a simple measure of the pairwise linear relationships between variables.\n",
"* Look for similarities between all the variables or a group of variables.\n",
"* Synthetic indicators are sought to summarize groups of\n",
"variables.\n",
"\n",
"⇒ visualization of the correlation matrix <br\\>\n",
"⇒ find a small number of synthetic variables to summarize many\n",
"variables\n",
"\n",
"## Links between the two points of view\n",
"* Characterize groups of individuals using variables.\n",
"* Use typical individuals to interpret groups of variables."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Some examples\n",
"\n",
"* **Sensory analysis**: score of descriptor k for product i\n",
"* **Ecology**: concentration of pollutant k in river i\n",
"* **Economics**: value of indicator k for year i\n",
"* **Genetics**: expression of gene k for patient i\n",
"* **Biology**: measurement k for animal i\n",
"* **Marketing**: value of satisfaction index k for brand i\n",
"* **Sociology**: time spent on activity k by individuals in socio-professional category (CSP) i\n",
"* $\\ldots$\n",
"\n",
"## Example: Climate of different European countries\n",
"<br\\>\n",
"To illustrate this course, we use **temperature data** to analyze the climate of different European countries.\n",
"\n",
"### Description of the data:\n",
"* average monthly temperatures (over 30 years),\n",
"* the annual average temperature and the thermal amplitude (annual temperature range),\n",
"* the latitude and longitude of each city,\n",
"* a qualitative variable giving the region of Europe the city belongs to: North, South, East or West.\n",
"### Data extract\n",
" <table style=\"width:100%\">\n",
" <tr>\n",
" <th> Town</th>\n",
" <th>Janv</th>\n",
" <th>Fév </th>\n",
" <th>... </th>\n",
" <th>Nov </th>\n",
" <th>Déc </th>\n",
" <th>Moy </th>\n",
" <th>Amp </th>\n",
" <th>Lat </th>\n",
" <th>Lon </th>\n",
" <th>Rég</th>\n",
" </tr>\n",
" <tr>\n",
" <td>Amsterdam </td> \n",
" <td>2.9 </td> \n",
" <td>2.5 </td> \n",
" <td>... </td> \n",
" <td>7.0 </td> \n",
" <td>4.4 </td> \n",
" <td>9.9 </td> \n",
" <td>14.6 </td> \n",
" <td>52.2 </td> \n",
" <td>4.5 </td> \n",
" <td>Ouest\n",
" </td>\n",
" </tr>\n",
" <tr>\n",
" <td>Athènes </td>\n",
" <td>9.1 </td>\n",
" <td>9.7 </td>\n",
" <td>... </td>\n",
" <td>14.6</td>\n",
" <td>11.0</td>\n",
" <td>17.8 </td>\n",
" <td>18.3 </td>\n",
" <td>37.6</td>\n",
" <td>23.5</td>\n",
" <td>Sud </td>\n",
" </tr>\n",
" <tr>\n",
" <td> Berlin </td>\n",
" <td>-0.2</td>\n",
" <td>0.1</td>\n",
" <td>...</td>\n",
" <td>4.2</td>\n",
" <td>1.2</td>\n",
" <td>9.1</td>\n",
" <td>18.5 </td>\n",
" <td>52.3 </td>\n",
" <td>13.2</td>\n",
" <td>Ouest</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Helsinki</td>\n",
" <td>-5.8</td>\n",
" <td>-5.0</td>\n",
" <td>...</td>\n",
" <td>0.1</td>\n",
" <td>-2.3</td>\n",
" <td>4.8</td>\n",
" <td>23.4</td>\n",
" <td>60.1</td>\n",
" <td>25.0</td>\n",
" <td>Nord</td>\n",
" </tr>\n",
" <tr>\n",
" <td>Kiev</td>\n",
" <td>-5.9</td>\n",
" <td>-5.0</td>\n",
" <td>...</td>\n",
" <td>1.2</td>\n",
" <td>-3.6</td>\n",
" <td>7.1</td>\n",
" <td>25.3</td>\n",
" <td>50.3</td>\n",
" <td>30.3</td>\n",
" <td>Est</td>\n",
" </tr>\n",
" <tr>\n",
" <td> Copenhague</td>\n",
" <td>-0.4 </td>\n",
" <td>-0.4</td>\n",
" <td>...</td>\n",
" <td>4.1</td>\n",
" <td>1.3</td>\n",
" <td>7.8</td>\n",
" <td>17.5</td>\n",
" <td>55.4</td>\n",
" <td>12.3</td>\n",
" <td>Nord</td>\n",
" </tr>\n",
" <tr>\n",
" <td> Budapest </td>\n",
" <td>-1.1</td>\n",
" <td>0.8</td>\n",
" <td>...</td>\n",
" <td>5.1 </td>\n",
" <td>0.7</td>\n",
" <td>10.9</td>\n",
" <td>23.1</td>\n",
" <td>47.3</td>\n",
" <td>19.0</td>\n",
" <td>Est</td>\n",
" </tr>\n",
" <tr>\n",
" <td> Bruxelles</td>\n",
" <td>3.3</td>\n",
" <td>3.3</td>\n",
" <td>...</td>\n",
" <td>6.7</td>\n",
" <td>4.4</td>\n",
" <td>10.3</td>\n",
" <td>14.4</td>\n",
" <td>50.5</td>\n",
" <td>4.2</td>\n",
" <td>Ouest</td>\n",
" </tr>\n",
" <tr>\n",
" <td> ...</td>\n",
" <td></td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td> \n",
" <td>...</td>\n",
" </tr>\n",
"</table> \n",
"### Read data"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"load(\"../../data/temperat.RData\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Descriptive statistics"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
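"# Descriptive statistics for every variable\n",
"summary(temperat)\n",
"# Scatterplot matrix of the 12 monthly temperatures (columns 1 to 12)\n",
"pairs(temperat[,1:12])\n",
"# Correlation matrix of the 12 monthly temperatures\n",
"cor(temperat[,1:12])"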
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Descriptive statistics"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"summary(temperat)\n",
"pairs(temperat[,1:12])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# PCA with FactoMineR \n",
"FactoMineR is an R package dedicated to multivariate Exploratory Data Analysis. It is developed and maintained by François Husson, Julie Josse and Sébastien Lê of Agrocampus Rennes, and by J. Mazet."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Factoshiny: interactive graphs in exploratory multivariate data analysis \n",
"The [Factoshiny](http://factominer.free.fr/graphs/factoshiny.html) package lets you use the [FactoMineR](http://factominer.free.fr) package through a **graphical interface**, and also lets you modify the graphs **interactively**. It is very useful for polishing the graphs before sharing them."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Interpretation aids\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"# A detailed PCA example"
]
......
%% Cell type:markdown id: tags:
This course is inspired by the MOOC (Massive Open Online Course) "[Exploratory Multivariate Data Analysis](https://www.fun-mooc.fr/courses/course-v1%3Aagrocampusouest%2B40001S04EN%2Bsession04/about)" on the FUN platform (the first English session ran in 2017).<br\>
(Multivariate Multidimensional Data Analysis, Department of Applied Mathematics, Agrocampus Ouest - Rennes: F. Husson, J. Pagès, M. Houée-Bigot)
The 2nd edition of the MOOC starts on March 5, 2018; you can register until April 20.
French version: [Analyse de données multidimensionnelles](https://www.fun-mooc.fr/courses/course-v1:agrocampusouest+40001S04+session04/about)
%% Cell type:markdown id: tags:
# Introduction - Multivariate analysis
In many applications we observe **p** variables on **n** individuals (<FONT color="#B40404">both p and n can be large</FONT>).<br\>
<br\>
Databases are becoming ever larger, both in the number of individuals and in the number of variables measured on them. Studying each variable, and each pair of variables, with classical descriptive statistics is indispensable but insufficient.<br\>
<br\>
**Multidimensional exploratory** methods make it possible:
* to take into account *the simultaneous variations* of a large number of variables,
* to synthesize and/or simplify *the underlying structures*.
How can we perform a multiple factor analysis that handles several groups of continuous and/or categorical variables and/or contingency tables? And how can we improve the graphs it produces?<br\>
<br\>
[Tutorial F. Husson](http://math.agrocampus-ouest.fr/infoglueDeliverLive/membres/Francois.Husson/Rcorner)
1. Are there groups of variables? If so, use Multiple Factor Analysis (function <FONT color="#B40404">MFA</FONT> in FactoMineR).<br\>
<br\>
2. What type of data do we have?
* Contingency table -> Correspondence Analysis (<FONT color="#B40404">AFC</FONT>, CA in English)
* Several contingency tables -> Multiple Factor Analysis for Contingency Tables (<FONT color="#B40404">AFMTC</FONT>)
* Table "individuals - variables" -> Principal Component Analysis (<FONT color="#B40404">PCA</FONT>), Multiple Factor Analysis (<FONT color="#B40404">MFA</FONT>).<br\>
<br\>
3. What are the active elements, i.e. the elements that take part in the construction of the axes?<br\>
<br\>
4. What are the supplementary elements? They do not take part in the construction of the axes but help with the interpretation.<br\>
<br\>
5. What is the nature of the active variables?
* **Quantitative variables**: Principal Component Analysis (<FONT color="#B40404">PCA</FONT>)
* **Qualitative variables**: Multiple Correspondence Analysis (<FONT color="#B40404">MCA</FONT>)
* **Mixed variables**: Factor Analysis of Mixed Data (<FONT color="#B40404">AFDM</FONT>, FAMD in English)
<br\>
(*Whatever the method, the supplementary variables can be of either type, quantitative or qualitative.*)<br\>
<br\>
6. Should the quantitative variables be **scaled**, i.e. reduced to unit variance?<br\>
<br\>
7. Are there any **missing data**? How should they be handled?<br\>
<br\>
8. The *steps of the analysis*<br\>
* Run the factor analysis.<br\>
<br\>
* Describe the factorial axes by the initial active variables (<FONT color="#B40404">dimdesc</FONT>)<br\>
<br\>
* It may be useful to apply a clustering method to identify groups of individuals (<FONT color="#B40404">HCPC</FONT>)<br\>
<br\>
<img src="../../figures/MultiFactorielAnalysis.jpg" width="80%" height="80%">
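Question 6 (whether to scale) is easiest to see on variables measured in different units. The base-R sketch below is our own illustration with `prcomp()`, not FactoMineR code. It uses four columns copied from the data extract shown later (January temperature, thermal amplitude, latitude and longitude of eight cities); the column names are ours. Without scaling, each variable contributes to the total inertia in proportion to its variance; after scaling, each contributes exactly 1. As far as we know, FactoMineR's `PCA()` makes this choice through its `scale.unit` argument, which defaults to `TRUE`.

``` R
# Four variables for the eight cities of the data extract (column names are illustrative)
X <- data.frame(
  Janv = c(2.9, 9.1, -0.2, -5.8, -5.9, -0.4, -1.1, 3.3),
  Amp  = c(14.6, 18.3, 18.5, 23.4, 25.3, 17.5, 23.1, 14.4),
  Lat  = c(52.2, 37.6, 52.3, 60.1, 50.3, 55.4, 47.3, 50.5),
  Lon  = c(4.5, 23.5, 13.2, 25.0, 30.3, 12.3, 19.0, 4.2)
)

# PCA on centered but unscaled data: total inertia = sum of the variances
pca_raw <- prcomp(X, center = TRUE, scale. = FALSE)
var_share <- apply(X, 2, var) / sum(apply(X, 2, var))
print(round(var_share, 2))  # share of the unscaled inertia brought by each variable

# PCA on centered and scaled data: every variable has variance 1, total inertia = p
pca_std <- prcomp(X, center = TRUE, scale. = TRUE)

# Compare the loadings of the first axis in both analyses
print(round(cbind(raw = pca_raw$rotation[, 1], scaled = pca_std$rotation[, 1]), 2))
```

In this extract longitude has the largest variance, so it weighs most in the unscaled analysis even though nothing makes it more important than temperature; scaling removes that effect of units.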
%% Cell type:code id: tags:
``` R
```
%% Cell type:markdown id: tags:
To learn more, see [F. Husson's video](https://www.youtube.com/watch?v=UrS00sOpeec) (in French).
# Principal component analysis
In this course, we only show how to analyze tables of quantitative variables with **principal component analysis** (PCA), and how to run the method with **FactoMineR** in **R**.
## Introduction
The aim of PCA is to summarize a data table of individuals × variables in which all the variables are quantitative.
PCA makes it possible to study the similarities between individuals with respect to a set of variables, and to identify profiles of individuals.
It also summarizes the linear relationships between variables, through their correlation coefficients.
The two studies are linked: individuals or groups of individuals can be characterized by the variables, and the relationships between variables can be illustrated by characteristic individuals.
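When the variables are standardized, the summary PCA produces can be obtained from the correlation matrix alone: the variances of the principal components are its eigenvalues, and they add up to the number of variables p. This base-R sketch (our own check with `eigen()` and `prcomp()`, not FactoMineR code) verifies that on four variables copied from the data extract shown later.

``` R
# Four quantitative variables for the eight cities of the data extract
X <- cbind(
  Janv = c(2.9, 9.1, -0.2, -5.8, -5.9, -0.4, -1.1, 3.3),
  Nov  = c(7.0, 14.6, 4.2, 0.1, 1.2, 4.1, 5.1, 6.7),
  Amp  = c(14.6, 18.3, 18.5, 23.4, 25.3, 17.5, 23.1, 14.4),
  Lat  = c(52.2, 37.6, 52.3, 60.1, 50.3, 55.4, 47.3, 50.5)
)

# Linear links between the variables, two by two
R <- cor(X)

# Eigen-decomposition of the correlation matrix (eigenvalues in decreasing order)
eig <- eigen(R, symmetric = TRUE)
lambda <- eig$values

# Same analysis with prcomp on standardized data: component variances = eigenvalues
pca <- prcomp(X, scale. = TRUE)

# Percentage of inertia carried by each axis
print(round(100 * lambda / sum(lambda), 1))
```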
%% Cell type:markdown id: tags:
## Data - practicalities
### Which kinds of data
Principal component analysis, also known as <FONT color="#B40404">PCA</FONT>, applies to data tables where **rows** are individuals and **columns** are quantitative variables.<br\>
We observe **p variables** <FONT color="#013ADF">$X^{1},X^{2},\ldots,X^{p}$</FONT> on **n individuals**.
### Data table
We denote by <FONT color="#013ADF">$x^{j}_{i}$</FONT> the value of variable <FONT color="#013ADF">$X^{j}$</FONT> observed on the <FONT color="#013ADF">$i$-th</FONT> individual.
<table style="width:60%">
<tr>
<th>
$$\begin{aligned}
X= \left[\begin{array}{ccc}
x_{1}^{1} & \dots & x_{1}^{p}\\
\vdots & \ddots & \vdots\\
x_{n}^{1} & \dots & x_{n}^{p}
\end{array}\right] & \quad n \quad \mbox{individuals} \nonumber \\
p \quad \mbox{variables} \nonumber \end{aligned}$$
</th>
<th>
$$\bar{x}^{j}=\frac{1}{n}\sum_{i=1}^{n} x^{j}_{i}$$
$$\sigma^{j}=\sqrt{\frac{1}{n}\sum_{i=1}^{n} (x^{j}_{i}-\bar{x}^{j})^2}$$
</th>
</tr>
</table>
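The mean and standard deviation of a variable $X^{j}$ are easy to compute directly. The sketch below divides by $n$, which we believe is the convention FactoMineR uses when it reduces variables; base R's `sd()` divides by $n-1$ instead, so the two differ by a constant factor. The values are the January temperatures of the eight cities in the data extract shown further down.

``` R
# January temperatures of the eight cities in the data extract
x <- c(2.9, 9.1, -0.2, -5.8, -5.9, -0.4, -1.1, 3.3)
n <- length(x)

# Mean and standard deviation, dividing by n
x_bar   <- sum(x) / n
sigma_n <- sqrt(sum((x - x_bar)^2) / n)

# Base R's sd() divides by n - 1, so the two differ by a factor sqrt((n - 1) / n)
print(all.equal(sigma_n, sd(x) * sqrt((n - 1) / n)))

# Centered and reduced (standardized) variable: mean 0 and variance 1 (1/n convention)
z <- (x - x_bar) / sigma_n
print(round(z, 2))
```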
The data table can be analyzed through its **rows** (individuals) or through its **columns** (variables).<br\>
<FONT color="#013ADF">$X^{j}$</FONT>$\,= \, (x^{j}_{1},\ldots,x^{j}_{n}) \quad \mbox{variable} \quad j, \quad \mbox{in} \quad \mathbb{R}^{n}$<br\>
<FONT color="#013ADF">$X_{i}$</FONT>$\, = \, (x^{1}_{i},\ldots,x^{p}_{i}) \quad\mbox{individual} \quad i, \quad \mbox{in} \quad \mathbb{R}^{p}$
<br\>
<br\>
<figure>
<img src="../../figures/TwoClouds.jpg" width="40%" height="20%">
<figcaption><br><em>Exploratory Multivariate Data Analysis</em> (<a href="https://www.fun-mooc.fr/courses/course-v1:agrocampusouest+40001S04EN+session04/info">MOOC Agrocampus Ouest</a>)
</figcaption>
</figure>
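The two points of view correspond to two geometric spaces: each individual is a point of $\mathbb{R}^{p}$ and each variable is a vector of $\mathbb{R}^{n}$. This base-R sketch, on a few values copied from the data extract shown later, computes distances between individuals and checks that the cosine of the angle between two centered variables equals their correlation coefficient. The `cosine` helper and the column names are our own.

``` R
# Small individuals x variables table taken from the data extract (4 cities, 3 months)
X <- matrix(c( 2.9,  2.5,  7.0,
               9.1,  9.7, 14.6,
              -0.2,  0.1,  4.2,
              -5.8, -5.0,  0.1),
            nrow = 4, byrow = TRUE,
            dimnames = list(c("Amsterdam", "Athènes", "Berlin", "Helsinki"),
                            c("Janv", "Fev", "Nov")))

# An individual is a row: a point of R^p (p = 3 here)
x_berlin <- X["Berlin", ]
# A variable is a column: a vector of R^n (n = 4 here)
x_janv <- X[, "Janv"]

# Cloud of individuals: Euclidean distances between the rows
print(round(dist(X), 2))

# Cloud of variables: cosine of the angle between two centered columns
cosine <- function(u, v) sum(u * v) / sqrt(sum(u^2) * sum(v^2))
cos_janv_nov <- cosine(x_janv - mean(x_janv), X[, "Nov"] - mean(X[, "Nov"]))

# For centered variables this cosine is exactly the correlation coefficient
print(all.equal(cos_janv_nov, cor(x_janv, X[, "Nov"])))
```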
%% Cell type:markdown id: tags:
# Studying individuals and variables
## Studying individuals
* When can we say that two individuals are similar with respect to all the variables or a group of variables?
* If there are many individuals, is it possible to categorize them? <br\>
<br\>
⇒ groups of individuals, i.e. a partition of the individuals
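One way to make "similar individuals" and "groups of individuals" concrete is sketched below in base R; this is not the FactoMineR/HCPC workflow. It standardizes the variables, measures similarity by Euclidean distance, and builds a partition with Ward's hierarchical clustering. The values are the four monthly temperatures of the data extract shown later; the column names and the choice of 3 groups are arbitrary.

``` R
# Four monthly temperatures for the eight cities of the data extract
temp <- data.frame(
  Janv = c(2.9, 9.1, -0.2, -5.8, -5.9, -0.4, -1.1, 3.3),
  Fev  = c(2.5, 9.7, 0.1, -5.0, -5.0, -0.4, 0.8, 3.3),
  Nov  = c(7.0, 14.6, 4.2, 0.1, 1.2, 4.1, 5.1, 6.7),
  Dec  = c(4.4, 11.0, 1.2, -2.3, -3.6, 1.3, 0.7, 4.4),
  row.names = c("Amsterdam", "Athènes", "Berlin", "Helsinki",
                "Kiev", "Copenhague", "Budapest", "Bruxelles")
)

# Standardize the variables, then measure similarity by Euclidean distance
Z <- scale(temp)
D <- as.matrix(dist(Z))

# Most similar pair of individuals (the zero diagonal is ignored)
diag(D) <- Inf
pair <- which(D == min(D), arr.ind = TRUE)[1, ]
closest <- rownames(D)[pair]
print(closest)

# A partition of the individuals: Ward hierarchical clustering cut into 3 groups
tree <- hclust(dist(Z), method = "ward.D2")
groups <- cutree(tree, k = 3)
print(groups)
```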
## Studying variables
* The correlation matrix gives a simple measure of the pairwise linear relationships between variables.
* Look for similarities between all the variables or a group of variables.
* Synthetic indicators are sought to summarize groups of
variables.
⇒ visualization of the correlation matrix <br\>
⇒ find a small number of synthetic variables to summarize many
variables
## Links between the two points of view
* Characterize groups of individuals using variables.
* Use typical individuals to interpret groups of variables.
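To see how one synthetic variable can summarize several variables, here is a base-R sketch using `cor()` and `prcomp()` on the four monthly temperatures of the data extract shown later. The first principal component plays the role of the synthetic variable, and its correlation with each initial variable shows how well it summarizes that variable. As far as we know, FactoMineR reports the same kind of variable/axis correlations, but this is not its code; the column names are ours.

``` R
# Four monthly temperatures for the eight cities of the data extract
months <- data.frame(
  Janv = c(2.9, 9.1, -0.2, -5.8, -5.9, -0.4, -1.1, 3.3),
  Fev  = c(2.5, 9.7, 0.1, -5.0, -5.0, -0.4, 0.8, 3.3),
  Nov  = c(7.0, 14.6, 4.2, 0.1, 1.2, 4.1, 5.1, 6.7),
  Dec  = c(4.4, 11.0, 1.2, -2.3, -3.6, 1.3, 0.7, 4.4)
)

# Pairwise linear links: the correlation matrix
print(round(cor(months), 2))

# A single synthetic variable: the first principal component of the standardized data
pca <- prcomp(months, scale. = TRUE)
synthetic <- pca$x[, 1]
share_pc1 <- pca$sdev[1]^2 / sum(pca$sdev^2)

# How well the synthetic variable summarizes each initial variable
cor_with_pc1 <- cor(months, synthetic)
print(round(cor_with_pc1, 3))
print(round(share_pc1, 3))
```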
%% Cell type:markdown id: tags:
## Some examples
* **Sensory analysis**: score of descriptor k for product i
* **Ecology**: concentration of pollutant k in river i
* **Economics**: value of indicator k for year i
* **Genetics**: expression of gene k for patient i
* **Biology**: measurement k for animal i
* **Marketing**: value of satisfaction index k for brand i
* **Sociology**: time spent on activity k by individuals in socio-professional category (CSP) i
* $\ldots$
## Example: Climate of different European countries
<br\>
To illustrate this course, we use **temperature data** to analyze the climate of different European countries.
### Description of the data:
* average monthly temperatures (over 30 years),
* the annual average temperature and the thermal amplitude (annual temperature range),
* the latitude and longitude of each city,
* a qualitative variable giving the region of Europe the city belongs to: North, South, East or West.
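This data set also illustrates questions 3 and 4 of the approach above: the monthly temperatures can serve as active variables, while latitude and region are used only for interpretation. The base-R sketch below is our own illustration; as far as we know, FactoMineR declares supplementary variables through the `quanti.sup` and `quali.sup` arguments of `PCA()`. It builds the axes from four active months of the data extract shown below, then correlates latitude with the first axis and averages the first-axis coordinates by region. The column names are ours, written without accents.

``` R
# Part of the data extract: active monthly temperatures, supplementary latitude and region
city <- data.frame(
  Janv = c(2.9, 9.1, -0.2, -5.8, -5.9, -0.4, -1.1, 3.3),
  Fev  = c(2.5, 9.7, 0.1, -5.0, -5.0, -0.4, 0.8, 3.3),
  Nov  = c(7.0, 14.6, 4.2, 0.1, 1.2, 4.1, 5.1, 6.7),
  Dec  = c(4.4, 11.0, 1.2, -2.3, -3.6, 1.3, 0.7, 4.4),
  Lat  = c(52.2, 37.6, 52.3, 60.1, 50.3, 55.4, 47.3, 50.5),
  Reg  = factor(c("Ouest", "Sud", "Ouest", "Nord", "Est", "Nord", "Est", "Ouest"))
)
active <- c("Janv", "Fev", "Nov", "Dec")

# Only the active variables build the axes
pca <- prcomp(city[, active], scale. = TRUE)
pc1 <- pca$x[, 1]
# The sign of an axis is arbitrary: orient it so that it increases with January temperature
if (cor(pc1, city$Janv) < 0) pc1 <- -pc1

# Supplementary quantitative variable: its correlation with the axis
cor_lat <- cor(city$Lat, pc1)

# Supplementary qualitative variable: mean position (barycenter) of each region on the axis
bary <- tapply(pc1, city$Reg, mean)
print(round(cor_lat, 2))
print(round(bary, 2))
```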
### Data extract
<table style="width:100%">
<tr>
<th> Town</th>
<th>Janv</th>
<th>Fév </th>
<th>... </th>
<th>Nov </th>
<th>Déc </th>
<th>Moy </th>
<th>Amp </th>
<th>Lat </th>
<th>Lon </th>
<th>Rég</th>
</tr>
<tr>
<td>Amsterdam </td>
<td>2.9 </td>
<td>2.5 </td>
<td>... </td>
<td>7.0 </td>
<td>4.4 </td>
<td>9.9 </td>
<td>14.6 </td>
<td>52.2 </td>
<td>4.5 </td>
<td>Ouest
</td>
</tr>
<tr>