Commit 0b4b3015 by Laurence Viry

### modification Multidim

parent d47e043c
%% Cell type:markdown id: tags: This course is inspired by the MOOC (Massive Open Online Course) "[Exploratory Multivariate Data Analysis](https://www.fun-mooc.fr/courses/course-v1%3Aagrocampusouest%2B40001S04EN%2Bsession04/about)" (the first session in English was in 2017) on the FUN platform (Multivariate/Multidimensional Data Analysis, Département de mathématiques appliquées, Agrocampus Ouest, Rennes: F. Husson, J. Pagès, M. Houée-Bigot). The 2nd edition of the MOOC starts on the 5th of March 2018; you can subscribe until April 20. French version: [Analyse de données multidimensionnelles](https://www.fun-mooc.fr/courses/course-v1:agrocampusouest+40001S04+session04/about) %% Cell type:markdown id: tags: # Introduction - Multivariate analysis In many applications we observe **p** variables on **n** individuals (both p and n possibly large). Databases are becoming ever more voluminous, both in the number of individuals and in the number of variables measured on them. Studying each variable and each pair of variables with classical descriptive statistics is indispensable but insufficient. **Multidimensional exploratory** methods allow us: * to take into account *the simultaneous variations* of a larger number of variables, * to synthesize and/or simplify *the underlying structures*. How do we perform a multiple factor analysis that handles several groups of continuous and/or categorical variables and/or contingency tables? And how can we improve the graphs produced by the method? [Tutorial F. Husson](http://math.agrocampus-ouest.fr/infoglueDeliverLive/membres/Francois.Husson/Rcorner) 1. Are there groups of variables? Use multiple factor analysis (function MFA in FactoMineR). 2. What is the type of information? * One contingency table -> factorial correspondence analysis (AFC, or AFMTC if there are several) * Several contingency tables -> AFMTC * "Individuals x variables" table -> principal component analysis (PCA), multiple factor analysis (MFA). 3. 
What are the active elements, i.e. the elements that will participate in the construction of the axes? 4. What are the supplementary elements? They do not participate in the construction of the axes but are useful for interpretation. 5. What is the nature of the active variables? * **Quantitative variables**: principal component analysis (PCA) * **Qualitative variables**: multiple correspondence analysis (MCA) * **Mixed variables**: factor analysis of mixed data (AFDM/FAMD) (*Whatever the method, the supplementary variables can be of either type.*) 6. Should we **standardize** the quantitative variables? 7. Are there any **missing data**? How should they be treated? 8. The *steps of the analysis*: * Run the factor analysis. * Describe the factorial axes using the active initial variables (dimdesc). * It may be interesting to use a clustering method to determine groups of individuals (HCPC). %% Cell type:markdown id: tags: To learn more, [see the video by F. Husson](https://www.youtube.com/watch?v=UrS00sOpeec) (in French). In this course we only present how to analyze tables of quantitative variables using **principal component analysis** (PCA) and how to apply the method with **FactoMineR** in **R**. # Principal component analysis ## Introduction The aim of PCA is to summarize a table of individuals x variables data, the variables being quantitative. PCA studies the similarities between individuals with respect to a set of variables and brings out profiles of individuals. It also gives an overview of the linear links between variables, based on the correlation coefficients. These two studies can be related: characterizing individuals or groups of individuals by variables, and illustrating the links between variables through characteristic individuals. %% Cell type:markdown id: tags: ## Data - practicalities ### Which kinds of data We have **p variables** $X^{1},X^{2},\ldots,X^{p}$ observed on **n individuals**. 
Principal component analysis, also known as PCA, applies to data tables where **rows** can be considered as individuals and **columns** as **quantitative** variables. ### Data table We denote by $x^{j}_{i}$ the observation of variable $X^{j}$ on the i-th individual.
\begin{aligned} X= \left[\begin{array}{ccc} x_{1}^{1} & \dots & x_{1}^{p}\\ \vdots & \ddots & \vdots\\ x_{n}^{1} & \dots & x_{n}^{p} \end{array}\right] & \quad n \quad \mbox{individuals} \nonumber \\ p \quad \mbox{variables} \nonumber \end{aligned} $$\bar{x^{j}}=\frac{1}{n}\sum_{i=1}^{n} x^{j}_{i}$$ $$\sigma^{j}=\sqrt{\frac{1}{n}\sum_{i=1}^{n} (x^{j}_{i}-\bar{x^{j}})^2}$$
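As a quick check, the mean and standard deviation formulas can be reproduced in base R; the data vector below is hypothetical, and note that R's built-in `sd()` uses the $1/(n-1)$ convention rather than $1/n$:

```r
# Mean and (population) standard deviation of one variable X^j,
# following the formulas above -- hypothetical values for illustration.
xj <- c(2.9, 9.1, -0.2, -5.8)           # X^j observed on n = 4 individuals
n  <- length(xj)
xbar  <- sum(xj) / n                    # mean of X^j
sigma <- sqrt(sum((xj - xbar)^2) / n)   # standard deviation (1/n convention)
# R's sd() divides by n - 1 instead, hence the correction factor:
all.equal(sigma, sd(xj) * sqrt((n - 1) / n))
```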
The data table can be analyzed through its **rows** (individuals) or through its **columns** (variables). $$X^{j} = (x^{j}_{1},\ldots,x^{j}_{n}) \quad \mbox{variable } j, \quad \mbox{in} \quad \mathcal{R}^{n}$$ $$X_{i} = (x^{1}_{i},\ldots,x^{p}_{i}) \quad \mbox{individual } i, \quad \mbox{in} \quad \mathcal{R}^{p}$$ %% Cell type:markdown id: tags: ### Problems and objectives #### Studying individuals * When can we say that two individuals are similar with respect to all the variables, or to a group of variables? * If there are many individuals, is it possible to categorize them?
$\Longrightarrow$ look for groups of individuals, i.e. partitions of the set of individuals.
%% Cell type:markdown id: tags: #### Studying variables * The correlation matrix provides a simple indication of the pairwise linear links between variables. * Look for similarities between all the variables, or within a group of variables. * Synthetic indicators are sought to summarize groups of variables.
$\Longrightarrow$ visualize the correlation matrix and find a small number of synthetic variables that summarize many variables.
#### Links between the two points of view * Characterize groups of individuals using variables. * Use typical individuals to interpret groups of variables. %% Cell type:markdown id: tags: ### Some examples Data tables, with individuals in rows and variables in columns, can be found in many different areas, which means that we can perform PCA on quite a diverse range of data sets. * **Sensory analysis**: score of descriptor k for product i * **Ecology**: concentration of pollutant k in river i * **Economy**: value of indicator k for year i * **Genetics**: expression of gene k for patient i * **Biology**: measure k for animal i * **Marketing**: satisfaction index k for brand i * **Sociology**: time spent on activity k by the individuals of socio-professional category i * $\ldots$ %% Cell type:markdown id: tags: ### Example: Climate of different European countries To illustrate this course, we will use **temperature data** to analyze the climate of different European countries. #### Description of the data: * 35 individuals (rows): European cities * 17 variables (columns): - 12 average monthly temperatures (over 30 years) - 2 geographical variables (latitude and longitude of each city) - the annual average temperature and the thermal amplitude - a qualitative variable: the region of Europe the city belongs to (North, South, East, West) #### Data extract
| Town | Janv | Fév | ... | Nov | Déc | Moy | Amp | Lat | Lon | Rég |
|---|---|---|---|---|---|---|---|---|---|---|
| Amsterdam | 2.9 | 2.5 | ... | 7.0 | 4.4 | 9.9 | 14.6 | 52.2 | 4.5 | Ouest |
| Athènes | 9.1 | 9.7 | ... | 14.6 | 11.0 | 17.8 | 18.3 | 37.6 | 23.5 | Sud |
| Berlin | -0.2 | 0.1 | ... | 4.2 | 1.2 | 9.1 | 18.5 | 52.3 | 13.2 | Ouest |
| Helsinki | -5.8 | -5.0 | ... | 0.1 | -2.3 | 4.8 | 23.4 | 60.1 | 25.0 | Nord |
| Kiev | -5.9 | -5.0 | ... | 1.2 | -3.6 | 7.1 | 25.3 | 50.3 | 30.3 | Est |
| Copenhague | -0.4 | -0.4 | ... | 4.1 | 1.3 | 7.8 | 17.5 | 55.4 | 12.3 | Nord |
| Budapest | -1.1 | 0.8 | ... | 5.1 | 0.7 | 10.9 | 23.1 | 47.3 | 19.0 | Est |
| Bruxelles | 3.3 | 3.3 | ... | 6.7 | 4.4 | 10.3 | 14.4 | 50.5 | 4.2 | Ouest |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
#### Read data %% Cell type:code id: tags:  R load("./data/temperat.RData")  %% Cell type:markdown id: tags: #### Descriptive statistics %% Cell type:code id: tags:  R # Descriptive statistics dim(temperat) summary(temperat)  %%%% Output: display_data 1. 35 2. 17 %%%% Output: display_data %% Cell type:code id: tags:  R # Scatterplot pairs(temperat[,1:12])  %%%% Output: display_data [Hidden Image Output] %% Cell type:code id: tags:  R # Correlation cor(temperat[,1:12])  %% Cell type:markdown id: tags: ## Method ### The cloud of individuals 1 **individual** = 1 **row** of the table = 1 **point** in a space of dimension **p** (the number of variables) * p=1 => points on a line * p=2 => points in a plane * p=3 => points in 3D space, harder to represent * p=4 and beyond => impossible to represent directly
**Concept of resemblance between two individuals** $$\|X_{i} - X_{i^{'}}\|^{2}=\sum_{j=1}^{p}(x^{j}_{i} - x^{j}_{i^{'}})^{2}$$ *This distance can be modified by choosing another metric*.
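This squared distance is easy to verify in base R; the two rows below are hypothetical 4-variable city profiles:

```r
# Squared Euclidean distance between two individuals (rows i and i'),
# as in the formula above -- hypothetical values for illustration.
xi      <- c(2.9, 2.5, 7.0, 4.4)
xiprime <- c(9.1, 9.7, 14.6, 11.0)
d2 <- sum((xi - xiprime)^2)
# Base R's dist() returns the (unsquared) Euclidean distance:
all.equal(d2, as.numeric(dist(rbind(xi, xiprime)))^2)
```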
%% Cell type:markdown id: tags: #### Centering - standardizing data * *Center the variables*: centering does not distort the cloud $$\tilde{x^{j}_{i}} = x^{j}_{i} -\bar{x^{j}} \quad (\bar{x^{j}} \mbox{ is the mean of } X^{j})$$ => we always work at the *center of gravity* of the cloud * *Center and standardize the variables* (normalized PCA):
$$\frac{x^{j}_{i} -\bar{x^{j}}}{\sigma^{j}}\quad j=1\dots p$$ where $\sigma^{j}$ is the standard deviation of $X^{j}$. - Standardize when the variables are not expressed in the same units. - Not standardizing gives more importance to variables with large variability.
%% Cell type:code id: tags:  R # Center and standardize the data temperat temp_standard<-as.data.frame(lapply(temperat[1:16], scale, center=TRUE, scale=TRUE)) summary(temp_standard)  %% Cell type:markdown id: tags: #### Minimal deformation of the cloud The cloud of individuals is reduced by orthogonal projection onto an affine subspace $\mathcal{H}$. The subspace $\mathcal{H}$ is chosen so as to minimize the deformation of the cloud under projection. * *Fitting the cloud of individuals*: how to find the best approximate image of the cloud. - Find the axis (O,$u_1$), or factor, that deforms the cloud as little as possible
$\sum_i(iH_i)^2$ minimum, with $H_i$ the projection of individual $i$ on the axis $\Longleftrightarrow$ $\sum_i(OH_i)^2$ maximum (Pythagoras)
- Find the best plane, i.e. the one that maximizes $\sum_i(OH_i)^2$. The best plane contains the best first axis ($u_1$); we then look for $u_2$ such that $u_1\perp u_2$ and $\sum_i(OH_i)^2$ is maximized. * *Inertia of the cloud of individuals* $$I = \sum_{i=1}^{n}m_{i} \|X_{i}\|^{2}$$ where $m_{i}$ is the weight associated with individual $i$ * Inertia of the cloud of individuals around $\mathcal{H}$ $$J_{\mathcal{H}} = \sum_{i=1}^{n}m_{i} \|X_{i} - X_{i}^{*}\|^{2}\quad \mbox{measures the deformation of the cloud}$$
$\Longrightarrow$ We must minimize $J_{\mathcal{H}}$
* Inertia of the projected cloud $$I_{\mathcal{H}} = \sum_{i=1}^{n}m_{i} \|X_{i}^{*}\|^{2}$$
$$I = J_{\mathcal{H}} + I_{\mathcal{H}} \quad \mbox{(Pythagoras)}$$
**Minimizing** $J_{\mathcal{H}}$ $\Longleftrightarrow$ **maximizing** $I_{\mathcal{H}}$
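The decomposition $I = J_{\mathcal{H}} + I_{\mathcal{H}}$ can be checked numerically by projecting a centered cloud onto an arbitrary axis; a minimal sketch with unit weights ($m_i = 1$) and simulated data:

```r
# Numeric check of I = J_H + I_H (Pythagoras), projecting a centered
# cloud (unit weights) onto the line spanned by a unit vector u.
set.seed(1)
X <- scale(matrix(rnorm(40), ncol = 2), scale = FALSE)  # centered cloud, n = 20
u <- c(1, 1) / sqrt(2)                                  # an arbitrary unit axis
Xstar <- X %*% u %*% t(u)            # orthogonal projections X_i^*
I   <- sum(rowSums(X^2))             # total inertia
I_H <- sum(rowSums(Xstar^2))         # inertia of the projected cloud
J_H <- sum(rowSums((X - Xstar)^2))   # inertia around H (deformation)
all.equal(I, I_H + J_H)              # the decomposition holds
```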
%% Cell type:markdown id: tags: #### Determination of $\mathcal{H}_k$ $\mathcal{H}_k$ is an affine **subspace of dimension k** obtained by minimizing the deformation of the cloud by projection.
$$\mathcal{H}_{k} = \underset{\mathcal{H}:\, \dim(\mathcal{H})=k}{\operatorname{argmin}}\; J_{\mathcal{H}} = \underset{\mathcal{H}:\, \dim(\mathcal{H})=k}{\operatorname{argmax}}\; I_{\mathcal{H}}$$
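In practice this optimization is solved by diagonalizing the covariance matrix of the data; a minimal base-R sketch on simulated standardized data (all values here are illustrative):

```r
# The optimal axes u_k are the eigenvectors of the covariance matrix of
# the (here standardized) data -- a sketch on simulated data.
set.seed(2)
X <- scale(matrix(rnorm(60), ncol = 3))   # n = 20 individuals, p = 3
Gamma <- cov(X)   # covariance matrix (the 1/n vs 1/(n-1) choice only rescales it)
e <- eigen(Gamma)
e$values          # lambda_1 >= ... >= lambda_p: inertia carried by each axis
sum(e$values)     # total inertia = p = 3 for standardized variables
F1 <- X %*% e$vectors[, 1]   # coordinates of the individuals on the first axis
```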
The search for $\mathcal{H}_k$ can be done **sequentially** (axis by axis). $$\Gamma=X^{t}MX\quad \mbox{variance-covariance matrix}$$ $\Gamma$ is symmetric and positive semi-definite, so it is diagonalizable. - $\lambda_{1} \ge \ldots \ge \lambda_{p} \ge 0$: eigenvalues of $\Gamma$ - $u_{1}, \ldots, u_{p}$: eigenvectors of $\Gamma$ * $\mathcal{H}_{1}$ = $(O,u_{1})$ is generated by the first eigenvector of $\Gamma$ * $\mathcal{H}_{2}$ = $(O,u_{1},u_{2})$ * $\mathcal{H}_{k}$ = $(O,u_{1},\ldots,u_{k})$ * The inertia carried by the k-th axis is the k-th eigenvalue of $\Gamma$, associated with the k-th eigenvector $u_k$: $$I_{u_{k}} = \lambda_{k}$$ * Inertia of the cloud projected on $\mathcal{H}_k$: $$I_{\mathcal{H}_{k}} = \sum_{j=1}^{k} \lambda_{j}$$ %% Cell type:markdown id: tags: #### Main axes - Quality of representation * $(G,u_{k})$: k-th main axis $$I_{u_k}=\lambda_k$$ * Standardized PCA (sum of the variances): $$I=p$$ * Non-standardized PCA: $$I= \sum_{j=1}^{p} \lambda_{j}$$ * Global quality of representation: **share of inertia explained** - on the k-th main axis: $$\frac{\lambda_{k}}{I}$$ - on $\mathcal{H}_{k}$: $$\frac{\sum_{j=1}^{k} \lambda_{j}}{I}$$ %% Cell type:markdown id: tags: ### Principal Component Analysis and R * The base R function princomp performs a PCA but remains basic. * The FactoMineR package is an R package dedicated to *multivariate exploratory data analysis*. It is developed and maintained by François Husson, Julie Josse, Sébastien Lê (Agrocampus Rennes) and J. Mazet. For principal component analysis, we use the PCA function. [To know more](http://math.agrocampus-ouest.fr/infoglueDeliverLive/membres/Francois.Husson/coinR) %% Cell type:markdown id: tags: ### Analysis of temperature data #### Choice of active and illustrative variables - The active variables are the variables taken into account in the determination of the factorial axes: the monthly temperature variables (12 variables). - Quantitative illustrative variables: annual average, thermal amplitude. 
- Categorical illustrative variable: region. #### Choice of active and illustrative individuals - The active individuals are the capitals of the countries (rows 1:23), to avoid giving more weight to the countries for which several cities are listed. - The illustrative individuals are the cities in rows 24:35 of the data table. #### Questions * Can we summarize the monthly temperatures by a small number of factors? * What are the biggest disparities between countries? #### Using FactoMineR The function PCA performs principal component analysis with supplementary individuals, supplementary quantitative variables and supplementary categorical variables. Missing values are replaced by the column mean. %% Cell type:code id: tags:  R # Principal component analysis .libPaths("/home/viryl/R/lib") library(FactoMineR) temperat.pca <- PCA(temperat,ind.sup=24:35,quanti.sup=12:16,quali.sup=17) # temperat.pca: object of class ''PCA'' and ''list'' # attributes(temperat.pca) # Choose the axes temperat.pca$eig barplot(temperat.pca$eig[,2]) round(temperat.pca$eig[,2],2)  %% Cell type:markdown id: tags: #### Choose the axes %% Cell type:code id: tags:  R barplot(temperat.pca$eig[,2])  %%%% Output: display_data [Hidden Image Output] %% Cell type:markdown id: tags: #### Visualization of individuals on axes 1 and 2 %% Cell type:code id: tags:  R # Graphs: individuals on axes 1 and 2, colored by the qualitative variable plot(temperat.pca, choix="ind", habillage=17,cex=0.8)  %%%% Output: display_data [Hidden Image Output] %% Cell type:markdown id: tags: #### Visualization of individuals on axes 3 and 4 %% Cell type:code id: tags:  R # Individuals on axes 3 and 4 plot(temperat.pca, choix="ind", habillage=17,cex=0.8,axes=3:4)  %%%% Output: display_data [Hidden Image Output] %% Cell type:markdown id: tags: #### Use of variables for interpretation A good knowledge of the data can help the interpretation of the projections, but when there are many individuals one will be 
helped by the variables. We consider the coordinates of the individuals on the axes as variables. $F^{k}_{i}$ (**factor k**): coordinate of individual **i** on axis **k**. $$F^{1}=\{ F^{1}_{i}, i=1 \ldots n\} \, , \, F^{2}=\{ F^{2}_{i}, i=1 \ldots n\}\, ,\,\ldots$$ * Analysis of the correlations of the active variables with the factors. When the variables are strongly correlated with the factors: - $cor(X^{k},F^{1}) > 0$: individuals with high values on $X^{k}$ have high values on axis 1. - $cor(X^{k},F^{1}) < 0$: individuals with high values on $X^{k}$ have low values on axis 1. * Same for axis 2. We build **the correlation circle**. %% Cell type:markdown id: tags: ### The cloud of variables %% Cell type:markdown id: tags: #### Studying variables A **variable** is a point on a hypersphere in $\mathcal{R}^{n}$
$$\cos (\theta_{k,l}) = \frac{\langle X^{k}, X^{l}\rangle}{\|X^{k}\| \|X^{l}\|} = \frac{\sum_{i=1}^{n} x_{i}^{k} x_{i}^{l}}{\sqrt{\sum_{i=1}^{n} (x_{i}^{k})^{2} \sum_{i=1}^{n} (x_{i}^{l})^{2}}}$$
Since the variables are centered, $$\cos (\theta_{k,l}) = r(X^{k},X^{l}) \quad \mbox{the correlation coefficient between} \quad X^{k} \; \mbox{and} \quad X^{l}$$ Well-represented variables will be close to the circle. **Standardized variables** $\Longrightarrow$ the hypersphere has radius 1 %% Cell type:markdown id: tags: #### Projection of the cloud of variables What are the axes in $\mathcal{R}^{n}$ that best represent the correlation matrix? * The first axis is the axis that **maximizes the sum of the squared correlations between the factor and the set of variables**: $$\underset{V^{1} \in \mathcal{R}^{n}}{\operatorname{argmax}} {\sum_{k=1}^{p} r(X^{k},V^{1})^{2}}$$ The factor $V^{1}$ is the factor most linked to the set of variables in terms of squared correlations. * We look for **a second axis, orthogonal to the first** (uncorrelated), which maximizes the sum of the squared correlations with the set of variables. * In the same sequential way, we determine the 3rd axis, $\ldots$ The projection of the cloud of variables is the same as the representation of the correlation circle obtained previously. %% Cell type:code id: tags:  R # Variables on axes 1 and 2 plot(temperat.pca, choix="var",cex=0.8)  %%%% Output: display_data [Hidden Image Output] %% Cell type:markdown id: tags: * All variables have positive coordinates on axis 1 (size effect). * Axis 1 can be summarized as the annual average, which is supported by the illustrative variable "Moyenne". * Latitude is also linked to the first factor. * The thermal amplitude is linked to the second axis. %% Cell type:markdown id: tags: #### Projection of the variables


%% Cell type:markdown id: tags: $$r(A,B) = \cos(\theta_{A,B})$$ If **A** is close to the plane: * **A** is well projected, * $r(A,H_{A}) \approx 1$, * A is close to the correlation circle. $\Longrightarrow$ **Only well-projected variables can be interpreted** %% Cell type:markdown id: tags: ### Interpretation #### Percentage of inertia - choice of the number of axes * Percentage of information explained by each axis (eigenvalue). * The axes being orthogonal, we can add the explained inertia of several axes. %% Cell type:code id: tags:  R barplot(temperat.pca$eig[,2])  %%%% Output: display_data [Hidden Image Output] %% Cell type:markdown id: tags: $\Longrightarrow$ **This allows us to choose the number of axes to analyze** %% Cell type:markdown id: tags: #### Interpretation - 2 indicators Two aids to interpretation: * **Quality of representation** of variables and individuals on axis k: - **Variable**: $\cos^{2}(V,V_{k})$ - **Individual**: $\cos^{2}(GI,GH_{i}^{k})$, where *$H_{i}^{k}$ is the projection of individual $i$ on axis k*. $\Longrightarrow$ **Only well-represented elements can be interpreted** %% Cell type:markdown id: tags: * **Contribution to the construction of axis k** $$Ctr_k(j) = \frac{r(X^{j},V_{k})^{2}}{\sum_{l=1}^{p} r(X^{l},V_{k})^{2}}$$ $$Ctr_k(i) = \frac{F^{k^2}_{i}}{\sum_{l=1}^{n} F^{k^2}_{l}}$$ %% Cell type:markdown id: tags: #### Supplementary or illustrative elements Supplementary elements may be **individuals** and/or **variables**. They are not used to calculate distances between individuals or to construct the correlation matrix. $\Longrightarrow$ **They do not participate in the construction of the axes**; they are an aid to their interpretation. * Supplementary variables - *Quantitative variables*: they are projected on the correlation circle. The coordinate of a supplementary variable $X^{j}$ on axis k is the correlation between this variable and the factor $F^{k}$. 
- *Qualitative variables*: each category is projected **at the barycenter of the individuals associated with this category**, on **the graph of the individuals**. The information can also be *represented with a color code*: individuals belonging to the same category are drawn in the same color. * Supplementary individuals: they are projected on the graph of the individuals. %% Cell type:code id: tags:  R # Individuals on axes 3 and 4 plot(temperat.pca, choix="ind", habillage=17,cex=0.8,axes=3:4)  %%%% Output: display_data [Hidden Image Output] %% Cell type:code id: tags:  R # Variables on axes 1 and 2 plot(temperat.pca, choix="var",cex=0.8)  %%%% Output: display_data [Hidden Image Output] %% Cell type:markdown id: tags: #### Automatic description of the axes This type of interpretation aid is useful when **the number of variables is large**. * Quantitative variables: $(r(X^{j},F^{k}), j=1 \ldots p)$ - Only the variables whose correlation coefficient with the factor is significantly different from 0 are kept. - For each axis, the variables are sorted from the highest correlation coefficient to the lowest. * Qualitative variables: an analysis of variance is performed for each qualitative variable and each factor (Fisher test, Student test). 
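Returning to the two interpretation indicators above, they can also be computed by hand from the principal coordinates; a sketch on simulated data (FactoMineR's `PCA()` stores analogous quantities in `res$ind$cos2` and `res$ind$contrib`, the latter expressed in percent):

```r
# Quality of representation (cos2) and contributions of individuals,
# computed from scratch on simulated standardized data -- a sketch.
set.seed(3)
X <- scale(matrix(rnorm(50), ncol = 2))
e <- eigen(cov(X))
scores <- X %*% e$vectors               # principal coordinates F_i^k
cos2    <- scores^2 / rowSums(X^2)      # cos^2(GI, GH_i^k) on each axis
contrib <- sweep(scores^2, 2, colSums(scores^2), "/")  # share of axis inertia due to i
rowSums(cos2)     # = 1 here, since the two axes span the whole space
colSums(contrib)  # contributions sum to 1 on each axis
```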
%% Cell type:code id: tags:  R dimdesc(temperat.pca)  %%%% Output: display_data `$Dim.1` `$quanti`:

| | correlation | p.value |
|---|---|---|
| Septembre | 0.9924932 | 1.187361e-20 |
| Moyenne | 0.9922340 | 1.693870e-20 |
| Octobre | 0.9852237 | 1.409265e-17 |
| Avril | 0.9794972 | 4.282880e-16 |
| Novembre | 0.9344977 | 6.950867e-11 |
| Mars | 0.9294103 | 1.490263e-10 |
| Aout | 0.9293171 | 1.510426e-10 |
| Mai | 0.8939479 | 9.120840e-09 |
| Juillet | 0.8693439 | 7.288714e-08 |
| Juin | 0.8612611 | 1.318936e-07 |
| Fevrier | 0.8587611 | 1.572744e-07 |
| Décembre | 0.8448172 | 3.962517e-07 |
| Janvier | 0.8117914 | 2.574863e-06 |
| Latitude | -0.8695115 | 7.196646e-08 |

`$quali`:

| | R2 | p.value |
|---|---|---|
| Région | 0.6889198 | 4.659316e-05 |

`$category`:

| | Estimate | p.value |
|---|---|---|
| Sud | 4.045232 | 2.196202e-05 |
| Nord | -2.840457 | 7.568562e-03 |

`$Dim.2` `$quanti`:

| | correlation | p.value |
|---|---|---|
| Janvier | 0.5734912 | 4.223923e-03 |
| Décembre | 0.5140212 | 1.210370e-02 |
| Fevrier | 0.5044054 | 1.411137e-02 |
| Longitude | -0.4359624 | 3.756716e-02 |
| Juillet | -0.4663341 | 2.489752e-02 |
| Juin | -0.5006543 | 1.496483e-02 |
| Amplitude | -0.9604872 | 3.865197e-13 |

`$quali`:

| | R2 | p.value |
|---|---|---|
| Région | 0.5144575 | 0.002825625 |

`$category`:

| | Estimate | p.value |
|---|---|---|
| Nord | 0.7521184 | 0.0370465132 |
| Est | -1.3636282 | 0.0004962863 |

`$Dim.3` `$quali`:

| | R2 | p.value |
|---|---|---|
| Région | 0.4182892 | 0.0144468 |

`$category`:

| | Estimate | p.value |
|---|---|---|
| Nord | 0.2713414 | 0.01390328 |
| Est | -0.2481828 | 0.01634125 |

%% Cell type:markdown id: tags: ## Factoshiny: interactive graphs in exploratory multivariate data analysis The [Factoshiny](http://factominer.free.fr/graphs/factoshiny.html) package lets you drive the [FactoMineR](http://factominer.free.fr) package through a graphical interface, and also lets you modify the graphics interactively. This package is very useful for polishing these graphics before distributing them. 
%% Cell type:markdown id: tags: ## Handling missing data in PCA [PCA with missing data using the missMDA R package](https://www.youtube.com/watch?v=OOM8_FH6_8o) [Methodology for the treatment of missing data](https://www.youtube.com/watch?v=hQ6tDtgotx0) %% Cell type:markdown id: tags: # Other methods of multidimensional analysis [Approach in multidimensional data analysis](http://math.agrocampus-ouest.fr/infoglueDeliverLive/membres/Francois.Husson/Rcorner) ## Classification ### Data and objectives ## Correspondence Analysis ### Data and objectives The main point of **correspondence analysis** is studying the **links between pairs of qualitative variables**. This really means looking at the difference between the given data and what it would be like if the variables were independent. We are therefore going to see how the analysis captures deviation from independence. Our reasoning will mainly be geometrical, creating point clouds for the rows and point clouds for the columns. Projecting these clouds onto planes gives some useful representations. %% Cell type:markdown id: tags: ## Multiple Correspondence Analysis ### Data and objectives In the MCA context, we have a point cloud of individuals and a point cloud of categories. We see how to visualize the point cloud of individuals, how to interpret it using the categories, and how to directly visualize the point cloud of categories. The point cloud of individuals and that of the categories can be shown simultaneously on the same graph; this is called the simultaneous representation of the point clouds. 
## Multiple Factor Analysis ### Data and objectives A method for studying more complex data tables, where a group of individuals is characterized by variables structured into groups, possibly coming from different information sources. The interest of the method lies in its ability to analyze the data table as a whole, and also to compare the information provided by the various information sources. [MOOC AgroCampus Ouest Exploratory Multivariate Data Analysis](https://www.fun-mooc.fr/courses/course-v1:agrocampusouest+40001S04EN+session04/info) [Courses AgroCampus Ouest F. Husson](http://math.agrocampus-ouest.fr/infoglueDeliverLive/membres/Francois.Husson/teaching) %% Cell type:markdown id: tags: # Some references * Analyse de données avec R, 2ème édition revue et augmentée.
F. Husson, S. Lê & J. Pagès (2016).
Presses Universitaires de Rennes * Statistique avec R, 3ème édition revue et augmentée.
P-A. Cornillon, A. Guyader, F. Husson, N. Jégou, J. Josse, M. Kloareg, E. Matzner-Lober, L. Rouvière (2012).
Presses Universitaires de Rennes * Exploratory Multivariate Analysis by Example Using R.
F. Husson, S. Lê & J. Pagès. 2nd edition (2017).
Chapman & Hall/CRC Computer Science & Data Analysis. * MOOC on FUN