Commit 92336b8b authored by Olivier Michel

Replace N2_KMeans_iris_data_example.ipynb

parent 899d5226
%% Cell type:markdown id: tags:
This notebook can be run on mybinder: [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/git/https%3A%2F%2Fgricad-gitlab.univ-grenoble-alpes.fr%2Fchatelaf%2Fml-sicom3a/master?urlpath=lab/tree/notebooks/7_Clusturing/N2_Kmeans_iris_data_example/)
%% Cell type:markdown id: tags:
# Iris data: KMeans
%% Cell type:code id: tags:
``` python
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.cluster import KMeans

# Load the iris data set (150 samples, 4 features)
iris = datasets.load_iris()
x = iris.data
```
%% Cell type:code id: tags:
``` python
wcss = []
for i in range(1, 11):
    kmeans = KMeans(n_clusters=i, init='k-means++', max_iter=300,
                    n_init=10, random_state=0)
    kmeans.fit(x)
    wcss.append(kmeans.inertia_)

# Plot WCSS against the number of clusters to observe 'the elbow'
plt.plot(range(1, 11), wcss)
plt.title('The elbow method')
plt.xlabel('Number of clusters')
plt.ylabel('WCSS')  # within-cluster sum of squares
plt.show()
```
%% Output
%% Cell type:markdown id: tags:
## Exercise 3
- Comment on the choice of the KMeans input parameters used above.
- 'The elbow method': from the above graph, find the optimal number of clusters by observing the within-cluster sum of squares (WCSS). Explain the shape of the curve WCSS = f(number of clusters).
- What is the asymptotic value of WCSS when the number of clusters approaches N (the number of points)?
- Explain why the curve no longer decreases significantly with each additional cluster beyond the elbow.
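%% Cell type:markdown id: tags:
As a quick sanity check on the asymptotic question above, the sketch below (an illustration, not part of the original exercise) fits KMeans with as many clusters as points, here on a small subset of distinct iris samples for speed: each point then becomes its own centroid and the WCSS drops to zero.
%% Cell type:code id: tags:
``` python
import numpy as np
from sklearn import datasets
from sklearn.cluster import KMeans

x = datasets.load_iris().data
# Take 30 distinct points (subset size is an arbitrary choice, for speed)
subset = np.unique(x, axis=0)[:30]

# With n_clusters == number of points, every point is its own centroid
km = KMeans(n_clusters=30, n_init=10, random_state=0).fit(subset)
print(km.inertia_)  # WCSS is (numerically) zero
```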
%% Cell type:code id: tags:
``` python
# Apply KMeans to the dataset with the chosen number of clusters
NbClust = 3
kmeans = KMeans(n_clusters=NbClust, init='k-means++', max_iter=300,
                n_init=10, random_state=0)
y_kmeans = kmeans.fit_predict(x)
```
%% Cell type:code id: tags:
``` python
# Visualising the 3 clusters with respect to the first two features (x1, x2)
plt.scatter(x[y_kmeans == 0, 0], x[y_kmeans == 0, 1], s = 25, c = 'red', label = 'C0')
plt.scatter(x[y_kmeans == 1, 0], x[y_kmeans == 1, 1], s = 25, c = 'blue', label = 'C1')
plt.scatter(x[y_kmeans == 2, 0], x[y_kmeans == 2, 1], s = 25, c = 'green', label = 'C2')
#Plotting the centroids of the clusters
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:,1],
s =25, c = 'yellow', label = 'Centroids')
plt.legend()
```
%% Output
<matplotlib.legend.Legend at 0x7fcc29ec9310>
%% Cell type:markdown id: tags:
## Exercise 4
- Recall why the clusters formed by the KMeans algorithm are included in the Voronoi cells associated with the centroids.
- Comment on the shape of the clusters shown in the figure above.
- How would you check that enough iterations were performed?
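%% Cell type:markdown id: tags:
One possible check for the last question (a sketch, not the required answer): a fitted scikit-learn `KMeans` estimator exposes `n_iter_`, the number of Lloyd iterations actually run. If it is well below `max_iter`, the algorithm stopped because it converged, not because it ran out of iterations.
%% Cell type:code id: tags:
``` python
from sklearn import datasets
from sklearn.cluster import KMeans

x = datasets.load_iris().data
km = KMeans(n_clusters=3, init='k-means++', max_iter=300,
            n_init=10, random_state=0).fit(x)

# Iterations actually performed by the best of the n_init runs;
# a value far below max_iter indicates convergence, not truncation
print(km.n_iter_)
```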
%% Cell type:code id: tags:
``` python
# Visualising the clusters with respect to features x1, x3
# (note: the cluster-index-to-species mapping below is an assumption;
#  KMeans assigns label indices arbitrarily)
plt.scatter(x[y_kmeans == 0, 0], x[y_kmeans == 0, 2], s = 25, c = 'red', label = 'Iris-setosa')
plt.scatter(x[y_kmeans == 1, 0], x[y_kmeans == 1, 2], s = 25, c = 'blue', label = 'Iris-versicolour')
plt.scatter(x[y_kmeans == 2, 0], x[y_kmeans == 2, 2], s = 25, c = 'green', label = 'Iris-virginica')
#Plotting the centroids of the clusters
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:,2], s = 25, c = 'yellow', label = 'Centroids')
plt.legend();
```
%% Output
<matplotlib.legend.Legend at 0x1a1526d320>
%% Cell type:markdown id: tags:
## Exercise 5
- Propose a measure of the goodness of clustering associated with this problem (implementation is not required).
- How could the cost-complexity tradeoff be tackled?
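%% Cell type:markdown id: tags:
One common choice (a sketch of one option among several, not the prescribed answer) is the silhouette score: for each point it compares the mean distance to its own cluster with the mean distance to the nearest other cluster, yielding a value in [-1, 1], higher being better. Since it does not use the true labels, it is an internal measure of clustering quality.
%% Cell type:code id: tags:
``` python
from sklearn import datasets
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

x = datasets.load_iris().data
labels = KMeans(n_clusters=3, init='k-means++', n_init=10,
                random_state=0).fit_predict(x)

# Mean silhouette coefficient over all samples: in [-1, 1], higher is better
print(silhouette_score(x, labels))
```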
%% Cell type:code id: tags:
``` python
```