Commit 92336b8b authored by Olivier Michel

Replace N2_KMeans_iris_data_example.ipynb

parent 899d5226
%% Cell type:markdown id: tags:
This notebook can be run on mybinder: [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/git/https%3A%2F%2Fgricad-gitlab.univ-grenoble-alpes.fr%2Fchatelaf%2Fml-sicom3a/master?urlpath=lab/tree/notebooks/7_Clusturing/N2_Kmeans_iris_data_example/)
%% Cell type:markdown id: tags:
# Iris data: KMeans
%% Cell type:code id: tags:
``` python
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.cluster import KMeans

iris = datasets.load_iris()
x = iris.data
```
%% Cell type:code id: tags:
``` python
wcss = []
for i in range(1, 11):
    kmeans = KMeans(n_clusters=i, init='k-means++', max_iter=300,
                    n_init=10, random_state=0)
    kmeans.fit(x)
    wcss.append(kmeans.inertia_)

# Plotting the results as a line graph, allowing us to observe 'the elbow'
plt.plot(range(1, 11), wcss)
plt.title('The elbow method')
plt.xlabel('Number of clusters')
plt.ylabel('WCSS')  # within-cluster sum of squares
plt.show()
```
%% Output
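%% Cell type:markdown id: tags:
As a quick numerical check of the asymptotic behaviour of the WCSS (this cell is an added sketch, not part of the original notebook), we can fit KMeans with one cluster per distinct sample: each point then becomes its own centroid and the WCSS drops to 0.
%% Cell type:code id: tags:
``` python
# Sketch (standard scikit-learn API assumed): WCSS -> 0 as the number of
# clusters approaches the number of distinct points.
import numpy as np
from sklearn import datasets
from sklearn.cluster import KMeans

x = datasets.load_iris().data
xu = np.unique(x, axis=0)  # drop duplicate samples so each point can get its own centroid
kmeans_full = KMeans(n_clusters=len(xu), init='k-means++', n_init=1, random_state=0).fit(xu)
print(kmeans_full.inertia_)  # essentially 0: every point coincides with its centroid
```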
%% Cell type:markdown id: tags:
## Exercise 3
- Comment on the choice of the KMeans input parameters used above.
- 'The elbow method': from the graph above, find the optimum number of clusters by observing the within-cluster sum of squares (WCSS). Explain the shape of the curve WCSS = f(number of clusters).
- What is the asymptotic value of WCSS when the number of clusters approaches N (the number of points)?
- Explain why the curve doesn't decrease significantly with every additional cluster.
%% Cell type:code id: tags:
``` python
# Applying KMeans to the dataset / creating the KMeans classifier
NbClust = 3
kmeans = KMeans(n_clusters=NbClust, init='k-means++', max_iter=300,
                n_init=10, random_state=0)
y_kmeans = kmeans.fit_predict(x)
```
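%% Cell type:markdown id: tags:
One way to verify that `max_iter` was large enough (an added sketch, using the fitted estimator's `n_iter_` attribute) is to compare the number of iterations actually performed with the cap: a value well below `max_iter` means the tolerance criterion stopped the algorithm early, i.e. it converged.
%% Cell type:code id: tags:
``` python
# Sketch: inspect the fitted estimator to check convergence.
from sklearn import datasets
from sklearn.cluster import KMeans

x = datasets.load_iris().data
kmeans = KMeans(n_clusters=3, init='k-means++', max_iter=300,
                n_init=10, random_state=0).fit(x)
# n_iter_ reports the iterations of the best of the n_init runs
print(kmeans.n_iter_, kmeans.inertia_)
```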
%% Cell type:code id: tags:
``` python
# Visualising the three clusters w.r.t. features x1 and x2
plt.scatter(x[y_kmeans == 0, 0], x[y_kmeans == 0, 1], s=25, c='red', label='C0')
plt.scatter(x[y_kmeans == 1, 0], x[y_kmeans == 1, 1], s=25, c='blue', label='C1')
plt.scatter(x[y_kmeans == 2, 0], x[y_kmeans == 2, 1], s=25, c='green', label='C2')
# Plotting the centroids of the clusters
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1],
            s=25, c='yellow', label='Centroids')
plt.legend()
```
%% Output
<matplotlib.legend.Legend at 0x7fcc29ec9310>
%% Cell type:markdown id: tags:
## Exercise 4
- Recall why the clusters formed by the KMeans algorithm are contained in the Voronoï cells associated with the centroïds.
- Comment on the shape of the clusters obtained in the figure above.
- How would you check that enough iterations were performed?
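%% Cell type:markdown id: tags:
The Voronoï property can also be checked numerically (an added sketch, assuming the scikit-learn API): at convergence, the labels produced by KMeans should coincide with a nearest-centroid assignment, which is exactly what defines the Voronoï cells of the centroids.
%% Cell type:code id: tags:
``` python
# Sketch: verify that KMeans labels match the nearest-centroid (Voronoï) assignment.
import numpy as np
from sklearn import datasets
from sklearn.cluster import KMeans

x = datasets.load_iris().data
kmeans = KMeans(n_clusters=3, init='k-means++', n_init=10, random_state=0).fit(x)
# Distance of every point to every centroid, shape (n_samples, n_clusters)
d = np.linalg.norm(x[:, None, :] - kmeans.cluster_centers_[None, :, :], axis=2)
# Expected True at convergence: each point lies in its own centroid's Voronoï cell
print(np.array_equal(d.argmin(axis=1), kmeans.labels_))
```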
%% Cell type:code id: tags:
``` python
# Visualising the clusters w.r.t. features x1 and x3
plt.scatter(x[y_kmeans == 0, 0], x[y_kmeans == 0, 2], s=25, c='red', label='Iris-setosa')
plt.scatter(x[y_kmeans == 1, 0], x[y_kmeans == 1, 2], s=25, c='blue', label='Iris-versicolour')
plt.scatter(x[y_kmeans == 2, 0], x[y_kmeans == 2, 2], s=25, c='green', label='Iris-virginica')
# Plotting the centroids of the clusters
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 2],
            s=25, c='yellow', label='Centroids')
plt.legend();
```
%% Output
<matplotlib.legend.Legend at 0x1a1526d320>
%% Cell type:markdown id: tags:
## Exercise 5
- Propose a measure of the goodness of clustering suited to this problem (implementation is not required).
- How could the cost-complexity tradeoff be tackled?
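%% Cell type:markdown id: tags:
One widely used goodness-of-clustering measure (our suggestion in this added sketch, not necessarily the intended answer to the exercise) is the silhouette coefficient, which compares each point's mean intra-cluster distance with its distance to the nearest other cluster.
%% Cell type:code id: tags:
``` python
# Sketch: silhouette coefficient as a clustering-quality measure.
from sklearn import datasets
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

x = datasets.load_iris().data
labels = KMeans(n_clusters=3, init='k-means++', n_init=10, random_state=0).fit_predict(x)
score = silhouette_score(x, labels)  # in [-1, 1]; higher means tighter, better-separated clusters
print(score)
```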
%% Cell type:code id: tags:
``` python
```