### Replace N2_KMeans_iris_data_example.ipynb

parent 899d5226
```diff
@@ -69,7 +69,7 @@
-    "## Exercize \n",
+    "## Exercize 3\n",
@@ -133,7 +133,7 @@
-    "## Exercize\n",
+    "## Exercize 4\n",
@@ -183,7 +183,7 @@
-    "## Exercize\n",
+    "## Exercize 5\n",
@@ -212,7 +212,7 @@
-    "version": "3.8.3"
+    "version": "3.8.2"
```
This notebook can be run on mybinder: [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/git/https%3A%2F%2Fgricad-gitlab.univ-grenoble-alpes.fr%2Fchatelaf%2Fml-sicom3a/master?urlpath=lab/tree/notebooks/7_Clusturing/N2_Kmeans_iris_data_example/)

# Iris data: KMeans

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.cluster import KMeans

iris = datasets.load_iris()
x = iris.data
```

```python
# Compute the within-cluster sum of squares (WCSS) for k = 1..10
wcss = []
for i in range(1, 11):
    kmeans = KMeans(n_clusters=i, init='k-means++', max_iter=300,
                    n_init=10, random_state=0)
    kmeans.fit(x)
    wcss.append(kmeans.inertia_)

# Plot the results as a line graph, allowing us to observe 'the elbow'
plt.plot(range(1, 11), wcss)
plt.title('The elbow method')
plt.xlabel('Number of clusters')
plt.ylabel('WCSS')  # within-cluster sum of squares
plt.show()
```

[Hidden Image Output]

## Exercise 3

- Comment on the choice of the KMeans input parameters used above.
- 'The elbow method': from the graph above, find the optimum number of clusters by observing the within-cluster sum of squares (WCSS). Explain the shape of the curve WCSS = f(number of clusters).
- What is the asymptotic value of WCSS when the number of clusters approaches N (the number of points)?
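As a sketch of the asymptotic behaviour asked about above (one possible way to verify it numerically, not part of the original notebook): when the number of clusters equals the number of distinct points, every point can serve as its own centroid, so the WCSS (scikit-learn's `inertia_`) collapses to zero.

```python
import numpy as np
from sklearn import datasets
from sklearn.cluster import KMeans

x = datasets.load_iris().data
# The iris data contains a duplicated row, so work with the distinct points
x_unique = np.unique(x, axis=0)

# One cluster per distinct point: each centroid coincides with a data point,
# so the within-cluster sum of squares (inertia) is exactly zero
kmeans = KMeans(n_clusters=len(x_unique), n_init=1, random_state=0).fit(x_unique)
print(kmeans.inertia_)
```

This is why the elbow curve must decrease monotonically toward 0 as k grows toward N.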
- Explain why the curve doesn't decrease significantly with every additional cluster.

```python
# Apply KMeans to the dataset with the chosen number of clusters
NbClust = 3
kmeans = KMeans(n_clusters=NbClust, init='k-means++', max_iter=300,
                n_init=10, random_state=0)
y_kmeans = kmeans.fit_predict(x)
```

```python
# Visualise the 3 clusters w.r.t. the first two features x1, x2
plt.scatter(x[y_kmeans == 0, 0], x[y_kmeans == 0, 1], s=25, c='red', label='C0')
plt.scatter(x[y_kmeans == 1, 0], x[y_kmeans == 1, 1], s=25, c='blue', label='C1')
plt.scatter(x[y_kmeans == 2, 0], x[y_kmeans == 2, 1], s=25, c='green', label='C2')

# Plot the centroids of the clusters
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1],
            s=25, c='yellow', label='Centroids')
plt.legend()
```

[Hidden Image Output]

## Exercise 4

- Recall the reasons why the clusters formed by the KMeans algorithm are included in the Voronoï cells associated with the centroids.
- Comment on the shape of the clusters shown in the figure above.
- How would you check that enough iterations were performed?
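One hedged way to approach the last question (a sketch, not the only answer expected): scikit-learn's fitted `KMeans` exposes `n_iter_`, the number of Lloyd iterations actually run for the best of the `n_init` initializations. If `n_iter_` is strictly below `max_iter`, the algorithm stopped because the centroid-shift tolerance `tol` was met, not because it ran out of its iteration budget.

```python
from sklearn import datasets
from sklearn.cluster import KMeans

x = datasets.load_iris().data
kmeans = KMeans(n_clusters=3, init='k-means++', max_iter=300,
                n_init=10, random_state=0).fit(x)

# n_iter_ < max_iter means convergence (centroid shift fell below tol),
# not an exhausted iteration budget
print(kmeans.n_iter_)
converged = kmeans.n_iter_ < 300
```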
```python
# Visualise the clusters w.r.t. features x1 and x3
plt.scatter(x[y_kmeans == 0, 0], x[y_kmeans == 0, 2], s=25, c='red', label='Iris-setosa')
plt.scatter(x[y_kmeans == 1, 0], x[y_kmeans == 1, 2], s=25, c='blue', label='Iris-versicolour')
plt.scatter(x[y_kmeans == 2, 0], x[y_kmeans == 2, 2], s=25, c='green', label='Iris-virginica')

# Plot the centroids of the clusters
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 2],
            s=25, c='yellow', label='Centroids')
plt.legend();
```

[Hidden Image Output]

## Exercise 5

- Propose a measure of the goodness of clustering suited to this problem (implementation is not required).
- How could the cost-complexity tradeoff be tackled?
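One standard candidate for the goodness-of-clustering measure asked for above (a possible answer, not necessarily the one the exercise intends) is the silhouette score, available in scikit-learn:

```python
from sklearn import datasets
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

x = datasets.load_iris().data
labels = KMeans(n_clusters=3, init='k-means++', n_init=10,
                random_state=0).fit_predict(x)

# The silhouette compares each point's mean intra-cluster distance to its
# distance to the nearest other cluster; values lie in [-1, 1], higher is better
score = silhouette_score(x, labels)
print(round(score, 3))
```

Unlike WCSS, the silhouette is not monotone in the number of clusters, so maximizing it over k is one way to address the cost-complexity tradeoff.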