{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "<img width=\"800px\" src=\"../fidle/img/header.svg\"></img>\n", "\n", "# <!-- TITLE --> [PER57] - Perceptron Model 1957\n", "<!-- DESC --> Example of using a Perceptron with sklearn and the IRIS dataset of 1936 !\n", "<!-- AUTHOR : Jean-Luc Parouty (CNRS/SIMaP) -->\n", "\n", "## Objectives :\n", " - Implement a historical linear classifier with a historical dataset !\n", " - The objective is to predict the type of Iris from the size of its petals.\n", " - Identify its limitations\n", "\n", "The [IRIS dataset](https://archive.ics.uci.edu/ml/datasets/Iris) is probably one of the oldest datasets, dating back to 1936.\n", "\n", "## What we're going to do :\n", " - Retrieve the dataset via scikit-learn\n", " - Train the perceptron and classify the test samples\n", "\n", "## Step 1 - Import and init" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "from sklearn.datasets import load_iris\n", "from sklearn.linear_model import Perceptron\n", "import pandas as pd\n", "import matplotlib.pyplot as plt\n", "import matplotlib\n", "\n", "import os,sys\n", "\n", "import fidle\n", "\n", "# Init Fidle environment\n", "run_id, run_dir, datasets_dir = fidle.init('PER57')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 2 - Prepare IRIS Dataset\n", "\n", "Retrieve a dataset : http://scikit-learn.org/stable/modules/classes.html#module-sklearn.datasets \n", "About the datasets : https://scikit-learn.org/stable/datasets.html#datasets \n", "\n", "Data fields (X) :\n", "- 0 : sepal length in cm\n", "- 1 : sepal width in cm\n", "- 2 : petal length in cm\n", "- 3 : petal width in cm \n", "\n", "Class (y) :\n", "- 0 : class 0=Iris-Setosa, 1=Iris-Versicolor, 2=Iris-Virginica\n", "\n", "### 2.1 - Get dataset" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "x0,y0 = load_iris(return_X_y=True)\n", "\n", "x = 
x0[:, (2,3)] # We only keep fields 2 and 3 (petal length and width)\n", "y = y0.copy()\n", "\n", "y[ y0==0 ] = 1 # 1 = Iris setosa\n", "y[ y0>=1 ] = 0 # 0 = not iris setosa\n", "\n", "df=pd.DataFrame.from_dict({'Length (x1)':x[:,0], 'Width (x2)':x[:,1], 'Setosa {0,1} (y)':y})\n", "display(df)\n", "\n", "print(f'x shape : {x.shape}')\n", "print(f'y shape : {y.shape}')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2.2 - Train and test sets" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "x,y = fidle.utils.shuffle_np_dataset(x, y)\n", " \n", "n=int(len(x)*0.8)\n", "x_train = x[:n]\n", "y_train = y[:n]\n", "x_test = x[n:]\n", "y_test = y[n:]\n", "\n", "print(f'x_train shape : {x_train.shape}')\n", "print(f'y_train shape : {y_train.shape}')\n", "print(f'x_test shape : {x_test.shape}')\n", "print(f'y_test shape : {y_test.shape}')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 3 - Get a perceptron, and train it" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "pct = Perceptron(max_iter=100, random_state=82, tol=0.01, verbose=1)\n", "pct.fit(x_train, y_train)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 4 - Predictions" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "y_pred = pct.predict(x_test) \n", "\n", "df=pd.DataFrame.from_dict({'Length (x1)':x_test[:,0], 'Width (x2)':x_test[:,1], 'y_test':y_test, 'y_pred':y_pred})\n", "display(df[:15])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 5 - Visualisation" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def plot_perceptron(x_train,y_train,x_test,y_test):\n", " a = -pct.coef_[0][0] / pct.coef_[0][1]\n", " b = -pct.intercept_[0] / pct.coef_[0][1]\n", " box=[x.min(axis=0)[0],x.max(axis=0)[0],x.min(axis=0)[1],x.max(axis=0)[1]]\n", " mx=(box[1]-box[0])/20\n", " 
my=(box[3]-box[2])/20\n", " box=[box[0]-mx,box[1]+mx,box[2]-my,box[3]+my]\n", "\n", " fig, axs = plt.subplots(1, 1)\n", " fig.set_size_inches(10,6)\n", " \n", " axs.plot(x_train[y_train==1, 0], x_train[y_train==1, 1], \"o\", color='tomato', label=\"Iris-Setosa\")\n", " axs.plot(x_train[y_train==0, 0], x_train[y_train==0, 1], \"o\", color='steelblue',label=\"Others\")\n", " \n", " axs.plot(x_test[y_pred==1, 0], x_test[y_pred==1, 1], \"o\", color='lightsalmon', label=\"Iris-Setosa (pred)\")\n", " axs.plot(x_test[y_pred==0, 0], x_test[y_pred==0, 1], \"o\", color='lightblue', label=\"Others (pred)\")\n", " \n", " axs.plot([box[0], box[1]], [a*box[0]+b, a*box[1]+b], \"k--\", linewidth=2)\n", " axs.set_xlabel(\"Petal length (cm)\", labelpad=15) #, fontsize=14)\n", " axs.set_ylabel(\"Petal width (cm)\", labelpad=15) #, fontsize=14)\n", " axs.legend(loc=\"lower right\", fontsize=14)\n", " axs.set_xlim(box[0],box[1])\n", " axs.set_ylim(box[2],box[3])\n", " fidle.scrawler.save_fig('01-perceptron-iris')\n", " plt.show()\n", " \n", "plot_perceptron(x_train,y_train, x_test,y_test)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fidle.end()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "<img width=\"80px\" src=\"../fidle/img/logo-paysage.svg\"></img>" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3.9.2 ('fidle-env')", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.2" }, "vscode": { "interpreter": { "hash": "b3929042cc22c1274d74e3e946c52b845b57cb6d84f2d591ffe0519b38e4896d" } } }, "nbformat": 4, "nbformat_minor": 4 }