{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<img width=\"800px\" src=\"../fidle/img/header.svg\"></img>\n",
"\n",
"# <!-- TITLE --> [PER57] - Perceptron Model 1957\n",
"<!-- DESC --> Example of use of a Perceptron, with sklearn and IRIS dataset of 1936 !\n",
"<!-- AUTHOR : Jean-Luc Parouty (CNRS/SIMaP) -->\n",
"\n",
"## Objectives :\n",
" - Implement a historical linear classifier with a historical dataset !\n",
" - The objective is to predict the type of Iris from the size of the leaves.\n",
" - Identifying its limitations \n",
"\n",
"The [IRIS dataset](https://archive.ics.uci.edu/ml/datasets/Iris) is probably one of the oldest datasets, dating back to 1936 .\n",
"\n",
"## What we're going to do :\n",
" - Retrieve the dataset, via scikit learn\n",
" - training and classifying\n",
"\n",
"## Step 1 - Import and init"
]
},
{
"cell_type": "code",
"source": [
"import numpy as np\n",
"from sklearn.datasets import load_iris\n",
"from sklearn.linear_model import Perceptron\n",
"import matplotlib.pyplot as plt\n",
"import matplotlib\n",
"\n",
"import os,sys\n",
"\n",
"# Init Fidle environment\n",
"run_id, run_dir, datasets_dir = fidle.init('PER57')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 2 - Prepare IRIS Dataset\n",
"\n",
"Retrieve a dataset : http://scikit-learn.org/stable/modules/classes.html#module-sklearn.datasets \n",
"About the datesets : https://scikit-learn.org/stable/datasets.html#datasets \n",
"\n",
"Data fields (X) :\n",
"- 0 : sepal length in cm\n",
"- 1 : sepal width in cm\n",
"- 2 : petal length in cm\n",
"- 3 : petal width in cm \n",
"\n",
"Class (y) :\n",
"- 0 : class 0=Iris-Setosa, 1=Iris-Versicolour, 2=Iris-Virginica\n",
"\n",
"### 2.1 - Get dataset"
]
},
{
"cell_type": "code",
"x0,y0 = load_iris(return_X_y=True)\n",
"x = x0[:, (2,3)] # We only keep fields 2 and 3\n",
"y = y0.copy()\n",
"\n",
"y[ y0==0 ] = 1 # 1 = Iris setosa\n",
"y[ y0>=1 ] = 0 # 0 = not iris setosa\n",
"\n",
"df=pd.DataFrame.from_dict({'Length (x1)':x[:,0], 'Width (x2)':x[:,1], 'Setosa {0,1} (y)':y})\n",
"display(df)\n",
"\n",
"print(f'x shape : {x.shape}')\n",
"print(f'y shape : {y.shape}')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 2.2 - Train and test sets"
]
},
{
"cell_type": "code",
"x,y = fidle.utils.shuffle_np_dataset(x, y)\n",
" \n",
"n=int(len(x)*0.8)\n",
"x_train = x[:n]\n",
"y_train = y[:n]\n",
"x_test = x[n:]\n",
"y_test = y[n:]\n",
"\n",
"print(f'x_train shape : {x_train.shape}')\n",
"print(f'y_train shape : {y_train.shape}')\n",
"print(f'x_test shape : {x_test.shape}')\n",
"print(f'y_test shape : {y_test.shape}')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 3 - Get a perceptron, and train it"
]
},
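{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a reminder, the classical perceptron learning rule can be sketched as follows. The notation ($w$, $w_0$, $\\eta$, $\\hat{y}$) is introduced here for illustration only; scikit-learn's `Perceptron` shares its implementation with `SGDClassifier`, so its internal details may differ from this textbook form.\n",
"\n",
"For a sample $x$ with label $y \\in \\{0,1\\}$, the prediction is $\\hat{y}=1$ if $w \\cdot x + w_0 > 0$, and $\\hat{y}=0$ otherwise. After each sample, the weights are updated with:\n",
"\n",
"$$w \\leftarrow w + \\eta\\,(y-\\hat{y})\\,x \\qquad w_0 \\leftarrow w_0 + \\eta\\,(y-\\hat{y})$$\n",
"\n",
"The weights therefore change only when a sample is misclassified."
]
},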
{
"cell_type": "code",
"pct = Perceptron(max_iter=100, random_state=82, tol=0.01, verbose=1)\n",
"pct.fit(x_train, y_train)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 4 - Prédictions"
]
},
{
"cell_type": "code",
"y_pred = pct.predict(x_test) \n",
"df=pd.DataFrame.from_dict({'Length (x1)':x_test[:,0], 'Width (x2)':x_test[:,1], 'y_test':y_test, 'y_pred':y_pred})\n",
"display(df[:15])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 5 - Visualisation"
]
},
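{
"cell_type": "markdown",
"metadata": {},
"source": [
"The dashed line in the plot below is the decision boundary of the trained perceptron. A short derivation, assuming $w_2 \\neq 0$ and writing $w_1, w_2$ for the two values in `pct.coef_[0]` and $w_0$ for the value in `pct.intercept_` (notation introduced here for illustration):\n",
"\n",
"$$w_1 x_1 + w_2 x_2 + w_0 = 0 \\quad\\Longrightarrow\\quad x_2 = a\\,x_1 + b, \\qquad a = -\\frac{w_1}{w_2}, \\quad b = -\\frac{w_0}{w_2}$$\n",
"\n",
"Here $x_1$ is the petal length and $x_2$ the petal width, so $a$ and $b$ are the slope and intercept computed in `plot_perceptron`."
]
},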
{
"cell_type": "code",
"def plot_perceptron(x_train,y_train,x_test,y_test):\n",
" a = -pct.coef_[0][0] / pct.coef_[0][1]\n",
" b = -pct.intercept_ / pct.coef_[0][1]\n",
" box=[x.min(axis=0)[0],x.max(axis=0)[0],x.min(axis=0)[1],x.max(axis=0)[1]]\n",
" mx=(box[1]-box[0])/20\n",
" my=(box[3]-box[2])/20\n",
" box=[box[0]-mx,box[1]+mx,box[2]-my,box[3]+my]\n",
" fig, axs = plt.subplots(1, 1)\n",
" fig.set_size_inches(10,6)\n",
" \n",
" axs.plot(x_train[y_train==1, 0], x_train[y_train==1, 1], \"o\", color='tomato', label=\"Iris-Setosa\")\n",
" axs.plot(x_train[y_train==0, 0], x_train[y_train==0, 1], \"o\", color='steelblue',label=\"Autres\")\n",
" \n",
" axs.plot(x_test[y_pred==1, 0], x_test[y_pred==1, 1], \"o\", color='lightsalmon', label=\"Iris-Setosa (pred)\")\n",
" axs.plot(x_test[y_pred==0, 0], x_test[y_pred==0, 1], \"o\", color='lightblue', label=\"Autres (pred)\")\n",
" \n",
" axs.plot([box[0], box[1]], [a*box[0]+b, a*box[1]+b], \"k--\", linewidth=2)\n",
" axs.set_xlabel(\"Petal length (cm)\", labelpad=15) #, fontsize=14)\n",
" axs.set_ylabel(\"Petal width (cm)\", labelpad=15) #, fontsize=14)\n",
" axs.legend(loc=\"lower right\", fontsize=14)\n",
" axs.set_xlim(box[0],box[1])\n",
" axs.set_ylim(box[2],box[3])\n",
" fidle.scrawler.save_fig('01-perceptron-iris')\n",
" plt.show()\n",
" \n",
"plot_perceptron(x_train,y_train, x_test,y_test)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---\n",
"<img width=\"80px\" src=\"../fidle/img/logo-paysage.svg\"></img>"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3.9.2 ('fidle-env')",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.2"
},
"vscode": {
"interpreter": {
"hash": "b3929042cc22c1274d74e3e946c52b845b57cb6d84f2d591ffe0519b38e4896d"
}
}
},
"nbformat": 4,
"nbformat_minor": 4
}