01-Simple-Perceptron.ipynb

{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<img width=\"800px\" src=\"../fidle/img/header.svg\"></img>\n",
    "\n",
    "# <!-- TITLE --> [PER57] - Perceptron Model 1957\n",
    "<!-- DESC --> Example of use of a Perceptron, with sklearn and IRIS dataset of 1936 !\n",
    "<!-- AUTHOR : Jean-Luc Parouty (CNRS/SIMaP) -->\n",
    "\n",
    "## Objectives :\n",
    " - Implement a historical linear classifier with a historical dataset !\n",
    " - The objective is to predict the type of Iris from the size of the leaves.\n",
    " - Identifying its limitations  \n",
    "\n",
    "The [IRIS dataset](https://archive.ics.uci.edu/ml/datasets/Iris) is probably one of the oldest datasets, dating back to 1936 .\n",
    "\n",
    "## What we're going to do :\n",
    " - Retrieve the dataset, via scikit learn\n",
    " - training and classifying\n",
    "\n",
    "## Step 1 - Import and init"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "from sklearn.datasets     import load_iris\n",
    "from sklearn.linear_model import Perceptron\n",
    "import pandas as pd\n",
    "import matplotlib.pyplot as plt\n",
    "import matplotlib\n",
    "\n",
    "import os,sys\n",
    "\n",
    "import fidle\n",
    "\n",
    "# Init Fidle environment\n",
    "run_id, run_dir, datasets_dir = fidle.init('PER57')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 2 - Prepare IRIS Dataset\n",
    "\n",
    "Retrieve a dataset : http://scikit-learn.org/stable/modules/classes.html#module-sklearn.datasets  \n",
    "About the datesets : https://scikit-learn.org/stable/datasets.html#datasets  \n",
    "\n",
    "Data fields (X) :\n",
    "- 0 : sepal length in cm\n",
    "- 1 : sepal width in cm\n",
    "- 2 : petal length in cm\n",
    "- 3 : petal width in cm  \n",
    "\n",
    "Class (y) :\n",
    "- 0 : class 0=Iris-Setosa, 1=Iris-Versicolour, 2=Iris-Virginica\n",
    "\n",
    "### 2.1 - Get dataset"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "x0,y0 = load_iris(return_X_y=True)\n",
    "\n",
    "x = x0[:, (2,3)]     # We only keep fields 2 and 3\n",
    "y = y0.copy()\n",
    "\n",
    "y[ y0==0 ] = 1       # 1 = Iris setosa\n",
    "y[ y0>=1 ] = 0       # 0 = not iris setosa\n",
    "\n",
    "df=pd.DataFrame.from_dict({'Length (x1)':x[:,0], 'Width (x2)':x[:,1], 'Setosa {0,1} (y)':y})\n",
    "display(df)\n",
    "\n",
    "print(f'x shape : {x.shape}')\n",
    "print(f'y shape : {y.shape}')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2.2 - Train and test sets"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "x,y = fidle.utils.shuffle_np_dataset(x, y)\n",
    "    \n",
    "n=int(len(x)*0.8)\n",
    "x_train = x[:n]\n",
    "y_train = y[:n]\n",
    "x_test  = x[n:]\n",
    "y_test  = y[n:]\n",
    "\n",
    "print(f'x_train shape : {x_train.shape}')\n",
    "print(f'y_train shape : {y_train.shape}')\n",
    "print(f'x_test shape  : {x_test.shape}')\n",
    "print(f'y_test shape  : {y_test.shape}')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 3 - Get a perceptron, and train it"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "pct = Perceptron(max_iter=100, random_state=82, tol=0.01, verbose=1)\n",
    "pct.fit(x_train, y_train)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 4 - Prédictions"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "y_pred = pct.predict(x_test) \n",
    "\n",
    "df=pd.DataFrame.from_dict({'Length (x1)':x_test[:,0], 'Width (x2)':x_test[:,1], 'y_test':y_test, 'y_pred':y_pred})\n",
    "display(df[:15])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 5 - Visualisation"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def plot_perceptron(x_train,y_train,x_test,y_test):\n",
    "    a = -pct.coef_[0][0] / pct.coef_[0][1]\n",
    "    b = -pct.intercept_ / pct.coef_[0][1]\n",
    "    box=[x.min(axis=0)[0],x.max(axis=0)[0],x.min(axis=0)[1],x.max(axis=0)[1]]\n",
    "    mx=(box[1]-box[0])/20\n",
    "    my=(box[3]-box[2])/20\n",
    "    box=[box[0]-mx,box[1]+mx,box[2]-my,box[3]+my]\n",
    "\n",
    "    fig, axs = plt.subplots(1, 1)\n",
    "    fig.set_size_inches(10,6)\n",
    " \n",
    "    axs.plot(x_train[y_train==1, 0], x_train[y_train==1, 1], \"o\", color='tomato', label=\"Iris-Setosa\")\n",
    "    axs.plot(x_train[y_train==0, 0], x_train[y_train==0, 1], \"o\", color='steelblue',label=\"Autres\")\n",
    "    \n",
    "    axs.plot(x_test[y_pred==1, 0],   x_test[y_pred==1, 1],   \"o\", color='lightsalmon', label=\"Iris-Setosa (pred)\")\n",
    "    axs.plot(x_test[y_pred==0, 0],   x_test[y_pred==0, 1],   \"o\", color='lightblue',   label=\"Autres (pred)\")\n",
    "    \n",
    "    axs.plot([box[0], box[1]], [a*box[0]+b, a*box[1]+b], \"k--\", linewidth=2)\n",
    "    axs.set_xlabel(\"Petal length (cm)\", labelpad=15) #, fontsize=14)\n",
    "    axs.set_ylabel(\"Petal width (cm)\",  labelpad=15) #, fontsize=14)\n",
    "    axs.legend(loc=\"lower right\", fontsize=14)\n",
    "    axs.set_xlim(box[0],box[1])\n",
    "    axs.set_ylim(box[2],box[3])\n",
    "    fidle.scrawler.save_fig('01-perceptron-iris')\n",
    "    plt.show()\n",
    "    \n",
    "plot_perceptron(x_train,y_train, x_test,y_test)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "fidle.end()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---\n",
    "<img width=\"80px\" src=\"../fidle/img/logo-paysage.svg\"></img>"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3.9.2 ('fidle-env')",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.2"
  },
  "vscode": {
   "interpreter": {
    "hash": "b3929042cc22c1274d74e3e946c52b845b57cb6d84f2d591ffe0519b38e4896d"
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}