From 748bf82a9c6aa8223e747687e51847cfc329fdf1 Mon Sep 17 00:00:00 2001
From: Achille Mbogol Touye <achille.mbogol-touye@univ-grenoble-alpes.fr>
Date: Tue, 26 Sep 2023 17:33:41 +0200
Subject: [PATCH] 02-CNN-MNIST_Lightning

---
 MNIST_Lightning/02-CNN-MNIST_Lightning.ipynb | 567 +++++++++++++++++++
 1 file changed, 567 insertions(+)
 create mode 100644 MNIST_Lightning/02-CNN-MNIST_Lightning.ipynb

diff --git a/MNIST_Lightning/02-CNN-MNIST_Lightning.ipynb b/MNIST_Lightning/02-CNN-MNIST_Lightning.ipynb
new file mode 100644
index 0000000..11fff32
--- /dev/null
+++ b/MNIST_Lightning/02-CNN-MNIST_Lightning.ipynb
@@ -0,0 +1,567 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "86fe2213-fb44-4bd4-a371-a541cba6a744",
+   "metadata": {},
+   "source": [
+    "\n",
+    "<img width=\"800px\" src=\"../fidle/img/00-Fidle-header-01.svg\"></img>\n",
+    "\n",
+    "## <!-- TITLE --> [MNIST2] - Simple classification with CNN using lightning\n",
+    "<!-- DESC --> An example of classification using a convolutional neural network for the famous MNIST dataset\n",
+    "<!-- AUTHOR : MBOGOL Touye Achille (AI/ML Engineer MIAI/SIMaP) -->\n",
+    "\n",
+    "## Objectives :\n",
+    " - Recognizing handwritten digits\n",
+    " - Understanding the principle of a CNN classifier\n",
+    " - Implementation with PyTorch Lightning\n",
+    "\n",
+    "\n",
+    "The [MNIST dataset](http://yann.lecun.com/exdb/mnist/) (Modified National Institute of Standards and Technology) is a must for Deep Learning.  \n",
+    "It consists of 60,000 small images of handwritten digits for training and 10,000 for testing.\n",
+    "\n",
+    "\n",
+    "## What we're going to do :\n",
+    "\n",
+    " - Retrieve data\n",
+    " - Prepare the data\n",
+    " - Create a model\n",
+    " - Train the model\n",
+    " - Evaluate the result\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "7f16101a-6612-4e02-93e9-c45ce1ac911d",
+   "metadata": {},
+   "source": [
+    "## Step 1 - Init python stuff"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "743c77d3-0983-491c-90be-ef2219861a47",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import pandas as pd\n",
+    "import numpy as np\n",
+    "\n",
+    "import torch\n",
+    "import torch.nn as nn\n",
+    "import lightning.pytorch as pl\n",
+    "import torch.nn.functional as F\n",
+    "import torchvision.transforms as T\n",
+    "\n",
+    "import sys,os\n",
+    "import multiprocessing\n",
+    "import matplotlib.pyplot as plt\n",
+    "\n",
+    "from lightning.pytorch.loggers.tensorboard import TensorBoardLogger\n",
+    "from torch.utils.data import Dataset, DataLoader\n",
+    "from torchvision import datasets\n",
+    "from torchmetrics.functional import accuracy\n",
+    "\n",
+    "# Init Fidle environment\n",
+    "import fidle\n",
+    "\n",
+    "run_id, run_dir, datasets_dir = fidle.init('MNIST_Lightning')"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "df10dcda-aa63-476b-8665-9b1610fe51c6",
+   "metadata": {},
+   "source": [
+    "## Step 2 - Retrieve data\n",
+    "\n",
+    "MNIST is one of the most famous historic datasets and is included in `torchvision`, which provides many built-in datasets in the `torchvision.datasets` module.\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "6668e50c-f0c6-43cf-b733-9ac29d6a3900",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Load data sets \n",
+    "train_dataset = datasets.MNIST(root=\"data\", train=True, download=True, transform=None)\n",
+    "\n",
+    "test_dataset= datasets.MNIST(root=\"data\", train=False, download=False, transform=None)\n",
+    "\n",
+    "# print info for train data\n",
+    "print(train_dataset)\n",
+    "\n",
+    "print()\n",
+    "\n",
+    "# print info for test data\n",
+    "print(test_dataset)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "44a489f5-3e53-4a2b-8069-f265b2814dc0",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# See the shape of train data and test data\n",
+    "print(\"x_train : \",train_dataset.data.shape)\n",
+    "print(\"y_train : \",train_dataset.targets.shape)\n",
+    "\n",
+    "print()\n",
+    "\n",
+    "print(\"x_test  : \",test_dataset.data.shape)\n",
+    "print(\"y_test  : \",test_dataset.targets.shape)\n",
+    "\n",
+    "print()\n",
+    "\n",
+    "# print number of targets and  values targets\n",
+    "print(\"Number of Targets :\",len(np.unique(train_dataset.targets)))\n",
+    "print(\"Targets Values    :\",    np.unique(train_dataset.targets))\n",
+    "\n",
+    "print()\n",
+    "\n",
+    "print(\"Note that we work with torch tensors, not NumPy arrays or TensorFlow tensors\")\n",
+    "print(\" -> x_train.dtype = \",train_dataset.data.dtype)\n",
+    "print(\" -> y_train.dtype = \",train_dataset.targets.dtype)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "b418adb7-33ea-450c-9793-3cdce5d5fa8c",
+   "metadata": {},
+   "source": [
+    "## Step 3 - Preparing your data for training with DataLoaders\n",
+    "The `Dataset` retrieves our dataset’s features and labels one sample at a time. While training a model, we typically want to pass samples in minibatches, reshuffle the data at every epoch to reduce model overfitting, and use Python’s multiprocessing to speed up data retrieval. `DataLoader` is an iterable that abstracts this complexity behind a simple API."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "8af0bc4c-acb3-46d9-aae2-143b0327d970",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Before normalization:\n",
+    "x_train=train_dataset.data\n",
+    "print('Before normalization : Min={}, max={}'.format(x_train.min(),x_train.max()))\n",
+    "\n",
+    "# After normalization:\n",
+    "## T.Compose creates a pipeline where the provided transformations are run in sequence\n",
+    "transforms = T.Compose(\n",
+    "    [\n",
+    "        # This transform takes a np.array or a PIL image of integers\n",
+    "        # in the range 0-255 and converts it to a float tensor in the\n",
+    "        # range 0.0 - 1.0\n",
+    "        T.ToTensor(),\n",
+    "\n",
+    "        # This then rescales the tensor to the range -1.0 to 1.0 using\n",
+    "        # (x - 0.5) / 0.5, a zero-centered range that suits activation\n",
+    "        # functions like ReLU\n",
+    "        T.Normalize((0.5,), (0.5,)),\n",
+    "    ]\n",
+    ")\n",
+    "\n",
+    "\n",
+    "train_dataset = datasets.MNIST(root=\"data\", train=True, download=True, transform=transforms)\n",
+    "test_dataset= datasets.MNIST(root=\"data\", train=False, download=True, transform=transforms)\n",
+    "\n",
+    "\n",
+    "# print the value range of the first image after normalization.\n",
+    "# iter() followed by next() returns the first (image, label) pair\n",
+    "image,label=next(iter(train_dataset))\n",
+    "print('After normalization  : Min={}, max={}'.format(image.min(),image.max()))\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "35d50a57-8274-4660-8765-d0f2bf7214bd",
+   "metadata": {},
+   "source": [
+    "### Have a look"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "a172ebc5-8858-4f30-8e2c-1e9c123ae0ee",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "x_train=train_dataset.data\n",
+    "y_train=train_dataset.targets"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "5a487760-b43a-4f7c-bfd8-1ce2c9652769",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "fidle.scrawler.images(x_train, y_train, [27],  x_size=5,y_size=5, colorbar=True, save_as='01-one-digit')\n",
+    "fidle.scrawler.images(x_train, y_train, range(5,41), columns=12, save_as='02-many-digits')"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "ca0a63ae-e6d6-4940-b8ff-9b11cb2737bb",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# get the number of CPUs in your system \n",
+    "n_workers = multiprocessing.cpu_count()\n",
+    "\n",
+    "# train batch data\n",
+    "train_loader= DataLoader(\n",
+    "  dataset=train_dataset, \n",
+    "  shuffle=True, \n",
+    "  batch_size=512,\n",
+    "  num_workers=n_workers \n",
+    ")\n",
+    "\n",
+    "# test batch data\n",
+    "test_loader= DataLoader(\n",
+    "  dataset=test_dataset, \n",
+    "  shuffle=False, \n",
+    "  batch_size=512,\n",
+    "  num_workers=n_workers \n",
+    ")\n",
+    "\n",
+    "# print the shapes of one batch of normalized images and labels.\n",
+    "image, label=next(iter(train_loader))\n",
+    "print('Shape of the first training batch from the PyTorch DataLoader :\\nbatch images = {} \\nbatch labels = {}'.format(image.shape,label.shape))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "51bf21ee-76ca-42fa-b67f-066dbd239a72",
+   "metadata": {},
+   "source": [
+    "## Step 4 - Create Model\n",
+    "More information about:\n",
+    " - [Optimizer](https://www.tensorflow.org/api_docs/python/tf/keras/optimizers)\n",
+    " - [Activation](https://www.tensorflow.org/api_docs/python/tf/keras/activations)\n",
+    " - [Loss](https://www.tensorflow.org/api_docs/python/tf/keras/losses)\n",
+    " - [Metrics](https://www.tensorflow.org/api_docs/python/tf/keras/metrics)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "16701119-71eb-4f59-a50a-f153b07a74ae",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "class MyNet(nn.Module):\n",
+    "    \n",
+    "    def __init__(self,num_class=10):\n",
+    "        super().__init__()\n",
+    "        self.num_class=num_class\n",
+    "        self.model = nn.Sequential(\n",
+    "            # first convolution  \n",
+    "            nn.Conv2d(in_channels=1, out_channels=8, kernel_size=3, stride=1, padding=0),\n",
+    "            nn.ReLU(),\n",
+    "            nn.MaxPool2d((2,2)),  \n",
+    "            nn.Dropout2d(0.2),            # Combat overfitting\n",
+    "            \n",
+    "            # second convolution  \n",
+    "            nn.Conv2d(in_channels=8, out_channels=16, kernel_size=3, stride=1, padding=0),\n",
+    "            nn.ReLU(),\n",
+    "            nn.MaxPool2d((2,2)),   \n",
+    "            nn.Dropout2d(0.2),            # Combat overfitting\n",
+    "            \n",
+    "            nn.Flatten(),                 # convert feature map into feature vectors\n",
+    "            \n",
+    "            # MLP network   \n",
+    "            nn.Linear(16*5*5,100),\n",
+    "            nn.ReLU(),\n",
+    "            nn.Dropout(0.2),              # Combat overfitting (element-wise dropout for 2D input)\n",
+    "        \n",
+    "            nn.Linear(100, num_class),    # logits output\n",
+    "        )\n",
+    "        \n",
+    "    def forward(self, x):\n",
+    "        x=self.model(x)                   # forward pass\n",
+    "        return x"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "37abf99b-f8ec-4048-a65d-f173ee18b234",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "class LitModel(pl.LightningModule):\n",
+    "    \n",
+    "    def __init__(self, MyNet):\n",
+    "        super().__init__()\n",
+    "        self.MyNet = MyNet\n",
+    "                               \n",
+    "    def forward(self, x):                                               # forward pass\n",
+    "        return self.MyNet(x)\n",
+    "    \n",
+    "    # defines the train loop\n",
+    "    def training_step(self, batch, batch_idx):\n",
+    "        x, y = batch\n",
+    "        y_hat= self.MyNet(x)                                            # forward pass \n",
+    "        loss= F.cross_entropy(y_hat, y)                                 # loss function\n",
+    "        acc=accuracy(y_hat, y,task=\"multiclass\",num_classes=10)         # metrics    \n",
+    "        metrics = {\"train_loss\": loss, \"train_acc\": acc}\n",
+    "        \n",
+    "        # log metrics, averaged over each epoch (on_epoch=True)\n",
+    "        self.log_dict(metrics,\n",
+    "                      on_step=False ,\n",
+    "                      on_epoch=True, \n",
+    "                      prog_bar=True, \n",
+    "                      logger=True) \n",
+    "        return loss\n",
+    "\n",
+    "        \n",
+    "    # defines the valid loop.\n",
+    "    def validation_step(self, batch, batch_idx):\n",
+    "        x, y = batch\n",
+    "        y_hat= self.MyNet(x)                                             # forward pass\n",
+    "        loss = F.cross_entropy(y_hat, y)                                 # loss function (cross-entropy)\n",
+    "        acc=accuracy(y_hat, y, task=\"multiclass\", num_classes=10)        # metrics\n",
+    "        metrics = {\"test_loss\": loss, \"test_acc\": acc}\n",
+    "        \n",
+    "        # log metrics, averaged over each validation epoch\n",
+    "        self.log_dict(metrics,\n",
+    "                      on_step=False,\n",
+    "                      on_epoch=True, \n",
+    "                      prog_bar=True, \n",
+    "                      logger=True\n",
+    "                     ) \n",
+    "        \n",
+    "        return metrics\n",
+    "        \n",
+    "    \n",
+    "    # complete prediction step\n",
+    "    def predict_step(self, batch, batch_idx):\n",
+    "        x, y = batch\n",
+    "        y_hat = self.MyNet(x)                                            # forward pass\n",
+    "        return y_hat\n",
+    "    \n",
+    "    \n",
+    "    # optimizer\n",
+    "    def configure_optimizers(self):\n",
+    "        optimizer = torch.optim.Adam(self.parameters(), lr=1e-3)\n",
+    "        return optimizer\n",
+    "\n",
+    "    \n",
+    "    \n",
+    "# print summary model\n",
+    "model=LitModel(MyNet())\n",
+    "print(model) "
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "fb32e85d-bd92-4ca5-a3dc-ddb5ed50ba6b",
+   "metadata": {},
+   "source": [
+    "## Step 5 - Train Model"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "ce975c03-d05d-40c4-92ff-0cc90699c13e",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# loggers data\n",
+    "logger = TensorBoardLogger(save_dir='MNIST2_logs',name=\"history_logs\")\n",
+    "\n",
+    "# train model\n",
+    "trainer=pl.Trainer(accelerator='auto',max_epochs=16,logger=logger)\n",
+    "trainer.fit(model=model, train_dataloaders=train_loader, val_dataloaders=test_loader)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "a1191f05-4454-415c-a5ed-e63d9ae56651",
+   "metadata": {},
+   "source": [
+    "## Step 6 - Evaluate\n",
+    "### 6.1 - Final loss and accuracy\n",
+    "Note: with a DNN, we had an accuracy on the order of 97.7%"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "9f45316e-0d2d-4fc1-b9a8-5fb8aaf5586a",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# evaluate your model\n",
+    "score=trainer.validate(model=model,dataloaders=test_loader, verbose=False)\n",
+    "\n",
+    "print('x_test / acc      : {:5.4f}'.format(score[0]['test_acc']))\n",
+    "print('x_test / loss     : {:5.4f}'.format(score[0]['test_loss']))\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "e352e48d-b473-4162-a1aa-72d6d4f7aa38",
+   "metadata": {},
+   "source": [
+    "### 6.2 - Plot history"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "5b1c6d11-b897-4e2b-8615-c207c8344d07",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# launch Tensorboard \n",
+    "%reload_ext tensorboard\n",
+    "%tensorboard  --logdir MNIST2_logs/history_logs/ "
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "f00ded6b-a7db-4c5d-b1b2-72264db20bdb",
+   "metadata": {},
+   "source": [
+    "### 6.3 - Plot results"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "e387a70d-9c23-4d16-8ef7-879aec7791e2",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# logits output, one tensor per batch\n",
+    "y_logits=trainer.predict(model=model,dataloaders=test_loader)\n",
+    "\n",
+    "# Concat into single tensor\n",
+    "y_logits=torch.cat(y_logits)\n",
+    "\n",
+    "# output probabilities values\n",
+    "y_pred_values=F.softmax(y_logits,dim=1)\n",
+    "\n",
+    "# Returns the indices of the maximum output probabilities values \n",
+    "y_pred=torch.argmax(y_pred_values,dim=-1)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "fb2b2eeb-fcd8-453c-93ef-59a960a8bbd5",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "x_test=test_dataset.data\n",
+    "y_test=test_dataset.targets"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "71187fa9-2ad3-4b23-94b9-1846045bd070",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "fidle.scrawler.images(x_test, y_test, range(0,200), columns=12, x_size=1, y_size=1, y_pred=y_pred, save_as='04-predictions')"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "2fc7b2b9-9115-4848-9aae-2798bf7aa79a",
+   "metadata": {},
+   "source": [
+    "### 6.4 - Plot some errors"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "e55f17c4-fce7-423a-9adf-f2511c534ef5",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "errors=[ i for i in range(len(x_test)) if y_pred[i]!=y_test[i] ]\n",
+    "errors=errors[:min(24,len(errors))]\n",
+    "fidle.scrawler.images(x_test, y_test, errors[:15], columns=6, x_size=2, y_size=2, y_pred=y_pred, save_as='05-some-errors')"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "fea1b396-70ca-4b00-851d-0538a4b347fb",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "fidle.scrawler.confusion_matrix(y_test,y_pred,range(10),normalize=True, save_as='06-confusion-matrix')"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "e982c032-cce8-4c71-8cdc-2af4b31b2914",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "fidle.end()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "233838c2-c97f-4489-8c79-9247d7b7456b",
+   "metadata": {},
+   "source": [
+    "<div class=\"todo\">\n",
+    "    A few things you can do for fun:\n",
+    "    <ul>\n",
+    "        <li>Change the network architecture (layers, number of neurons, etc.)</li>\n",
+    "        <li>Display a summary of the network</li>\n",
+    "        <li>Retrieve and display the softmax output of the network, to evaluate its \"doubts\".</li>\n",
+    "    </ul>\n",
+    "</div>"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "51b87aa0-d4e9-48bb-8205-4b583f4b0b61",
+   "metadata": {},
+   "source": [
+    "---\n",
+    "<img width=\"80px\" src=\"../fidle/img/00-Fidle-logo-01.svg\"></img>"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.8.10"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
-- 
GitLab