{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "<img width=\"800px\" src=\"../fidle/img/header.svg\"></img>\n", "\n", "# <!-- TITLE --> [LVAE1] - First VAE, using Lightning API (MNIST dataset)\n", "<!-- DESC --> Construction and training of a VAE, using Lightning API, with a latent space of small dimension, using PyTorch Lightning\n", "\n", "<!-- AUTHOR : Achille Mbogol Touye (EFIlIA-MIAI/SIMaP) -->\n", "\n", "## Objectives :\n", " - Understanding and implementing a **variational autoencoder** neurals network (VAE)\n", " - Understanding **Ligthning API**, using two custom layers\n", "\n", "The calculation needs being important, it is preferable to use a very simple dataset such as MNIST to start with. \n", "...MNIST with a small scale if you haven't a GPU ;-)\n", "\n", "## What we're going to do :\n", "\n", " - Defining a VAE model\n", " - Build the model\n", " - Train it\n", " - Have a look on the train process\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 1 - Init python stuff" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import os\n", "import torch\n", "import pandas as pd\n", "import numpy as np\n", "import torch.nn as nn\n", "import lightning.pytorch as pl\n", "\n", "from modules.datagen import MNIST\n", "from torch.utils.data import TensorDataset, DataLoader\n", "from modules.progressbar import CustomTrainProgressBar\n", "from modules.callbacks import ImagesCallback, BestModelCallback\n", "from modules.layers import SamplingLayer, VariationalLossLayer\n", "from lightning.pytorch.loggers.tensorboard import TensorBoardLogger\n", "\n", "import fidle\n", "\n", "# Init Fidle environment\n", "run_id, run_dir, datasets_dir = fidle.init('LVAE1')\n", "\n", "device = torch.device(\"cuda\" if torch.cuda.is_available() else \"cpu\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 2 - Parameters\n", "`scale` : With scale=1, we need 1'30s on a GPU V100 ...and >20' on a CPU !\\\n", "`latent_dim` : 2 dimensions is small, but usefull to draw !\\\n", "`fit_verbosity`: Verbosity of training progress bar: 0=silent, 1=progress bar, 2=One line \n", "\n", "`loss_weights` : Our **loss function** is the weighted sum of two loss:\n", " - `r_loss` which measures the loss during reconstruction. \n", " - `kl_loss` which measures the dispersion. \n", "\n", "The weights are defined by: `loss_weights=[k1,k2]` where : `vae_loss = k1*r_loss + k2*kl_loss` \n", "In practice, a value of \\[1,.001\\] gives good results here.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "latent_dim = 2\n", "loss_weights = [1,.001]\n", "\n", "scale = 0.2\n", "seed = 123\n", "\n", "batch_size = 64\n", "epochs = 10\n", "fit_verbosity = 1" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Override parameters (batch mode) - Just forget this cell" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fidle.override('latent_dim', 'loss_weights', 'scale', 'seed', 'batch_size', 'epochs', 'fit_verbosity')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 3 - Prepare data\n", "`MNIST.get_data()` return : `x_train,y_train, x_test,y_test`, \\\n", "but we only need x_train for our training." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "x_data, y_data, _,_ = MNIST.get_data(seed=seed, scale=scale, train_prop=1 )\n", "\n", "fidle.scrawler.images(x_data[:20], None, indices='all', columns=10, x_size=1,y_size=1,y_padding=0, save_as='01-original')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ " ## 3.1 - For Training model use Dataloader\n", "The Dataset retrieves our dataset’s features and labels one sample at a time. While training a model, we typically want to pass samples in minibatches, reshuffle the data at every epoch to reduce model overfitting. DataLoader is an iterable that abstracts this complexity for us in an easy API" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "train_dataset = TensorDataset(x_data,y_data)\n", "\n", "# train bacth data\n", "train_loader= DataLoader(\n", " dataset=train_dataset, \n", " shuffle=False, \n", " batch_size=batch_size, \n", " num_workers=2 \n", ")\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 4 - Build model\n", "In this example, we will use the **pytorch ligthning API.** \n", "For this, we will use two custom layers :\n", " - `SamplingLayer`, which generates a vector z from the parameters z_mean and z_logvar - See : [SamplingLayer.py](./modules/layers/SamplingLayer.py)\n", " - `VariationalLossLayer`, which allows us to calculate the loss function, loss - See : [VariationalLossLayer.py](./modules/layers/VariationalLossLayer.py)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Encoder" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "class Encoder(nn.Module):\n", " def __init__(self, latent_dim):\n", " super().__init__()\n", " self.Convblock=nn.Sequential(\n", " nn.Conv2d(in_channels=1, out_channels=32, kernel_size=3, stride=1, padding=1),\n", " nn.BatchNorm2d(32),\n", " nn.LeakyReLU(0.2),\n", " \n", " nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3, stride=2, padding=1),\n", " nn.BatchNorm2d(64),\n", " nn.LeakyReLU(0.2),\n", " \n", " nn.Conv2d(in_channels=64, out_channels=64, kernel_size=3, stride=2, padding=1),\n", " nn.BatchNorm2d(64),\n", " nn.LeakyReLU(0.2),\n", " \n", " nn.Conv2d(in_channels=64, out_channels=64, kernel_size=3, stride=1, padding=1),\n", " nn.BatchNorm2d(64),\n", " nn.LeakyReLU(0.2),\n", " \n", " nn.Flatten(),\n", "\n", " nn.Linear(64*7*7, 16),\n", " nn.BatchNorm1d(16),\n", " nn.LeakyReLU(0.2),\n", " )\n", "\n", " self.z_mean = nn.Linear(16, latent_dim)\n", " self.z_logvar = nn.Linear(16, latent_dim)\n", " \n", "\n", "\n", " def forward(self, x):\n", " x = self.Convblock(x)\n", " z_mean = self.z_mean(x)\n", " z_logvar = self.z_logvar(x) \n", " z = SamplingLayer()([z_mean, z_logvar]) \n", " \n", " return z_mean, z_logvar, z " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Decoder" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "class Decoder(nn.Module):\n", " def __init__(self, latent_dim):\n", " super().__init__()\n", " self.linear=nn.Sequential(\n", " nn.Linear(latent_dim, 16),\n", " nn.BatchNorm1d(16),\n", " nn.ReLU(),\n", " \n", " nn.Linear(16, 64*7*7),\n", " nn.BatchNorm1d(64*7*7),\n", " nn.ReLU()\n", " )\n", " \n", " self.Deconvblock=nn.Sequential(\n", " nn.ConvTranspose2d(in_channels=64, out_channels=64, kernel_size=3, stride=1, padding=1),\n", " nn.BatchNorm2d(64),\n", " nn.ReLU(),\n", " \n", " nn.ConvTranspose2d(in_channels=64, 
{ "cell_type": "markdown", "metadata": {}, "source": [ "#### Decoder" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "class Decoder(nn.Module):\n", "    def __init__(self, latent_dim):\n", "        super().__init__()\n", "        self.linear = nn.Sequential(\n", "            nn.Linear(latent_dim, 16),\n", "            nn.BatchNorm1d(16),\n", "            nn.ReLU(),\n", "\n", "            nn.Linear(16, 64*7*7),\n", "            nn.BatchNorm1d(64*7*7),\n", "            nn.ReLU()\n", "        )\n", "\n", "        self.Deconvblock = nn.Sequential(\n", "            nn.ConvTranspose2d(in_channels=64, out_channels=64, kernel_size=3, stride=1, padding=1),  # 7x7 -> 7x7\n", "            nn.BatchNorm2d(64),\n", "            nn.ReLU(),\n", "\n", "            nn.ConvTranspose2d(in_channels=64, out_channels=64, kernel_size=4, stride=2, padding=1),  # 7x7 -> 14x14\n", "            nn.BatchNorm2d(64),\n", "            nn.ReLU(),\n", "\n", "            nn.ConvTranspose2d(in_channels=64, out_channels=32, kernel_size=4, stride=2, padding=1),  # 14x14 -> 28x28\n", "            nn.BatchNorm2d(32),\n", "            nn.ReLU(),\n", "\n", "            nn.ConvTranspose2d(in_channels=32, out_channels=1, kernel_size=3, stride=1, padding=1),   # 28x28 -> 28x28\n", "            nn.Sigmoid()\n", "        )\n", "\n", "    def forward(self, z):\n", "        x     = self.linear(z)\n", "        x     = x.reshape(-1,64,7,7)\n", "        x_hat = self.Deconvblock(x)\n", "        return x_hat" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "#### VAE\n", "\n", "We will calculate the loss with a specific layer: `VariationalLossLayer` - See : [VariationalLossLayer.py](./modules/layers/VariationalLossLayer.py)" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "class LitVAE(pl.LightningModule):\n", "\n", "    def __init__(self, encoder, decoder):\n", "        super().__init__()\n", "        self.encoder = encoder\n", "        self.decoder = decoder\n", "\n", "    # forward pass\n", "    def forward(self, x):\n", "        z_mean, z_logvar, z = self.encoder(x)\n", "        x_hat               = self.decoder(z)\n", "        return x_hat\n", "\n", "    def training_step(self, batch, batch_idx):\n", "        # training_step defines the train loop\n", "        x, _                = batch\n", "        z_mean, z_logvar, z = self.encoder(x)\n", "        x_hat               = self.decoder(z)\n", "\n", "        r_loss, kl_loss, loss = VariationalLossLayer(loss_weights=loss_weights)([x, z_mean, z_logvar, x_hat])\n", "\n", "        metrics = { \"r_loss\"   : r_loss,\n", "                    \"kl_loss\"  : kl_loss,\n", "                    \"vae_loss\" : loss }\n", "\n", "        # log the metrics, aggregated per epoch\n", "        self.log_dict(metrics,\n", "                      on_step  = False,\n", "                      on_epoch = True,\n", "                      prog_bar = True,\n", "                      logger   = True)\n", "\n", "        return loss\n", "\n", "    def configure_optimizers(self):\n", "        optimizer = torch.optim.Adam(self.parameters(), lr=1e-3)\n", "        return optimizer\n" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# print model\n", "vae = LitVAE(Encoder(latent_dim=latent_dim), Decoder(latent_dim=latent_dim))\n", "print(vae)" ] },
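{ "cell_type": "markdown", "metadata": {}, "source": [ "As a quick sanity check (an illustrative sketch, assuming the 28x28 MNIST input shape), we can pass a dummy batch through the untrained VAE and verify that the output has the same shape as the input:" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Sanity check (illustration only): one forward pass through the untrained VAE\n", "vae.eval()                              # BatchNorm uses its running stats in eval mode\n", "with torch.no_grad():\n", "    dummy = torch.zeros(8, 1, 28, 28)   # dummy batch of 28x28 grayscale images\n", "    x_hat = vae(dummy)\n", "print(x_hat.shape)                      # expected: torch.Size([8, 1, 28, 28])\n", "vae.train()                             # back to training mode before fitting" ] },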
"trainer.fit(model=vae, train_dataloaders=train_loader)\n", "\n", "\n", "\n", "chrono.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 6 - Training review\n", "### 6.1 - History" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# launch Tensorboard \n", "%reload_ext tensorboard\n", "%tensorboard --logdir=./VAE1_logs/VAE_logs/ --bind_all" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 6.2 - Reconstruction during training\n", "At the end of each epoch, our callback saved some reconstructed images. \n", "Where : \n", "Original image -> encoder -> z -> decoder -> Reconstructed image" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "images_z, images_r = CallbackImages.get_images( range(0,epochs,2) )\n", "\n", "fidle.utils.subtitle('Original images :')\n", "fidle.scrawler.images(x_data[:5], None, indices='all', columns=5, x_size=2,y_size=2, save_as=None)\n", "\n", "fidle.utils.subtitle('Encoded/decoded images')\n", "fidle.scrawler.images(images_z, None, indices='all', columns=5, x_size=2,y_size=2, save_as='02-reconstruct')\n", "\n", "fidle.utils.subtitle('Original images :')\n", "fidle.scrawler.images(x_data[:5], None, indices='all', columns=5, x_size=2,y_size=2, save_as=None)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 6.3 - Generation (latent -> decoder)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "fidle.utils.subtitle('Generated images from latent space')\n", "fidle.scrawler.images(images_r, None, indices='all', columns=5, x_size=2,y_size=2, save_as='03-generated')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Annexe - Model Save and reload " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#---- Load the model from a checkpoint\n", "loaded_model = LitVAE.load_from_checkpoint(BestModelCallback.best_model_path,\n", " encoder=Encoder(latent_dim=2),\n", " decoder=Decoder(latent_dim=2))\n", "# put model in evaluation modecnrs\n", "loaded_model.eval()\n", "\n", "# ---- Retrieve a layer decoder\n", "decoder=loaded_model.decoder\n", "\n", "# example of z\n", "z = torch.Tensor([[-1,.1]]).to(device)\n", "img = decoder(z)\n", "\n", "fidle.scrawler.images(img.cpu().detach(), x_size=2,y_size=2, save_as='04-example')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fidle.end()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "<img width=\"80px\" src=\"../fidle/img/logo-paysage.svg\"></img>" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.10" }, "vscode": { "interpreter": { "hash": "b3929042cc22c1274d74e3e946c52b845b57cb6d84f2d591ffe0519b38e4896d" } } }, "nbformat": 4, "nbformat_minor": 4 }