{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "<img width=\"800px\" src=\"../fidle/img/header.svg\"></img>\n", "\n", "# <!-- TITLE --> [K2AE1] - Prepare a noisy MNIST dataset\n", "<!-- DESC --> Episode 1: Preparation of a noisy MNIST dataset, using Keras 2 and Tensorflow (obsolete)\n", "\n", "<!-- AUTHOR : Jean-Luc Parouty (CNRS/SIMaP) -->\n", "\n", "## Objectives :\n", " - Prepare a noisy MNIST dataset, usable with our denoising autoencoder (duration : <50s)\n", "\n", "## What we're going to do :\n", "\n", " - Load the original MNIST dataset\n", " - Add noise, a lot!\n", " - Save it :-)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 1 - Init and set parameters\n", "### 1.1 - Init Python" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import sys\n", "\n", "from skimage import io\n", "from skimage.util import random_noise\n", "\n", "import modules.MNIST\n", "from modules.MNIST import MNIST\n", "\n", "import fidle\n", "\n", "# Init Fidle environment\n", "run_id, run_dir, datasets_dir = fidle.init('K2AE1')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 1.2 - Parameters\n", "`prepared_dataset` : Filename of the prepared dataset to write (example : ./data/mnist-noisy.h5)\\\n", "`scale` : Dataset scale. 
1 means 100% of the dataset; set 0.1 for tests\\\n", "`progress_verbosity`: Verbosity of the progress bar: 0=silent, 1=progress bar, 2=one line" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "prepared_dataset = './data/mnist-noisy.h5'\n", "scale = 1\n", "progress_verbosity = 1" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Override parameters (batch mode) - You can ignore this cell" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fidle.override('prepared_dataset', 'scale', 'progress_verbosity')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 2 - Get original dataset\n", "We load :  \n", "`clean_data` : Original, clean images - This is what we want to obtain at the **output** of the AE  \n", "`class_data` : Image classes - Not needed here, because the training is unsupervised  \n", "We'll build :  \n", "`noisy_data` : Noisy images - These are the images that we will give as **input** to our AE\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "clean_data, class_data = MNIST.get_origine(scale=scale)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 3 - Add noise\n", "We add noise to the original images (clean_data) to obtain noisy images (noisy_data)  \n", "This takes about 30-40 seconds" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def noise_it(data):\n", "    new_data = np.copy(data)\n", "    for i,image in enumerate(new_data):\n", "        fidle.utils.update_progress('Add noise : ',i+1,len(data),verbosity=progress_verbosity)\n", "        image=random_noise(image, mode='gaussian', mean=0, var=0.3)\n", "        image=random_noise(image, mode='s&p', amount=0.2, salt_vs_pepper=0.5)\n", "        image=random_noise(image, mode='poisson')\n", "        image=random_noise(image, mode='speckle', mean=0, var=0.1)\n", "        new_data[i]=image\n", "    print('Done.')\n", "    return 
new_data\n", "\n", "# ---- Add noise to input data : x_data\n", "#\n", "noisy_data = noise_it(clean_data)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 4 - Have a look" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print('Clean dataset (clean_data) : ',clean_data.shape)\n", "print('Noisy dataset (noisy_data) : ',noisy_data.shape)\n", "\n", "fidle.utils.subtitle(\"Noisy images we'll have in input (or x)\")\n", "fidle.scrawler.images(noisy_data[:5], None, indices='all', columns=5, x_size=3,y_size=3, interpolation=None, save_as='01-noisy')\n", "fidle.utils.subtitle('Clean images we want to obtain (or y)')\n", "fidle.scrawler.images(clean_data[:5], None, indices='all', columns=5, x_size=3,y_size=3, interpolation=None, save_as='02-original')\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 5 - Shuffle dataset" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "p = np.random.permutation(len(clean_data))\n", "clean_data, noisy_data, class_data = clean_data[p], noisy_data[p], class_data[p]\n", "print('Shuffled.')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 6 - Save our prepared dataset" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "MNIST.save_prepared_dataset( clean_data, noisy_data, class_data, filename=prepared_dataset )" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fidle.end()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "<img width=\"80px\" src=\"../fidle/img/logo-paysage.svg\"></img>" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3.9.2 ('fidle-env')", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": 
"python", "pygments_lexer": "ipython3", "version": "3.9.2" }, "vscode": { "interpreter": { "hash": "b3929042cc22c1274d74e3e946c52b845b57cb6d84f2d591ffe0519b38e4896d" } } }, "nbformat": 4, "nbformat_minor": 4 }