Update BHPD, work on IMDB

Former-commit-id: 836daefc

Commit 8a5f3638 authored 5 years ago by Jean-Luc Parouty
--- a/BHPD/01-DNN-Regression.ipynb
+++ b/BHPD/01-DNN-Regression.ipynb
--- a/BHPD/02-DNN-Regression Premium.ipynb
+++ b/BHPD/02-DNN-Regression Premium.ipynb
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Deep Neural Network (DNN) - BHPD dataset\n",
+    "========================================\n",
+    "---\n",
+    "Introduction au Deep Learning  (IDLE) - S. Arias, E. Maldonado, JL. Parouty - CNRS/SARI/DEVLOG - 2020  \n",
+    "\n",
+    "## A very simple example of **regression** (Premium edition):\n",
+    "\n",
+    "The objective is to predict **housing prices** from a set of house features. \n",
+    "\n",
+    "The **[Boston Housing Dataset](https://www.cs.toronto.edu/~delve/data/boston/bostonDetail.html)** consists of the prices of houses in various places in Boston.  \n",
+    "Along with the price, the dataset also provides information such as the crime rate, the proportion of non-retail business in the town,  \n",
+    "the age of owner-occupied homes and many other attributes...\n",
+    "\n",
+    "What we're going to do:\n",
+    "\n",
+    " - (Retrieve data)\n",
+    " - (Preparing the data)\n",
+    " - (Build a model)\n",
+    " - Train and save the model\n",
+    " - Restore saved model\n",
+    " - Evaluate the model\n",
+    " - Make some predictions\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 1/ Init python stuff"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import tensorflow as tf\n",
+    "from tensorflow import keras\n",
+    "\n",
+    "import numpy as np\n",
+    "import matplotlib.pyplot as plt\n",
+    "import pandas as pd\n",
+    "import os\n",
+    "\n",
+    "from IPython.display import display, Markdown\n",
+    "import fidle.pwk as ooo\n",
+    "from importlib import reload\n",
+    "\n",
+    "ooo.init()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 2/ Retrieve data\n",
+    "\n",
+    "**From Keras :**\n",
+    "Boston housing is a famous historic dataset, so we can get it directly from [Keras datasets](https://www.tensorflow.org/api_docs/python/tf/keras/datasets)  "
+   ]
+  },
+  {
+   "cell_type": "raw",
+   "metadata": {},
+   "source": [
+    "(x_train, y_train), (x_test, y_test) = keras.datasets.boston_housing.load_data(test_split=0.2, seed=113)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "**From a csv file :**  \n",
+    "More fun !"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "data = pd.read_csv('./data/BostonHousing.csv', header=0)\n",
+    "\n",
+    "display(data.head(5).style.format(\"{0:.2f}\"))\n",
+    "print('Missing data : ',data.isna().sum().sum(), '  Shape is : ', data.shape)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 3/ Preparing the data\n",
+    "### 3.1/ Split data\n",
+    "We will use 70% of the data for training and 30% for validation.  \n",
+    "x will be input data and y the expected output"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# ---- Split => train, test\n",
+    "#\n",
+    "data_train = data.sample(frac=0.7, axis=0)\n",
+    "data_test  = data.drop(data_train.index)\n",
+    "\n",
+    "# ---- Split => x,y (medv is price)\n",
+    "#\n",
+    "x_train = data_train.drop('medv',  axis=1)\n",
+    "y_train = data_train['medv']\n",
+    "x_test  = data_test.drop('medv',   axis=1)\n",
+    "y_test  = data_test['medv']\n",
+    "\n",
+    "print('Original data shape was : ',data.shape)\n",
+    "print('x_train : ',x_train.shape, 'y_train : ',y_train.shape)\n",
+    "print('x_test  : ',x_test.shape,  'y_test  : ',y_test.shape)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### 3.2/ Data normalization\n",
+    "**Note :** \n",
+    " - All input data must be normalized, train and test.  \n",
+    " - To do this we will subtract the mean and divide by the standard deviation.  \n",
+    " - But test data should not be used in any way, even for normalization.  \n",
+    " - The mean and the standard deviation will therefore only be calculated with the train data."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "display(x_train.describe().style.format(\"{0:.2f}\").set_caption(\"Before normalization :\"))\n",
+    "\n",
+    "mean = x_train.mean()\n",
+    "std  = x_train.std()\n",
+    "x_train = (x_train - mean) / std\n",
+    "x_test  = (x_test  - mean) / std\n",
+    "\n",
+    "display(x_train.describe().style.format(\"{0:.2f}\").set_caption(\"After normalization :\"))\n",
+    "\n",
+    "x_train, y_train = np.array(x_train), np.array(y_train)\n",
+    "x_test,  y_test  = np.array(x_test),  np.array(y_test)\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 4/ Build a model\n",
+    "More information about : \n",
+    " - [Optimizer](https://www.tensorflow.org/api_docs/python/tf/keras/optimizers)\n",
+    " - [Activation](https://www.tensorflow.org/api_docs/python/tf/keras/activations)\n",
+    " - [Loss](https://www.tensorflow.org/api_docs/python/tf/keras/losses)\n",
+    " - [Metrics](https://www.tensorflow.org/api_docs/python/tf/keras/metrics)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "def get_model_v1(shape):\n",
+    "    \n",
+    "    model = keras.models.Sequential()\n",
+    "    model.add(keras.layers.Dense(64, activation='relu', input_shape=shape))\n",
+    "    model.add(keras.layers.Dense(64, activation='relu'))\n",
+    "    model.add(keras.layers.Dense(1))\n",
+    "    \n",
+    "    model.compile(optimizer = 'rmsprop',\n",
+    "                  loss      = 'mse',\n",
+    "                  metrics   = ['mae', 'mse'] )\n",
+    "    return model"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 5/ Train the model\n",
+    "### 5.1/ Get it"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "model=get_model_v1( (13,) )\n",
+    "\n",
+    "model.summary()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### 5.2/ Add callback"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "os.makedirs('./run/models',   mode=0o750, exist_ok=True)\n",
+    "save_dir = \"./run/models/best_model.h5\"\n",
+    "savemodel_callback = tf.keras.callbacks.ModelCheckpoint(filepath=save_dir, verbose=0, save_best_only=True)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### 5.3/ Train it"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "history = model.fit(x_train,\n",
+    "                    y_train,\n",
+    "                    epochs          = 100,\n",
+    "                    batch_size      = 10,\n",
+    "                    verbose         = 0,\n",
+    "                    validation_data = (x_test, y_test),\n",
+    "                    callbacks       = [savemodel_callback])"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 6/ Evaluate\n",
+    "### 6.1/ Model evaluation\n",
+    "MAE =  Mean Absolute Error (between the labels and predictions)  \n",
+    "A mae equal to 3 represents an average error in prediction of $3k."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "score = model.evaluate(x_test, y_test, verbose=0)\n",
+    "\n",
+    "print('x_test / loss      : {:5.4f}'.format(score[0]))\n",
+    "print('x_test / mae       : {:5.4f}'.format(score[1]))\n",
+    "print('x_test / mse       : {:5.4f}'.format(score[2]))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### 6.2/ Training history\n",
+    "What was the best result during our training ?"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "print(\"min( val_mae ) : {:.4f}\".format( min(history.history[\"val_mae\"]) ) )"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "reload(ooo)\n",
+    "ooo.plot_history(history, plot={'MSE' :['mse', 'val_mse'],\n",
+    "                                'MAE' :['mae', 'val_mae'],\n",
+    "                                'LOSS':['loss','val_loss']})"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 7/ Restore a model :"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### 7.1/ Reload model"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "loaded_model = tf.keras.models.load_model('./run/models/best_model.h5')\n",
+    "loaded_model.summary()\n",
+    "print(\"Loaded.\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### 7.2/ Evaluate it :"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "score = loaded_model.evaluate(x_test, y_test, verbose=0)\n",
+    "\n",
+    "print('x_test / loss      : {:5.4f}'.format(score[0]))\n",
+    "print('x_test / mae       : {:5.4f}'.format(score[1]))\n",
+    "print('x_test / mse       : {:5.4f}'.format(score[2]))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### 7.3/ Make a prediction"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "predictions = loaded_model.predict( x_train[13].reshape(1,13) )\n",
+    "print(\"Prediction : {:.2f} K$   Reality : {:.2f} K$\".format(predictions[0][0], y_train[13]))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "-----\n",
+    "That's all folks !"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.7.5"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}
+%% Cell type:markdown id: tags:
+
+Deep Neural Network (DNN) - BHPD dataset
+========================================
+---
+Introduction au Deep Learning  (IDLE) - S. Arias, E. Maldonado, JL. Parouty - CNRS/SARI/DEVLOG - 2020
+
+## A very simple example of **regression** (Premium edition):
+
+The objective is to predict **housing prices** from a set of house features.
+
+The **[Boston Housing Dataset](https://www.cs.toronto.edu/~delve/data/boston/bostonDetail.html)** consists of the prices of houses in various places in Boston.
+Along with the price, the dataset also provides information such as the crime rate, the proportion of non-retail business in the town,
+the age of owner-occupied homes and many other attributes...
+
+What we're going to do:
+
+ - (Retrieve data)
+ - (Preparing the data)
+ - (Build a model)
+ - Train and save the model
+ - Restore saved model
+ - Evaluate the model
+ - Make some predictions
+
+%% Cell type:markdown id: tags:
+
+## 1/ Init python stuff
+
+%% Cell type:code id: tags:
+
+``` python
+import tensorflow as tf
+from tensorflow import keras
+
+import numpy as np
+import matplotlib.pyplot as plt
+import pandas as pd
+import os
+
+from IPython.display import display, Markdown
+import fidle.pwk as ooo
+from importlib import reload
+
+ooo.init()
+```
+
+%% Cell type:markdown id: tags:
+
+## 2/ Retrieve data
+
+**From Keras :**
+Boston housing is a famous historic dataset, so we can get it directly from [Keras datasets](https://www.tensorflow.org/api_docs/python/tf/keras/datasets)
+
+%% Cell type:raw id: tags:
+
+(x_train, y_train), (x_test, y_test) = keras.datasets.boston_housing.load_data(test_split=0.2, seed=113)
+
+%% Cell type:markdown id: tags:
+
+**From a csv file :**
+More fun !
+
+%% Cell type:code id: tags:
+
+``` python
+data = pd.read_csv('./data/BostonHousing.csv', header=0)
+
+display(data.head(5).style.format("{0:.2f}"))
+print('Missing data : ',data.isna().sum().sum(), '  Shape is : ', data.shape)
+```
+
+%% Cell type:markdown id: tags:
+
+## 3/ Preparing the data
+### 3.1/ Split data
+We will use 70% of the data for training and 30% for validation.
+x will be input data and y the expected output
+
+%% Cell type:code id: tags:
+
+``` python
+# ---- Split => train, test
+#
+data_train = data.sample(frac=0.7, axis=0)
+data_test  = data.drop(data_train.index)
+
+# ---- Split => x,y (medv is price)
+#
+x_train = data_train.drop('medv',  axis=1)
+y_train = data_train['medv']
+x_test  = data_test.drop('medv',   axis=1)
+y_test  = data_test['medv']
+
+print('Original data shape was : ',data.shape)
+print('x_train : ',x_train.shape, 'y_train : ',y_train.shape)
+print('x_test  : ',x_test.shape,  'y_test  : ',y_test.shape)
+```
+
+%% Cell type:markdown id: tags:
+
+### 3.2/ Data normalization
+**Note :**
+ - All input data must be normalized, train and test.
+ - To do this we will subtract the mean and divide by the standard deviation.
+ - But test data should not be used in any way, even for normalization.
+ - The mean and the standard deviation will therefore only be calculated with the train data.
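
A minimal sketch of the normalization rule stated above, using only the Python standard library rather than pandas. The helper names `zscore_fit` and `zscore_apply` and the toy values are assumptions made for illustration; they do not appear in the notebook. `statistics.stdev` is the sample standard deviation, which is also what pandas `std()` computes by default.

```python
import statistics

def zscore_fit(train_column):
    """Return (mean, std) computed from the training values only."""
    return statistics.mean(train_column), statistics.stdev(train_column)

def zscore_apply(column, mean, std):
    """Normalize a column with statistics that were fitted elsewhere."""
    return [(v - mean) / std for v in column]

# Toy values standing in for one feature column (hypothetical data)
train = [2.0, 4.0, 6.0, 8.0]
test = [5.0, 10.0]

# Fit on the training column only, then reuse the same statistics for both sets
mean, std = zscore_fit(train)
train_n = zscore_apply(train, mean, std)
test_n = zscore_apply(test, mean, std)

print("train mean/std after :", statistics.mean(train_n), statistics.stdev(train_n))
print("test values after    :", test_n)
```

Because the test values never enter `zscore_fit`, the normalized test column generally does not have zero mean or unit deviation, and that is expected.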
+
+%% Cell type:code id: tags:
+
+``` python
+display(x_train.describe().style.format("{0:.2f}").set_caption("Before normalization :"))
+
+mean = x_train.mean()
+std  = x_train.std()
+x_train = (x_train - mean) / std
+x_test  = (x_test  - mean) / std
+
+display(x_train.describe().style.format("{0:.2f}").set_caption("After normalization :"))
+
+x_train, y_train = np.array(x_train), np.array(y_train)
+x_test,  y_test  = np.array(x_test),  np.array(y_test)
+```
+
+%% Cell type:markdown id: tags:
+
+## 4/ Build a model
+More information about :
+ - [Optimizer](https://www.tensorflow.org/api_docs/python/tf/keras/optimizers)
+ - [Activation](https://www.tensorflow.org/api_docs/python/tf/keras/activations)
+ - [Loss](https://www.tensorflow.org/api_docs/python/tf/keras/losses)
+ - [Metrics](https://www.tensorflow.org/api_docs/python/tf/keras/metrics)
+
+%% Cell type:code id: tags:
+
+``` python
+def get_model_v1(shape):
+
+    model = keras.models.Sequential()
+    model.add(keras.layers.Dense(64, activation='relu', input_shape=shape))
+    model.add(keras.layers.Dense(64, activation='relu'))
+    model.add(keras.layers.Dense(1))
+
+    model.compile(optimizer = 'rmsprop',
+                  loss      = 'mse',
+                  metrics   = ['mae', 'mse'] )
+    return model
+```
+
+%% Cell type:markdown id: tags:
+
+## 5/ Train the model
+### 5.1/ Get it
+
+%% Cell type:code id: tags:
+
+``` python
+model=get_model_v1( (13,) )
+
+model.summary()
+```
+
+%% Cell type:markdown id: tags:
+
+### 5.2/ Add callback
+
+%% Cell type:code id: tags:
+
+``` python
+os.makedirs('./run/models',   mode=0o750, exist_ok=True)
+save_dir = "./run/models/best_model.h5"
+savemodel_callback = tf.keras.callbacks.ModelCheckpoint(filepath=save_dir, verbose=0, save_best_only=True)
+```
+
+%% Cell type:markdown id: tags:
+
+### 5.3/ Train it
+
+%% Cell type:code id: tags:
+
+``` python
+history = model.fit(x_train,
+                    y_train,
+                    epochs          = 100,
+                    batch_size      = 10,
+                    verbose         = 0,
+                    validation_data = (x_test, y_test),
+                    callbacks       = [savemodel_callback])
+```
+
+%% Cell type:markdown id: tags:
+
+## 6/ Evaluate
+### 6.1/ Model evaluation
+MAE =  Mean Absolute Error (between the labels and predictions)
+A mae equal to 3 represents an average error in prediction of $3k.
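
To make the two metrics concrete, here is a small worked example in plain Python. The function names and the four price pairs are invented for the illustration; in the notebook, Keras computes these values. Prices in this dataset are expressed in thousands of dollars, so the MAE reads directly as K$.

```python
def mean_absolute_error(y_true, y_pred):
    """Average of the absolute differences between labels and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between labels and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical prices in K$ (real values vs. model output)
y_true = [24.0, 21.5, 30.0, 18.0]
y_pred = [21.0, 23.5, 26.0, 21.0]

mae = mean_absolute_error(y_true, y_pred)   # absolute errors: 3, 2, 4, 3
mse = mean_squared_error(y_true, y_pred)    # squared errors : 9, 4, 16, 9

print("mae : {:.2f} K$".format(mae))
print("mse : {:.2f}".format(mse))
```

The MSE penalizes the single 4 K$ error more heavily than the MAE does, which is one reason both are tracked during training.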
+
+%% Cell type:code id: tags:
+
+``` python
+score = model.evaluate(x_test, y_test, verbose=0)
+
+print('x_test / loss      : {:5.4f}'.format(score[0]))
+print('x_test / mae       : {:5.4f}'.format(score[1]))
+print('x_test / mse       : {:5.4f}'.format(score[2]))
+```
+
+%% Cell type:markdown id: tags:
+
+### 6.2/ Training history
+What was the best result during our training ?
+
+%% Cell type:code id: tags:
+
+``` python
+print("min( val_mae ) : {:.4f}".format( min(history.history["val_mae"]) ) )
+```
+
+%% Cell type:code id: tags:
+
+``` python
+reload(ooo)
+ooo.plot_history(history, plot={'MSE' :['mse', 'val_mse'],
+                                'MAE' :['mae', 'val_mae'],
+                                'LOSS':['loss','val_loss']})
+```
+
+%% Cell type:markdown id: tags:
+
+## 7/ Restore a model :
+
+%% Cell type:markdown id: tags:
+
+### 7.1/ Reload model
+
+%% Cell type:code id: tags:
+
+``` python
+loaded_model = tf.keras.models.load_model('./run/models/best_model.h5')
+loaded_model.summary()
+print("Loaded.")
+```
+
+%% Cell type:markdown id: tags:
+
+### 7.2/ Evaluate it :
+
+%% Cell type:code id: tags:
+
+``` python
+score = loaded_model.evaluate(x_test, y_test, verbose=0)
+
+print('x_test / loss      : {:5.4f}'.format(score[0]))
+print('x_test / mae       : {:5.4f}'.format(score[1]))
+print('x_test / mse       : {:5.4f}'.format(score[2]))
+```
+
+%% Cell type:markdown id: tags:
+
+### 7.3/ Make a prediction
+
+%% Cell type:code id: tags:
+
+``` python
+predictions = loaded_model.predict( x_train[13].reshape(1,13) )
+print("Prediction : {:.2f} K$   Reality : {:.2f} K$".format(predictions[0][0], y_train[13]))
+```
+
+%% Cell type:markdown id: tags:
+
+-----
+That's all folks !
--- a/BHPD/fidle/pwk.py
+++ b/BHPD/fidle/pwk.py
@@ -26,7 +26,7 @@ import matplotlib
 import matplotlib.pyplot as plt
 import seaborn as sn

-VERSION='0.1.8'
+VERSION='0.2'


 # -------------------------------------------------------------
@@ -160,33 +160,6 @@ def plot_image(x,cm='binary', figsize=(4,4)):
 # show_history
 # -------------------------------------------------------------
 #
-def plot_history_obsolete(history, figsize=(8,6)):
-    """
-    Show history
-    args:
-        history: history
-        save_as: filename to save or None
-    """
-    # Accuracy 
-    plt.figure(figsize=figsize)
-    plt.plot(history.history['accuracy'])
-    plt.plot(history.history['val_accuracy'])
-    plt.title('Model accuracy')
-    plt.ylabel('Accuracy')
-    plt.xlabel('Epoch')
-    plt.legend(['Train', 'Test'], loc='upper left')
-    plt.show()
-
-    # Loss values
-    plt.figure(figsize=figsize)
-    plt.plot(history.history['loss'])
-    plt.plot(history.history['val_loss'])
-    plt.title('Model loss')
-    plt.ylabel('Loss')
-    plt.xlabel('Epoch')
-    plt.legend(['Train', 'Test'], loc='upper left')
-    plt.show()    
-
 def plot_history(history, figsize=(8,6), 
                  plot={"Accuracy":['accuracy','val_accuracy'], 'Loss':['loss', 'val_loss']}):
    """
@@ -194,7 +167,7 @@ def plot_history(history, figsize=(8,6),
    args:
        history: history
        figsize: fig size
-        plot: list of data to plot
+        plot: list of data to plot : {<title>:[<metrics>,...], ...}
    """
    for title,curves in plot.items():
        plt.figure(figsize=figsize)

--- a/GTSRB/03-Tracking-and-visualizing.ipynb
+++ b/GTSRB/03-Tracking-and-visualizing.ipynb
@@ -372,7 +372,7 @@
   "outputs": [],
   "source": [
    "loaded_model = tf.keras.models.load_model('./run/models/best-model.h5')\n",
-    "# best_model.summary()\n",
+    "# loaded_model.summary()\n",
    "print(\"Loaded.\")"
   ]
  },

 %% Cell type:markdown id: tags:

 German Traffic Sign Recognition Benchmark (GTSRB)
 =================================================
 ---
 Introduction au Deep Learning  (IDLE) - S. Arias, E. Maldonado, JL. Parouty - CNRS/SARI/DEVLOG - 2020

 ## Episode 3 : Tracking, visualizing and saving models

 Our main steps:
 - Monitoring and understanding our model training
 - Add recovery points
 - Analyze the results
 - Restore and run a recovery point

 ## 1/ Import and init

 %% Cell type:code id: tags:

 ``` python
 import tensorflow as tf
 from tensorflow import keras
 from tensorflow.keras.callbacks import TensorBoard

 import numpy as np
 import h5py

 from sklearn.metrics import confusion_matrix

 import matplotlib.pyplot as plt
 import seaborn as sn
 import os, time, random

 import fidle.pwk as ooo
 from importlib import reload

 ooo.init()
 ```

 %% Cell type:markdown id: tags:

 ## 2/ Load dataset
 The dataset is one of the saved datasets: RGB25, RGB35, L25, L35, etc.
 First of all, we're going to use a smart dataset : **set-24x24-L**
 (with a GPU, it only takes 35'' compared to more than 5' with a CPU !)

 %% Cell type:code id: tags:

 ``` python
 %%time

 def read_dataset(name):
    '''Reads h5 dataset from ./data

    Arguments:  dataset name, without .h5
    Returns:    x_train,y_train,x_test,y_test data'''
    # ---- Read dataset
    filename='./data/'+name+'.h5'
    with  h5py.File(filename) as f:
        x_train = f['x_train'][:]
        y_train = f['y_train'][:]
        x_test  = f['x_test'][:]
        y_test  = f['y_test'][:]
        x_meta  = f['x_meta'][:]
        y_meta  = f['y_meta'][:]

    # ---- done
    print('Dataset "{}" is loaded. ({:.1f} Mo)\n'.format(name,os.path.getsize(filename)/(1024*1024)))
    return x_train,y_train,x_test,y_test,x_meta,y_meta

 x_train,y_train,x_test,y_test,x_meta,y_meta = read_dataset('set-24x24-L')
 ```

 %% Cell type:markdown id: tags:

 ## 3/ Have a look at the dataset
 Note: Data must be reshaped for matplotlib

 %% Cell type:code id: tags:

 ``` python
 print("x_train : ", x_train.shape)
 print("y_train : ", y_train.shape)
 print("x_test  : ", x_test.shape)
 print("y_test  : ", y_test.shape)

 ooo.plot_images(x_train, y_train, range(12), columns=6,  x_size=2, y_size=2)
 ooo.plot_images(x_train, y_train, range(36), columns=12, x_size=1, y_size=1)
 ```

 %% Cell type:markdown id: tags:

 ## 4/ Create model
 We will now build a model and train it...

 Some models...

 %% Cell type:code id: tags:

 ``` python
 # A basic model
 #
 def get_model_v1(lx,ly,lz):

    model = keras.models.Sequential()

    model.add( keras.layers.Conv2D(96, (3,3), activation='relu', input_shape=(lx,ly,lz)))
    model.add( keras.layers.MaxPooling2D((2, 2)))
    model.add( keras.layers.Dropout(0.2))

    model.add( keras.layers.Conv2D(192, (3, 3), activation='relu'))
    model.add( keras.layers.MaxPooling2D((2, 2)))
    model.add( keras.layers.Dropout(0.2))

    model.add( keras.layers.Flatten())
    model.add( keras.layers.Dense(1500, activation='relu'))
    model.add( keras.layers.Dropout(0.5))

    model.add( keras.layers.Dense(43, activation='softmax'))
    return model
 ```

 %% Cell type:markdown id: tags:

 ## 5/ Prepare callbacks
 We will add 2 callbacks :
 - **TensorBoard**
 Training logs, which can be visualised with Tensorboard.
 `#tensorboard --logdir ./run/logs`
 IMPORTANT : Restart TensorBoard at each run
 - **Model backup**
 It is possible to save the model every xx epochs or at each improvement.
 The model can be saved completely or partially (weights only).
 For full format, we can use HDF5 format.
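
The idea of saving "at each improvement" can be illustrated in a few lines of plain Python. This is only a sketch of the policy, not the Keras `ModelCheckpoint` implementation; the function `epochs_to_save`, its `mode` parameter as used here, and the metric values are assumptions for the example.

```python
def epochs_to_save(monitored, mode="min"):
    """Return the 1-based epochs at which a 'save best only' policy would write a file.

    monitored : list of the monitored metric, one value per epoch
    mode      : "min" if lower is better (loss), "max" if higher is better (accuracy)
    """
    best = None
    saved = []
    for epoch, value in enumerate(monitored, start=1):
        if best is None:
            improved = True
        elif mode == "min":
            improved = value < best
        else:
            improved = value > best
        if improved:
            best = value
            saved.append(epoch)
    return saved

# Hypothetical metric values, one per epoch
val_loss = [0.90, 0.70, 0.75, 0.60, 0.65]
accuracy = [0.50, 0.62, 0.61, 0.70]

print(epochs_to_save(val_loss))              # [1, 2, 4]
print(epochs_to_save(accuracy, mode="max"))  # [1, 2, 4]
```

With such a policy the file on disk always holds the best epoch seen so far, which is why the restored model can score differently from the model left in memory at the end of training.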

 %% Cell type:code id: tags:

 ``` python
 %%bash
 # To clean old logs and saved model, run this cell
 #
 /bin/rm -r ./run/logs   2>/dev/null
 /bin/rm -r ./run/models 2>/dev/null
 /bin/mkdir -p -m 755 ./run/logs
 /bin/mkdir -p -m 755 ./run/models
 echo -e "Reset directories : ./run/logs and ./run/models ."
 ```

 %% Cell type:code id: tags:

 ``` python
 ooo.mkdir('./run/models')
 ooo.mkdir('./run/logs')

 # ---- Callback tensorboard
 log_dir = "./run/logs/tb_" + ooo.tag_now()
 tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir=log_dir, histogram_freq=1)

 # ---- Callback ModelCheckpoint - Save best model
 save_dir = "./run/models/best-model.h5"
 bestmodel_callback = tf.keras.callbacks.ModelCheckpoint(filepath=save_dir, verbose=0, monitor='accuracy', save_best_only=True)

 # ---- Callback ModelCheckpoint - Save model each epochs
 save_dir = "./run/models/model-{epoch:04d}.h5"
 savemodel_callback = tf.keras.callbacks.ModelCheckpoint(filepath=save_dir, verbose=0, save_freq=2000*5)
 ```

 %% Cell type:markdown id: tags:

 ## 5/ Train the model
 **Get the shape of my data :**

 %% Cell type:code id: tags:

 ``` python
 (n,lx,ly,lz) = x_train.shape
 print("Images of the dataset have the following shape : ",(lx,ly,lz))
 ```

 %% Cell type:markdown id: tags:

 **Get and compile a model, with the data shape :**

 %% Cell type:code id: tags:

 ``` python
 model = get_model_v1(lx,ly,lz)

 # model.summary()

 model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
 ```

 %% Cell type:markdown id: tags:

 **Train it :**
 Note: The training curve is visible in real time with Tensorboard :
 `#tensorboard --logdir ./run/logs`

 %% Cell type:code id: tags:

 ``` python
 %%time

 batch_size = 64
 epochs     = 30

 # ---- Shuffle train data
 x_train,y_train=ooo.shuffle_np_dataset(x_train,y_train)

 # ---- Train
 # Note: To be faster in our example, we can take only 2000 values
 #
 history = model.fit(  x_train, y_train,
                      batch_size=batch_size,
                      epochs=epochs,
                      verbose=1,
                      validation_data=(x_test, y_test),
                      callbacks=[tensorboard_callback, bestmodel_callback, savemodel_callback] )

 model.save('./run/models/last-model.h5')
 ```

 %% Cell type:markdown id: tags:

 **Evaluate it :**

 %% Cell type:code id: tags:

 ``` python
 max_val_accuracy = max(history.history["val_accuracy"])
 print("Max validation accuracy is : {:.4f}".format(max_val_accuracy))
 ```

 %% Cell type:code id: tags:

 ``` python
 score = model.evaluate(x_test, y_test, verbose=0)

 print('Test loss      : {:5.4f}'.format(score[0]))
 print('Test accuracy  : {:5.4f}'.format(score[1]))
 ```

 %% Cell type:markdown id: tags:

 ## 6/ History
 model.fit() returns the training history.

 %% Cell type:code id: tags:

 ``` python
 ooo.plot_history(history)
 ```

 %% Cell type:markdown id: tags:

 ## 7/ Evaluation and confusion

 %% Cell type:code id: tags:

 ``` python
 y_pred   = model.predict_classes(x_test)
 conf_mat = confusion_matrix(y_test,y_pred, normalize="true", labels=range(43))

 ooo.plot_confusion_matrix(conf_mat)
 ```

 %% Cell type:markdown id: tags:

 ## 8/ Restore and evaluate
 ### 8.1/ List saved models :

 %% Cell type:code id: tags:

 ``` python
 !find ./run/models/
 ```

 %% Cell type:markdown id: tags:

 ### 8.2/ Restore a model :

 %% Cell type:code id: tags:

 ``` python
 loaded_model = tf.keras.models.load_model('./run/models/best-model.h5')
-# best_model.summary()
+# loaded_model.summary()
 print("Loaded.")
 ```

 %% Cell type:markdown id: tags:

 ### 8.3/ Evaluate it :

 %% Cell type:code id: tags:

 ``` python
 score = loaded_model.evaluate(x_test, y_test, verbose=0)

 print('Test loss      : {:5.4f}'.format(score[0]))
 print('Test accuracy  : {:5.4f}'.format(score[1]))
 ```

 %% Cell type:markdown id: tags:

 ### 8.4/ Make a prediction :

 %% Cell type:code id: tags:

 ``` python
 # ---- Get a random image
 #
 i   = random.randint(0, len(x_test)-1)
 x,y = x_test[i], y_test[i]

 # ---- Do prediction
 #
 predictions = loaded_model.predict( np.array([x]) )

 # ---- A prediction is just the output layer
 #
 print("\nOutput layer from model is (x100) :\n")
 with np.printoptions(precision=2, suppress=True, linewidth=95):
    print(predictions*100)

 # ---- Graphic visualisation
 #
 print("\nGraphically :\n")
 plt.figure(figsize=(12,2))
 plt.bar(range(43), predictions[0], align='center', alpha=0.5)
 plt.ylabel('Probability')
 plt.ylim((0,1))
 plt.xlabel('Class')
 plt.title('Traffic sign prediction')
 plt.show()

 # ---- Predict class
 #
 p = np.argmax(predictions)

 # ---- Show result
 #
 print("\nPrediction on the left, real stuff on the right :\n")
 ooo.plot_images([x,x_meta[y]], [p,y], range(2),  columns=3,  x_size=3, y_size=2)

 if p==y:
    print("YEEES ! that's right!")
 else:
    print("oops, that's wrong ;-(")
 ```

 %% Cell type:markdown id: tags:

 ---
 That's all folks !

 %% Cell type:code id: tags:

 ``` python
 ```

--- a/GTSRB/fidle/pwk.py
+++ b/GTSRB/fidle/pwk.py
@@ -26,7 +26,7 @@ import matplotlib
 import matplotlib.pyplot as plt
 import seaborn as sn

-VERSION='0.1.7'
+VERSION='0.2'


 # -------------------------------------------------------------
@@ -160,34 +160,27 @@ def plot_image(x,cm='binary', figsize=(4,4)):
 # show_history
 # -------------------------------------------------------------
 #
-def plot_history(history, figsize=(8,6)):
+def plot_history(history, figsize=(8,6), 
+                  plot={"Accuracy":['accuracy','val_accuracy'], 'Loss':['loss', 'val_loss']}):
    """
    Show history
    args:
        history: history
-        save_as: filename to save or None
+        figsize: fig size
+        plot: list of data to plot : {<title>:[<metrics>,...], ...}
    """
-    # Accuracy 
-    plt.figure(figsize=figsize)
-    plt.plot(history.history['accuracy'])
-    plt.plot(history.history['val_accuracy'])
-    plt.title('Model accuracy')
-    plt.ylabel('Accuracy')
-    plt.xlabel('Epoch')
-    plt.legend(['Train', 'Test'], loc='upper left')
-    plt.show()
-
-    # Loss values
-    plt.figure(figsize=figsize)
-    plt.plot(history.history['loss'])
-    plt.plot(history.history['val_loss'])
-    plt.title('Model loss')
-    plt.ylabel('Loss')
-    plt.xlabel('Epoch')
-    plt.legend(['Train', 'Test'], loc='upper left')
-    plt.show()    
-
-
+    for title,curves in plot.items():
+        plt.figure(figsize=figsize)
+        plt.title(title)
+        plt.ylabel(title)
+        plt.xlabel('Epoch')
+        for c in curves:
+            plt.plot(history.history[c])
+        plt.legend(curves, loc='upper left')
+        plt.show()
+
+    
+    
 # -------------------------------------------------------------
 # plot_confusion_matrix
 # -------------------------------------------------------------
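
The `plot` argument introduced in this hunk maps each figure title to the list of `history.history` keys drawn on that figure. The sketch below shows that traversal without matplotlib; `curves_to_plot` is a hypothetical helper and the plain dict stands in for `history.history`, with made-up values.

```python
def curves_to_plot(history_dict, plot):
    """Yield (title, {metric: values}) pairs, one per figure, in dict order."""
    for title, curves in plot.items():
        yield title, {c: history_dict[c] for c in curves}

# Stand-in for history.history after two epochs (hypothetical values)
history_dict = {
    'loss':     [0.9, 0.5],
    'val_loss': [1.0, 0.7],
    'mae':      [3.2, 2.4],
    'val_mae':  [3.5, 2.9],
}

# Same structure as the `plot` argument: {<title>: [<metrics>, ...], ...}
plot = {'MAE': ['mae', 'val_mae'], 'LOSS': ['loss', 'val_loss']}

figures = list(curves_to_plot(history_dict, plot))
for title, series in figures:
    print(title, '->', list(series))
```

A key listed in `plot` but missing from the history raises a `KeyError`, so the metric names must match those passed to `model.compile`.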

--- a/IMDB/01-Embedding-Keras.ipynb
+++ b/IMDB/01-Embedding-Keras.ipynb
--- a/IMDB/fidle/__init__.py
+++ b/IMDB/fidle/__init__.py
+
+VERSION='0.1a'
\ No newline at end of file
--- a/IMDB/fidle/pwk.py
+++ b/IMDB/fidle/pwk.py
+
+# ==================================================================
+#  ____                 _   _           _  __        __         _
+# |  _ \ _ __ __ _  ___| |_(_) ___ __ _| | \ \      / /__  _ __| | __
+# | |_) | '__/ _` |/ __| __| |/ __/ _` | |  \ \ /\ / / _ \| '__| |/ /
+# |  __/| | | (_| | (__| |_| | (_| (_| | |   \ V  V / (_) | |  |   <
+# |_|   |_|  \__,_|\___|\__|_|\___\__,_|_|    \_/\_/ \___/|_|  |_|\_\
+#                                                        module pwk                                   
+# ==================================================================
+# A simple module to host some common functions for practical work
+# pjluc 2020
+
+import os
+import glob
+import itertools
+import datetime, time
+
+import math
+import numpy as np
+
+import tensorflow as tf
+from tensorflow import keras
+
+import matplotlib
+import matplotlib.pyplot as plt
+import seaborn as sn
+
+VERSION='0.2'
+
+
+# -------------------------------------------------------------
+# init_all
+# -------------------------------------------------------------
+#
+def init(mplstyle='fidle/talk.mplstyle'):
+    global VERSION
+    # ---- matplotlib
+    matplotlib.style.use(mplstyle)
+    # ---- Hello world
+#     now = datetime.datetime.now()
+    print('IDLE 2020 - Practical Work Module')
+    print('  Version            :', VERSION)
+    # Note: "%-d" is a glibc extension and fails on some platforms; "%d" is portable
+    print('  Run time           : {}'.format(time.strftime("%A %d %B %Y, %H:%M:%S")))
+    print('  Matplotlib style   :', mplstyle)
+    print('  TensorFlow version :',tf.__version__)
+    print('  Keras version      :',tf.keras.__version__)
+          
+# -------------------------------------------------------------
+# Folder cooking
+# -------------------------------------------------------------
+#
+def tag_now():
+    return datetime.datetime.now().strftime("%Y-%m-%d_%Hh%Mm%Ss")
+
+def mkdir(path):
+    os.makedirs(path, mode=0o750, exist_ok=True)
+      
+def get_directory_size(path):
+    """
+    Return the directory size, but only 1 level
+    args:
+        path : directory path
+    return:
+        size in MB (bytes / 1024**2)
+    """
+    size=0
+    for f in os.listdir(path):
+        if os.path.isfile(path+'/'+f):
+            size+=os.path.getsize(path+'/'+f)
+    return size/(1024*1024)
+
+# -------------------------------------------------------------
+# shuffle_dataset
+# -------------------------------------------------------------
+#
+def shuffle_np_dataset(x, y):
+    assert (len(x) == len(y)), "x and y must have same size"
+    p = np.random.permutation(len(x))
+    return x[p], y[p]
+
+
+def update_progress(what,i,imax):
+    bar_length = min(40,imax)
+    if (i%int(imax/bar_length))!=0 and i<imax:
+        return
+    progress  = float(i/imax)
+    block     = int(round(bar_length * progress))
+    endofline = '\r' if progress<1 else '\n'
+    text = "{:16s} [{}] {:>5.1f}% of {}".format( what, "#"*block+"-"*(bar_length-block), progress*100, imax)
+    print(text, end=endofline)
+
+
+# -------------------------------------------------------------
+# show_images
+# -------------------------------------------------------------
+#
+def plot_images(x,y, indices, columns=12, x_size=1, y_size=1, colorbar=False, y_pred=None, cm='binary'):
+    """
+    Show some images in a grid, with legends
+    args:
+        x: images - Shapes must be (-1,lx,ly), (-1,lx,ly,1) or (-1,lx,ly,3)
+        y: real classes
+        indices: indices of images to show
+        columns: number of columns (12)
+        x_size,y_size: figure size
+        colorbar: show colorbar (False)
+        y_pred: predicted classes (None)
+        cm: Matplotlib color map
+    returns: 
+        nothing
+    """
+    rows    = math.ceil(len(indices)/columns)
+    fig=plt.figure(figsize=(columns*x_size, rows*(y_size+0.35)))
+    n=1
+    errors=0 
+    if y_pred is None:
+        y_pred=y
+    for i in indices:
+        axs=fig.add_subplot(rows, columns, n)
+        n+=1
+        # ---- Shape is (lx,ly)
+        if len(x[i].shape)==2:
+            xx=x[i]
+        # ---- Shape is (lx,ly,n)
+        if len(x[i].shape)==3:
+            (lx,ly,lz)=x[i].shape
+            if lz==1: 
+                xx=x[i].reshape(lx,ly)
+            else:
+                xx=x[i]
+        img=axs.imshow(xx,   cmap = cm, interpolation='lanczos')
+        axs.spines['right'].set_visible(True)
+        axs.spines['left'].set_visible(True)
+        axs.spines['top'].set_visible(True)
+        axs.spines['bottom'].set_visible(True)
+        axs.set_yticks([])
+        axs.set_xticks([])
+        if y[i]!=y_pred[i]:
+            axs.set_xlabel('{} ({})'.format(y_pred[i],y[i]))
+            axs.xaxis.label.set_color('red')
+            errors+=1
+        else:
+            axs.set_xlabel(y[i])
+        if colorbar:
+            fig.colorbar(img,orientation="vertical", shrink=0.65)
+    plt.show()
+
+def plot_image(x,cm='binary', figsize=(4,4)):
+    (lx,ly,lz)=x.shape
+    plt.figure(figsize=figsize)
+    if lz==1:
+        plt.imshow(x.reshape(lx,ly),   cmap = cm, interpolation='lanczos')
+    else:
+        plt.imshow(x.reshape(lx,ly,lz),cmap = cm, interpolation='lanczos')
+    plt.show()
+
+
+# -------------------------------------------------------------
+# show_history
+# -------------------------------------------------------------
+#
+def plot_history(history, figsize=(8,6), 
+                  plot={"Accuracy":['accuracy','val_accuracy'], 'Loss':['loss', 'val_loss']}):
+    """
+    Show history
+    args:
+        history: history
+        figsize: fig size
+        plot: dict of curves to plot, one figure per entry : {<title>:[<metrics>,...], ...}
+    """
+    for title,curves in plot.items():
+        plt.figure(figsize=figsize)
+        plt.title(title)
+        plt.ylabel(title)
+        plt.xlabel('Epoch')
+        for c in curves:
+            plt.plot(history.history[c])
+        plt.legend(curves, loc='upper left')
+        plt.show()
+
+    
+    
+# -------------------------------------------------------------
+# plot_confusion_matrix
+# -------------------------------------------------------------
+#
+def plot_confusion_matrix(cm,
+                          title='Confusion matrix',
+                          figsize=(12,8),
+                          cmap="gist_heat_r",
+                          vmin=0,
+                          vmax=1,
+                          xticks=5,yticks=5):
+    """
+    given a sklearn confusion matrix (cm), make a nice plot
+
+    Args:
+        cm:           confusion matrix from sklearn.metrics.confusion_matrix
+        title:        the text to display at the top of the matrix
+        figsize:      Figure size (12,8)
+        cmap:         color map (gist_heat_r)
+        vmin,vmax:    Min/max of the color scale (0 and 1)
+        xticks,yticks: show every n-th tick label on each axis (5)
+        
+    """
+ 
+    accuracy = np.trace(cm) / float(np.sum(cm))
+    misclass = 1 - accuracy
+
+    plt.figure(figsize=figsize)
+    sn.heatmap(cm, linewidths=1, linecolor="#ffffff",square=True, 
+               cmap=cmap, xticklabels=xticks, yticklabels=yticks,
+               vmin=vmin,vmax=vmax)
+    plt.ylabel('True label')
+    plt.xlabel('Predicted label\naccuracy={:0.4f}; misclass={:0.4f}'.format(accuracy, misclass))
+
+    plt.show()
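The new `plot` argument of `plot_history` makes the function usable for models that do not report accuracy, such as a regression. The following is a minimal sketch, not the module itself: it copies the loop from the diff above, returns the figures instead of calling `plt.show()` so they can be inspected, and feeds it a hypothetical `fake_history` object. The metric names `mae` and `val_mae` are assumptions; the real keys depend on the metrics passed to `model.compile`.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import matplotlib.pyplot as plt
from types import SimpleNamespace


def plot_history(history, figsize=(8, 6), plot=None):
    """Same loop as in the diff, but returns the figures for inspection."""
    if plot is None:
        plot = {"Accuracy": ["accuracy", "val_accuracy"],
                "Loss": ["loss", "val_loss"]}
    figures = []
    for title, curves in plot.items():
        fig = plt.figure(figsize=figsize)
        plt.title(title)
        plt.ylabel(title)
        plt.xlabel("Epoch")
        for c in curves:
            plt.plot(history.history[c])
        plt.legend(curves, loc="upper left")
        figures.append(fig)
    return figures


# Hypothetical stand-in for the History object returned by model.fit()
fake_history = SimpleNamespace(history={
    "loss":     [80.0, 30.0, 18.0, 12.0],
    "val_loss": [85.0, 35.0, 22.0, 17.0],
    "mae":      [7.0, 4.0, 3.1, 2.6],
    "val_mae":  [7.4, 4.3, 3.5, 3.0],
})

# One figure per dictionary entry, one curve per metric name
figs = plot_history(fake_history,
                    plot={"MSE": ["loss", "val_loss"],
                          "MAE": ["mae", "val_mae"]})
```

Each key of the dictionary is used both as the figure title and as the y-axis label, and the metric names themselves become the legend entries.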
--- a/IMDB/fidle/talk.mplstyle
+++ b/IMDB/fidle/talk.mplstyle
+
+# See : https://matplotlib.org/users/customizing.html
+
+axes.titlesize : 24
+axes.labelsize : 20
+axes.edgecolor      : dimgrey
+axes.labelcolor     : dimgrey
+axes.linewidth      : 2
+axes.grid           : False
+
+axes.prop_cycle    : cycler('color', ['steelblue', 'tomato', '2ca02c', 'd62728', '9467bd', '8c564b', 'e377c2', '7f7f7f', 'bcbd22', '17becf'])
+
+lines.linewidth     : 3
+lines.markersize    : 10
+
+xtick.color         : black
+xtick.labelsize     : 18
+ytick.color         : black
+ytick.labelsize     : 18
+
+axes.spines.left   : True
+axes.spines.bottom : True
+axes.spines.top    : False
+axes.spines.right  : False
+
+savefig.dpi         : 300      # figure dots per inch or 'figure'
+savefig.facecolor   : white    # figure facecolor when saving
+savefig.edgecolor   : white    # figure edgecolor when saving
+savefig.format      : svg
+savefig.bbox        : tight
+savefig.pad_inches  : 0.1
+savefig.transparent : True
+savefig.jpeg_quality: 95
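The style file above is what `init()` passes to `matplotlib.style.use`. As a hedged illustration of how such a file takes effect, the sketch below writes a small subset of those settings to a temporary file (the name `talk_subset.mplstyle` is hypothetical) and applies it with `plt.style.context`, which restores the previous settings on exit.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import matplotlib.pyplot as plt

# A few of the settings from talk.mplstyle, copied into a temporary style file
style_text = """
axes.titlesize     : 24
axes.labelsize     : 20
lines.linewidth    : 3
axes.spines.top    : False
axes.spines.right  : False
"""

with tempfile.TemporaryDirectory() as tmp:
    style_path = os.path.join(tmp, "talk_subset.mplstyle")
    with open(style_path, "w") as f:
        f.write(style_text)

    # Settings apply only inside the context block
    with plt.style.context(style_path):
        inside = {key: plt.rcParams[key]
                  for key in ("lines.linewidth", "axes.titlesize", "axes.spines.top")}

print(inside)
```

Using the context manager rather than `matplotlib.style.use` keeps the change local, which can be convenient when only one figure should use the presentation style.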
--- a/MNIST/01-DNN-MNIST.ipynb
+++ b/MNIST/01-DNN-MNIST.ipynb
--- a/MNIST/fidle/pwk.py
+++ b/MNIST/fidle/pwk.py
@@ -26,7 +26,7 @@ import matplotlib
 import matplotlib.pyplot as plt
 import seaborn as sn

-VERSION='0.1.7'
+VERSION='0.2'


 # -------------------------------------------------------------
@@ -160,32 +160,26 @@ def plot_image(x,cm='binary', figsize=(4,4)):
 # show_history
 # -------------------------------------------------------------
 #
-def plot_history(history, figsize=(8,6)):
+def plot_history(history, figsize=(8,6), 
+                  plot={"Accuracy":['accuracy','val_accuracy'], 'Loss':['loss', 'val_loss']}):
    """
    Show history
    args:
        history: history
-        save_as: filename to save or None
+        figsize: fig size
+        plot: dict of curves to plot, one figure per entry : {<title>:[<metrics>,...], ...}
    """
-    # Accuracy 
-    plt.figure(figsize=figsize)
-    plt.plot(history.history['accuracy'])
-    plt.plot(history.history['val_accuracy'])
-    plt.title('Model accuracy')
-    plt.ylabel('Accuracy')
-    plt.xlabel('Epoch')
-    plt.legend(['Train', 'Test'], loc='upper left')
-    plt.show()
+    for title,curves in plot.items():
+        plt.figure(figsize=figsize)
+        plt.title(title)
+        plt.ylabel(title)
+        plt.xlabel('Epoch')
+        for c in curves:
+            plt.plot(history.history[c])
+        plt.legend(curves, loc='upper left')
+        plt.show()

-    # Loss values
-    plt.figure(figsize=figsize)
-    plt.plot(history.history['loss'])
-    plt.plot(history.history['val_loss'])
-    plt.title('Model loss')
-    plt.ylabel('Loss')
-    plt.xlabel('Epoch')
-    plt.legend(['Train', 'Test'], loc='upper left')
-    plt.show()    
+    
    
 # -------------------------------------------------------------
 # plot_confusion_matrix