German Traffic Sign Recognition Benchmark (GTSRB)
=================================================
---
Introduction au Deep Learning  (IDLE) - S. Arias, E. Maldonado, JL. Parouty - CNRS/SARI/DEVLOG - 2020

## Episode 4: Data augmentation

Our main steps:
 - Enlarge and improve the training dataset with data augmentation

## 1/ Import and init

In [None]:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.callbacks import TensorBoard

import numpy as np
import h5py

from sklearn.metrics import confusion_matrix

import matplotlib.pyplot as plt
import seaborn as sn
import os, time, random

import fidle.pwk as ooo
from importlib import reload

ooo.init()

## 2/ Dataset loader
The dataset is one of the previously saved datasets: RGB25, RGB35, L25, L35, etc.  
First of all, we're going to use a small dataset: **set-24x24-L**  
(with a GPU, this only takes about 35 seconds, compared to more than 5 minutes on a CPU!)

In [None]:
%%time

def read_dataset(name):
    '''Reads h5 dataset from ./data

    Arguments:  dataset name, without .h5
    Returns:    x_train,y_train,x_test,y_test data'''
    # ---- Read dataset
    filename='./data/'+name+'.h5'
    with h5py.File(filename, 'r') as f:
        x_train = f['x_train'][:]
        y_train = f['y_train'][:]
        x_test  = f['x_test'][:]
        y_test  = f['y_test'][:]

    # ---- done
    print('Dataset "{}" is loaded. ({:.1f} MB)\n'.format(name,os.path.getsize(filename)/(1024*1024)))
    return x_train,y_train,x_test,y_test
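To try the rest of the notebook without the `./data/*.h5` files, a random stand-in with the same four return values can be handy. The sketch below is a hypothetical helper (`make_fake_dataset` is not part of fidle); it assumes images are stored as `(n, lx, ly, lz)` float arrays and labels as integers in `[0, 43)`, which may not match the exact dtype or value range of the real files.

```python
import numpy as np

def make_fake_dataset(n_train=256, n_test=64, size=48, channels=1, n_classes=43, seed=0):
    """Hypothetical stand-in for read_dataset(): random data with the same
    outputs (x_train, y_train, x_test, y_test), for testing the pipeline only."""
    rng = np.random.default_rng(seed)
    # Assumption: images as float32 in [0, 1), shape (n, lx, ly, lz)
    x_train = rng.random((n_train, size, size, channels), dtype=np.float32)
    x_test = rng.random((n_test, size, size, channels), dtype=np.float32)
    # Assumption: labels as integer class indices in [0, n_classes)
    y_train = rng.integers(0, n_classes, size=n_train)
    y_test = rng.integers(0, n_classes, size=n_test)
    return x_train, y_train, x_test, y_test

x_tr, y_tr, x_te, y_te = make_fake_dataset()
print(x_tr.shape, y_tr.shape, x_te.shape, y_te.shape)
```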

## 3/ Models
We will now build a model and train it...

This is my model ;-) 

In [None]:
# A basic model
#
def get_model_v1(lx,ly,lz):
    
    model = keras.models.Sequential()
    
    model.add( keras.layers.Conv2D(96, (3,3), activation='relu', input_shape=(lx,ly,lz)))
    model.add( keras.layers.MaxPooling2D((2, 2)))
    model.add( keras.layers.Dropout(0.2))

    model.add( keras.layers.Conv2D(192, (3, 3), activation='relu'))
    model.add( keras.layers.MaxPooling2D((2, 2)))
    model.add( keras.layers.Dropout(0.2))

    model.add( keras.layers.Flatten()) 
    model.add( keras.layers.Dense(1500, activation='relu'))
    model.add( keras.layers.Dropout(0.5))

    model.add( keras.layers.Dense(43, activation='softmax'))
    return model
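As a worked check of the architecture, the sketch below counts the trainable parameters of `get_model_v1` by hand, assuming Keras defaults (`'valid'` padding for `Conv2D` and `MaxPooling2D`, pooling stride equal to the pool size). The helper names are illustrative. The totals should agree with the *Total params* line printed by `model.summary()`.

```python
def conv_out(size, kernel=3):
    # 'valid' padding (Keras default): output = input - kernel + 1
    return size - kernel + 1

def pool_out(size, pool=2):
    # MaxPooling2D with strides = pool size and 'valid' padding: floor(size / pool)
    return size // pool

def count_params_v1(lx, ly, lz, n_classes=43):
    """Trainable parameters of get_model_v1 (Dropout, MaxPooling, Flatten have none)."""
    p_conv1 = 3 * 3 * lz * 96 + 96              # 3x3 kernels over lz channels + biases
    x, y = pool_out(conv_out(lx)), pool_out(conv_out(ly))
    p_conv2 = 3 * 3 * 96 * 192 + 192            # 3x3 kernels over 96 channels + biases
    x, y = pool_out(conv_out(x)), pool_out(conv_out(y))
    flat = x * y * 192                          # size of the Flatten output
    p_dense = flat * 1500 + 1500
    p_out = 1500 * n_classes + n_classes
    return p_conv1 + p_conv2 + p_dense + p_out

print(count_params_v1(48, 48, 1))   # 29033083
print(count_params_v1(24, 24, 1))   # 4841083
```

For a 48x48 grayscale input, the `Dense(1500)` layer alone holds 28,801,500 of the roughly 29 million parameters, so the size of the flattened feature map is what drives the model size.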

## 4/ Callbacks  
We prepare two kinds of callbacks: TensorBoard and model checkpoints (backups).

In [None]:
%%bash
# Run this cell to remove old logs and saved models
#
/bin/rm -r ./run/logs   2>/dev/null
/bin/rm -r ./run/models 2>/dev/null
/bin/mkdir -p -m 755 ./run/logs
/bin/mkdir -p -m 755 ./run/models
echo -e "Reset directories : ./run/logs and ./run/models ."
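If bash is not available (for instance on Windows), the same reset can be done in plain Python. This is a sketch with a hypothetical helper name, `reset_run_dirs`; note that `os.makedirs` applies the process umask to `mode`, unlike `mkdir -m`.

```python
import os
import shutil

def reset_run_dirs(base="./run", subdirs=("logs", "models")):
    """Delete and recreate <base>/logs and <base>/models, like the %%bash cell."""
    for sub in subdirs:
        path = os.path.join(base, sub)
        shutil.rmtree(path, ignore_errors=True)        # like: rm -r path 2>/dev/null
        os.makedirs(path, mode=0o755, exist_ok=True)   # like: mkdir -p -m 755 path (umask applies)
    print("Reset directories: " + ", ".join(os.path.join(base, s) for s in subdirs))

reset_run_dirs()
```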

In [None]:
# ---- Callback tensorboard
log_dir = "./run/logs/tb_" + ooo.tag_now()
tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir=log_dir, histogram_freq=1)

# ---- Callback ModelCheckpoint - Save best model (highest validation accuracy)
save_dir = "./run/models/best-model.h5"
bestmodel_callback = tf.keras.callbacks.ModelCheckpoint(filepath=save_dir, verbose=0, monitor='val_accuracy', mode='max', save_best_only=True)

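# ---- Callback ModelCheckpoint - Save model periodically
#      An integer save_freq counts batches, not epochs: here a model is saved every 10,000 batches.
#      Use save_freq='epoch' to save at the end of every epoch instead.
save_dir = "./run/models/model-{epoch:04d}.h5"
savemodel_callback = tf.keras.callbacks.ModelCheckpoint(filepath=save_dir, verbose=0, save_freq=2000*5)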
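The `save_freq` argument of `ModelCheckpoint` is either `'epoch'` or an integer, and an integer is counted in batches, not epochs. The arithmetic below estimates how often `save_freq=2000*5` actually fires. It assumes about 39,209 training images (the size of the original GTSRB training set); the preprocessed file may differ, so use `x_train.shape[0]` in practice.

```python
import math

def batches_per_epoch(n_samples, batch_size):
    # ceil(n / batch_size) batches per epoch; the last batch may be partial
    return math.ceil(n_samples / batch_size)

def epochs_between_saves(save_freq, n_samples, batch_size):
    # An integer save_freq is a number of batches
    return save_freq / batches_per_epoch(n_samples, batch_size)

# Assumption: ~39,209 training images, batch_size = 64 as in the training cell
print(batches_per_epoch(39209, 64))                          # 613
print(round(epochs_between_saves(2000 * 5, 39209, 64), 1))   # 16.3
```

Under this assumption, a checkpoint is written roughly every 16 epochs, so only once during the 30-epoch run below.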

## 5/ Load and prepare dataset
### 5.1/ Load

In [None]:
x_train,y_train,x_test,y_test = read_dataset('set-48x48-L-LHE')
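A quick sanity check on the loaded arrays can catch a wrong file or layout before training. This hypothetical helper prints shapes, dtype, value range, whether the images are rank 4, and whether all labels fall in `[0, 43)`; it also returns the per-class counts of the training set, which help judge how balanced the classes are.

```python
import numpy as np

def describe_dataset(x_train, y_train, x_test, y_test, n_classes=43):
    """Print basic facts about the loaded arrays and return them as a dict."""
    labels_ok = all(y.min() >= 0 and y.max() < n_classes for y in (y_train, y_test))
    info = {
        "x_train": x_train.shape,
        "x_test": x_test.shape,
        "dtype": x_train.dtype,
        "x_range": (float(x_train.min()), float(x_train.max())),
        "rank_ok": x_train.ndim == 4 and x_test.ndim == 4,   # (n, lx, ly, lz)
        "labels_ok": bool(labels_ok),
        "train_counts": np.bincount(y_train.astype(int), minlength=n_classes),
    }
    for key, value in info.items():
        if key != "train_counts":
            print(f"{key:10s}: {value}")
    counts = info["train_counts"]
    print(f"images per class (train): min {counts.min()} / max {counts.max()}")
    return info

# Usage in the notebook: describe_dataset(x_train, y_train, x_test, y_test)
```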

### 5.2/ Data augmentation

In [None]:
datagen = keras.preprocessing.image.ImageDataGenerator(featurewise_center=False,
                             featurewise_std_normalization=False,
                             width_shift_range=0.1,
                             height_shift_range=0.1,
                             zoom_range=0.2,
                             shear_range=0.1,
                             rotation_range=10.)
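# fit() only stores statistics used by featurewise_center, featurewise_std_normalization
# or zca_whitening; with these options disabled, it stores nothing here
datagen.fit(x_train)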
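To make the `width_shift_range` / `height_shift_range` options concrete: with a float below 1, each image is shifted by a random fraction of its width or height (here up to 10%), and the pixels uncovered by the shift are filled according to `fill_mode`, which defaults to `'nearest'`. The NumPy sketch below reproduces that idea with integer shifts and edge-pixel filling; it is an illustration with hypothetical helper names, not Keras's implementation (which draws continuous offsets and applies them through an affine transform together with zoom, shear and rotation).

```python
import numpy as np

def shift_image(img, dy, dx):
    """Shift an (H, W, C) image by dy rows and dx columns (positive = down/right),
    filling uncovered pixels with the nearest edge pixel."""
    h, w = img.shape[:2]
    pad = ((abs(dy), abs(dy)), (abs(dx), abs(dx))) + ((0, 0),) * (img.ndim - 2)
    padded = np.pad(img, pad, mode="edge")
    # Pick the window so that original pixel (r, c) lands at (r + dy, c + dx)
    top, left = abs(dy) - dy, abs(dx) - dx
    return padded[top:top + h, left:left + w]

def random_shift(img, max_frac=0.1, rng=None):
    """Shift by a random integer offset of up to max_frac of the height / width."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = img.shape[:2]
    max_dy, max_dx = int(max_frac * h), int(max_frac * w)
    dy = int(rng.integers(-max_dy, max_dy + 1))
    dx = int(rng.integers(-max_dx, max_dx + 1))
    return shift_image(img, dy, dx)

img = np.arange(48 * 48, dtype=np.float32).reshape(48, 48, 1)
out = random_shift(img, 0.1, np.random.default_rng(0))
print(out.shape)   # (48, 48, 1)
```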

## 6/ Train the model
**Get the shape of the data:**

In [None]:
(n,lx,ly,lz) = x_train.shape
print("Images of the dataset have the following shape:", (lx,ly,lz))
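The unpacking above assumes a 4D array `(n, lx, ly, lz)`. If a luminance (`L`) dataset had been saved without a trailing channel axis, it would fail with a `ValueError`, and both the model and `ImageDataGenerator` also expect rank-4 input. A small guard, sketched below with a hypothetical helper name, adds the missing axis; whether it is needed depends on how the `.h5` files were written.

```python
import numpy as np

def ensure_channel_axis(x):
    """Return x as (n, lx, ly, lz); a grayscale stack saved as (n, lx, ly) gets lz = 1."""
    if x.ndim == 3:
        x = x[..., np.newaxis]
    if x.ndim != 4:
        raise ValueError(f"expected a 3D or 4D array, got shape {x.shape}")
    return x

# Usage in the notebook (before unpacking the shape):
# x_train, x_test = ensure_channel_axis(x_train), ensure_channel_axis(x_test)
print(ensure_channel_axis(np.zeros((5, 48, 48))).shape)   # (5, 48, 48, 1)
```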

**Build and compile a model for this data shape:**

In [None]:
model = get_model_v1(lx,ly,lz)

# model.summary()

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
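The labels are integer class indices (0 to 42), not one-hot vectors, which is why the *sparse* variant of categorical cross-entropy is used. For a sample of true class `y`, the loss is `-log(p[y])`, the negative log of the softmax probability given to the true class. The NumPy sketch below checks on a toy batch that this gives the same value as the one-hot formulation; the small `eps` clip only avoids `log(0)` and is an assumption of this sketch.

```python
import numpy as np

def sparse_cce(y_true, probs, eps=1e-7):
    """Mean of -log(p[true class]); y_true holds integer labels."""
    p = np.clip(probs[np.arange(len(y_true)), y_true], eps, 1.0)
    return float(-np.mean(np.log(p)))

def categorical_cce(y_onehot, probs, eps=1e-7):
    """Same loss written with one-hot targets."""
    return float(-np.mean(np.sum(y_onehot * np.log(np.clip(probs, eps, 1.0)), axis=1)))

probs = np.array([[0.7, 0.2, 0.1],    # toy softmax outputs for 2 samples, 3 classes
                  [0.1, 0.1, 0.8]])
y = np.array([0, 2])                  # integer labels
print(sparse_cce(y, probs))
print(categorical_cce(np.eye(3)[y], probs))   # same value
```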

**Train it:**  
Note: the learning curve can be followed in real time with TensorBoard:  
`tensorboard --logdir ./run/logs`  

In [None]:
%%time

batch_size = 64
epochs     = 30

# ---- Shuffle train data
#x_train,y_train=ooo.shuffle_np_dataset(x_train,y_train)

# ---- Train
#
history = model.fit(  datagen.flow(x_train, y_train, batch_size=batch_size),
                      epochs=epochs,
                      verbose=1,
                      validation_data=(x_test, y_test),
                      callbacks=[tensorboard_callback, bestmodel_callback, savemodel_callback] )

model.save('./run/models/last-model.h5')

**Evaluate it :**

In [None]:
max_val_accuracy = max(history.history["val_accuracy"])
print("Max validation accuracy is : {:.4f}".format(max_val_accuracy))

In [None]:
score = model.evaluate(x_test, y_test, verbose=0)

print('Test loss      : {:5.4f}'.format(score[0]))
print('Test accuracy  : {:5.4f}'.format(score[1]))

## 7/ History
`model.fit()` returns a `History` object; its `history` attribute holds the training history, with one value per epoch for each metric.
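`history.history` is a plain dictionary mapping each metric name (`loss`, `accuracy`, `val_loss`, `val_accuracy`) to a list with one value per epoch, so the best epoch can be read directly from it. The helper below is a hypothetical sketch; the numbers in `fake` are made up for illustration only.

```python
def best_epoch(history_dict, metric="val_accuracy"):
    """Return (epoch, value) of the highest score; epochs counted from 1 as in the fit() log."""
    values = history_dict[metric]
    i = max(range(len(values)), key=values.__getitem__)   # first index of the maximum
    return i + 1, values[i]

# Illustrative values only, not real results
fake = {"accuracy": [0.60, 0.80, 0.90], "val_accuracy": [0.70, 0.88, 0.86]}
print(best_epoch(fake))   # (2, 0.88)

# Usage in the notebook: best_epoch(history.history)
```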

In [None]:
ooo.plot_history(history)

## 8/ Evaluate the best model

### 8.1/ Restore the best model:

In [None]:
loaded_model = tf.keras.models.load_model('./run/models/best-model.h5')
# loaded_model.summary()
print("Loaded.")

### 8.2/ Evaluate it:

In [None]:
score = loaded_model.evaluate(x_test, y_test, verbose=0)

print('Test loss      : {:5.4f}'.format(score[0]))
print('Test accuracy  : {:5.4f}'.format(score[1]))

**Plot confusion matrix**

In [None]:
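# predict_classes() no longer exists in recent TensorFlow releases: take the argmax of predict()
y_pred   = np.argmax(loaded_model.predict(x_test), axis=-1)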
conf_mat = confusion_matrix(y_test,y_pred, normalize="true", labels=range(43))

ooo.plot_confusion_matrix(conf_mat)
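`predict()` returns one softmax vector per image, and `np.argmax(..., axis=-1)` turns it into a class index. With `normalize="true"`, scikit-learn divides each row of the confusion matrix by the number of test images of that true class, so the diagonal holds the per-class recall. The NumPy sketch below reproduces this on a toy example (the helper name is hypothetical); rows of classes absent from `y_true` are left at 0.

```python
import numpy as np

def confusion_matrix_row_normalized(y_true, y_pred, n_classes):
    """cm[i, j] = fraction of samples of true class i predicted as class j."""
    cm = np.zeros((n_classes, n_classes), dtype=float)
    np.add.at(cm, (y_true, y_pred), 1)                 # count (true, predicted) pairs
    row_sums = cm.sum(axis=1, keepdims=True)
    # Divide each row by its total; rows with no samples stay at 0
    return np.divide(cm, row_sums, out=np.zeros_like(cm), where=row_sums > 0)

# Toy softmax outputs -> class indices
probs = np.array([[0.9, 0.1, 0.0],
                  [0.4, 0.6, 0.0],
                  [0.2, 0.7, 0.1],
                  [0.1, 0.8, 0.1]])
y_true = np.array([0, 0, 1, 1])
y_pred = np.argmax(probs, axis=-1)      # [0, 1, 1, 1]
print(confusion_matrix_row_normalized(y_true, y_pred, 3))
```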

---
That's all folks!