@@ -2,5 +2,7 @@
*/.ipynb_checkpoints/*
__pycache__
*/__pycache__/*
/run/**
run/
*/data/*
!/GTSRB/data/dataset.tar.gz
!/BHPD/data/BostonHousing.csv
%% Cell type:markdown id: tags:
Deep Neural Network (DNN) - BHPD dataset
========================================
---
Introduction au Deep Learning (IDLE) - S. Arias, E. Maldonado, JL. Parouty - CNRS/SARI/DEVLOG - 2020
## A very simple example of **regression** :
The objective is to predict **housing prices** from a set of house features.
The **[Boston Housing Dataset](https://www.cs.toronto.edu/~delve/data/boston/bostonDetail.html)** consists of house prices in various places in Boston.
Alongside the price, the dataset also provides information such as crime rate, areas of non-retail business in the town,
the age of people who own the house and many other attributes...
What we're going to do:
- Retrieve data
- Prepare the data
- Build a model
- Train the model
- Evaluate the result
%% Cell type:markdown id: tags:
## 1/ Init python stuff
%% Cell type:code id: tags:
``` python
import tensorflow as tf
from tensorflow import keras
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import os,sys
from IPython.display import display, Markdown
from importlib import reload
sys.path.append('..')
import fidle.pwk as ooo
ooo.init()
```
%% Output
IDLE 2020 - Practical Work Module
Version : 0.2.4
Run time : Sunday 2 February 2020, 15:17:29
Matplotlib style : ../fidle/talk.mplstyle
TensorFlow version : 2.0.0
Keras version : 2.2.4-tf
%% Cell type:markdown id: tags:
## 2/ Retrieve data
**From Keras :**
Boston housing is a famous historic dataset, so we can get it directly from [Keras datasets](https://www.tensorflow.org/api_docs/python/tf/keras/datasets)
%% Cell type:raw id: tags:
(x_train, y_train), (x_test, y_test) = keras.datasets.boston_housing.load_data(test_split=0.2, seed=113)
%% Cell type:markdown id: tags:
**From a csv file :**
More fun !
%% Cell type:code id: tags:
``` python
data = pd.read_csv('./data/BostonHousing.csv', header=0)
display(data.head(5).style.format("{0:.2f}"))
print('Missing data : ',data.isna().sum().sum(), ' Shape is : ', data.shape)
```
%% Cell type:markdown id: tags:
## 3/ Preparing the data
### 3.1/ Split data
We will use 70% of the data for training and 30% for validation.
x will be input data and y the expected output
%% Cell type:code id: tags:
``` python
# ---- Split => train, test
#
data_train = data.sample(frac=0.7, axis=0)
data_test = data.drop(data_train.index)
# ---- Split => x,y (medv is price)
#
x_train = data_train.drop('medv', axis=1)
y_train = data_train['medv']
x_test = data_test.drop('medv', axis=1)
y_test = data_test['medv']
print('Original data shape was : ',data.shape)
print('x_train : ',x_train.shape, 'y_train : ',y_train.shape)
print('x_test : ',x_test.shape, 'y_test : ',y_test.shape)
```
%% Cell type:markdown id: tags:
### 3.2/ Data normalization
**Note :**
- All input data must be normalized, train and test.
- To do this we will subtract the mean and divide by the standard deviation.
- But test data should not be used in any way, even for normalization.
- The mean and the standard deviation will therefore only be calculated with the train data.
%% Cell type:code id: tags:
``` python
display(x_train.describe().style.format("{0:.2f}").set_caption("Before normalization :"))
mean = x_train.mean()
std = x_train.std()
x_train = (x_train - mean) / std
x_test = (x_test - mean) / std
display(x_train.describe().style.format("{0:.2f}").set_caption("After normalization :"))
x_train, y_train = np.array(x_train), np.array(y_train)
x_test, y_test = np.array(x_test), np.array(y_test)
```
%% Cell type:markdown id: tags:
## 4/ Build a model
More information about:
- [Optimizer](https://www.tensorflow.org/api_docs/python/tf/keras/optimizers)
- [Activation](https://www.tensorflow.org/api_docs/python/tf/keras/activations)
- [Loss](https://www.tensorflow.org/api_docs/python/tf/keras/losses)
- [Metrics](https://www.tensorflow.org/api_docs/python/tf/keras/metrics)
%% Cell type:code id: tags:
``` python
def get_model_v1(shape):
    model = keras.models.Sequential()
    model.add(keras.layers.Dense(64, activation='relu', input_shape=shape))
    model.add(keras.layers.Dense(64, activation='relu'))
    model.add(keras.layers.Dense(1))
    model.compile(optimizer = 'rmsprop',
                  loss      = 'mse',
                  metrics   = ['mae', 'mse'] )
    return model
```
%% Cell type:markdown id: tags:
## 5/ Train the model
%% Cell type:code id: tags:
``` python
model=get_model_v1( (13,) )
model.summary()
```
%% Cell type:markdown id: tags:
**Let's go :**
%% Cell type:code id: tags:
``` python
history = model.fit(x_train,
y_train,
epochs = 100,
batch_size = 10,
verbose = 1,
validation_data = (x_test, y_test))
```
%% Cell type:markdown id: tags:
## 6/ Evaluate
### 6.1/ Model evaluation
MAE = Mean Absolute Error (between the labels and predictions)
An MAE of 3 means an average prediction error of about $3,000 (prices are expressed in thousands of dollars).
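As an illustration, the MAE reported by `evaluate()` can be recomputed by hand from the predictions. A minimal sketch, assuming the `model`, `x_test` and `y_test` defined above:
``` python
# ---- Recompute MAE / MSE by hand (illustration only, should match model.evaluate)
y_pred = model.predict(x_test).flatten()     # predicted prices, in k$
mae = np.mean(np.abs(y_test - y_pred))       # mean absolute error
mse = np.mean((y_test - y_pred)**2)          # mean squared error (the training loss)
print('mae : {:5.4f}  mse : {:5.4f}'.format(mae, mse))
```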
%% Cell type:code id: tags:
``` python
score = model.evaluate(x_test, y_test, verbose=0)
print('x_test / loss : {:5.4f}'.format(score[0]))
print('x_test / mae : {:5.4f}'.format(score[1]))
print('x_test / mse : {:5.4f}'.format(score[2]))
```
%% Cell type:markdown id: tags:
### 6.2/ Training history
What was the best result during our training ?
%% Cell type:code id: tags:
``` python
df=pd.DataFrame(data=history.history)
df.describe()
```
%% Cell type:code id: tags:
``` python
print("min( val_mae ) : {:.4f}".format( min(history.history["val_mae"]) ) )
```
%% Cell type:code id: tags:
``` python
ooo.plot_history(history, plot={'MSE' :['mse', 'val_mse'],
'MAE' :['mae', 'val_mae'],
'LOSS':['loss','val_loss']})
```
%% Cell type:markdown id: tags:
## 7/ Make a prediction
%% Cell type:code id: tags:
``` python
my_data = [ 1.26425925, -0.48522739, 1.0436489 , -0.23112788, 1.37120745,
-2.14308942, 1.13489104, -1.06802005, 1.71189006, 1.57042287,
0.77859951, 0.14769795, 2.7585581 ]
real_price = 10.4
my_data=np.array(my_data).reshape(1,13)
```
%% Cell type:code id: tags:
``` python
predictions = model.predict( my_data )
print("Prédiction : {:.2f} K$".format(predictions[0][0]))
print("Reality : {:.2f} K$".format(real_price))
```
%% Cell type:markdown id: tags:
---
That's all folks !
%% Cell type:markdown id: tags:
Deep Neural Network (DNN) - BHPD dataset
========================================
---
Introduction au Deep Learning (IDLE) - S. Arias, E. Maldonado, JL. Parouty - CNRS/SARI/DEVLOG - 2020
## A very simple example of **regression** (Premium edition):
The objective is to predict **housing prices** from a set of house features.
The **[Boston Housing Dataset](https://www.cs.toronto.edu/~delve/data/boston/bostonDetail.html)** consists of house prices in various places in Boston.
Alongside the price, the dataset also provides information such as crime rate, areas of non-retail business in the town,
the age of people who own the house and many other attributes...
What we're going to do:
- (Retrieve data)
- (Prepare the data)
- (Build a model)
- Train and save the model
- Restore saved model
- Evaluate the model
- Make some predictions
%% Cell type:markdown id: tags:
## 1/ Init python stuff
%% Cell type:code id: tags:
``` python
import tensorflow as tf
from tensorflow import keras
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import os,sys
from IPython.display import display, Markdown
from importlib import reload
sys.path.append('..')
import fidle.pwk as ooo
ooo.init()
```
%% Cell type:markdown id: tags:
## 2/ Retrieve data
**From Keras :**
Boston housing is a famous historic dataset, so we can get it directly from [Keras datasets](https://www.tensorflow.org/api_docs/python/tf/keras/datasets)
%% Cell type:raw id: tags:
(x_train, y_train), (x_test, y_test) = keras.datasets.boston_housing.load_data(test_split=0.2, seed=113)
%% Cell type:markdown id: tags:
**From a csv file :**
More fun !
%% Cell type:code id: tags:
``` python
data = pd.read_csv('./data/BostonHousing.csv', header=0)
display(data.head(5).style.format("{0:.2f}"))
print('Missing data : ',data.isna().sum().sum(), ' Shape is : ', data.shape)
```
%% Cell type:markdown id: tags:
## 3/ Preparing the data
### 3.1/ Split data
We will use 70% of the data for training and 30% for validation.
x will be input data and y the expected output
%% Cell type:code id: tags:
``` python
# ---- Split => train, test
#
data_train = data.sample(frac=0.7, axis=0)
data_test = data.drop(data_train.index)
# ---- Split => x,y (medv is price)
#
x_train = data_train.drop('medv', axis=1)
y_train = data_train['medv']
x_test = data_test.drop('medv', axis=1)
y_test = data_test['medv']
print('Original data shape was : ',data.shape)
print('x_train : ',x_train.shape, 'y_train : ',y_train.shape)
print('x_test : ',x_test.shape, 'y_test : ',y_test.shape)
```
%% Cell type:markdown id: tags:
### 3.2/ Data normalization
**Note :**
- All input data must be normalized, train and test.
- To do this we will subtract the mean and divide by the standard deviation.
- But test data should not be used in any way, even for normalization.
- The mean and the standard deviation will therefore only be calculated with the train data.
%% Cell type:code id: tags:
``` python
display(x_train.describe().style.format("{0:.2f}").set_caption("Before normalization :"))
mean = x_train.mean()
std = x_train.std()
x_train = (x_train - mean) / std
x_test = (x_test - mean) / std
display(x_train.describe().style.format("{0:.2f}").set_caption("After normalization :"))
x_train, y_train = np.array(x_train), np.array(y_train)
x_test, y_test = np.array(x_test), np.array(y_test)
```
%% Cell type:markdown id: tags:
## 4/ Build a model
More information about:
- [Optimizer](https://www.tensorflow.org/api_docs/python/tf/keras/optimizers)
- [Activation](https://www.tensorflow.org/api_docs/python/tf/keras/activations)
- [Loss](https://www.tensorflow.org/api_docs/python/tf/keras/losses)
- [Metrics](https://www.tensorflow.org/api_docs/python/tf/keras/metrics)
%% Cell type:code id: tags:
``` python
def get_model_v1(shape):
    model = keras.models.Sequential()
    model.add(keras.layers.Dense(64, activation='relu', input_shape=shape))
    model.add(keras.layers.Dense(64, activation='relu'))
    model.add(keras.layers.Dense(1))
    model.compile(optimizer = 'rmsprop',
                  loss      = 'mse',
                  metrics   = ['mae', 'mse'] )
    return model
```
%% Cell type:markdown id: tags:
## 5/ Train the model
### 5.1/ Get it
%% Cell type:code id: tags:
``` python
model=get_model_v1( (13,) )
model.summary()
```
%% Cell type:markdown id: tags:
### 5.2/ Add callback
%% Cell type:code id: tags:
``` python
os.makedirs('./run/models', mode=0o750, exist_ok=True)
save_dir = "./run/models/best_model.h5"
savemodel_callback = tf.keras.callbacks.ModelCheckpoint(filepath=save_dir, verbose=0, save_best_only=True)
```
%% Cell type:markdown id: tags:
### 5.3/ Train it
%% Cell type:code id: tags:
``` python
history = model.fit(x_train,
y_train,
epochs = 100,
batch_size = 10,
verbose = 0,
validation_data = (x_test, y_test),
callbacks = [savemodel_callback])
```
%% Cell type:markdown id: tags:
## 6/ Evaluate
### 6.1/ Model evaluation
MAE = Mean Absolute Error (between the labels and predictions)
An MAE of 3 means an average prediction error of about $3,000 (prices are expressed in thousands of dollars).
%% Cell type:code id: tags:
``` python
score = model.evaluate(x_test, y_test, verbose=0)
print('x_test / loss : {:5.4f}'.format(score[0]))
print('x_test / mae : {:5.4f}'.format(score[1]))
print('x_test / mse : {:5.4f}'.format(score[2]))
```
%% Cell type:markdown id: tags:
### 6.2/ Training history
What was the best result during our training ?
%% Cell type:code id: tags:
``` python
print("min( val_mae ) : {:.4f}".format( min(history.history["val_mae"]) ) )
```
%% Cell type:code id: tags:
``` python
reload(ooo)
ooo.plot_history(history, plot={'MSE' :['mse', 'val_mse'],
'MAE' :['mae', 'val_mae'],
'LOSS':['loss','val_loss']})
```
%% Cell type:markdown id: tags:
## 7/ Restore a model :
%% Cell type:markdown id: tags:
### 7.1/ Reload model
%% Cell type:code id: tags:
``` python
loaded_model = tf.keras.models.load_model('./run/models/best_model.h5')
loaded_model.summary()
print("Loaded.")
```
%% Cell type:markdown id: tags:
### 7.2/ Evaluate it :
%% Cell type:code id: tags:
``` python
score = loaded_model.evaluate(x_test, y_test, verbose=0)
print('x_test / loss : {:5.4f}'.format(score[0]))
print('x_test / mae : {:5.4f}'.format(score[1]))
print('x_test / mse : {:5.4f}'.format(score[2]))
```
%% Cell type:markdown id: tags:
### 7.3/ Make a prediction
%% Cell type:code id: tags:
``` python
mon_test=[ 1.26425925, -0.48522739, 1.0436489 , -0.23112788, 1.37120745,
-2.14308942, 1.13489104, -1.06802005, 1.71189006, 1.57042287,
0.77859951, 0.14769795, 2.7585581 ]
mon_test=np.array(mon_test).reshape(1,13)
```
%% Cell type:code id: tags:
``` python
predictions = loaded_model.predict( mon_test )
print("Prédiction : {:.2f} K$ Reality : {:.2f} K$".format(predictions[0][0], y_train[13]))
```
%% Cell type:markdown id: tags:
-----
That's all folks !
%% Cell type:markdown id: tags:
German Traffic Sign Recognition Benchmark (GTSRB)
=================================================
---
Introduction au Deep Learning (IDLE) - S. Arias, E. Maldonado, JL. Parouty - CNRS/SARI/DEVLOG - 2020
## Episode 1 : Preparation of data
- Understanding the dataset
- Preparing and formatting enhanced data
- Save enhanced datasets in h5 file format
%% Cell type:markdown id: tags:
## 1/ Import and init
%% Cell type:code id: tags:
``` python
import os, time, sys
import csv
import math, random
import numpy as np
import matplotlib.pyplot as plt
import h5py
from skimage.morphology import disk
from skimage.filters import rank
from skimage import io, color, exposure, transform
from importlib import reload
sys.path.append('..')
import fidle.pwk as ooo
ooo.init()
```
%% Cell type:markdown id: tags:
## 2/ Read the dataset
Description is available there : http://benchmark.ini.rub.de/?section=gtsrb&subsection=dataset
- Each directory contains one CSV file with annotations ("GT-<ClassID>.csv") and the training images
- First line is fieldnames: Filename;Width;Height;Roi.X1;Roi.Y1;Roi.X2;Roi.Y2;ClassId
### 2.1/ Useful functions
%% Cell type:code id: tags:
``` python
def read_dataset_dir(csv_filename):
    '''Reads traffic sign data from German Traffic Sign Recognition Benchmark dataset.
    Arguments: csv filename
               Example /data/GTSRB/Train.csv
    Returns:   np array of images, np array of corresponding labels'''
    # ---- csv filename and path
    #
    name=os.path.basename(csv_filename)
    path=os.path.dirname(csv_filename)
    # ---- Read csv file
    #
    f,x,y = [],[],[]
    with open(csv_filename) as csv_file:
        reader = csv.DictReader(csv_file, delimiter=',')
        for row in reader:
            f.append( path+'/'+row['Path'] )
            y.append( int(row['ClassId']) )
    nb_images = len(f)
    # ---- Read images
    #
    for filename in f:
        image=io.imread(filename)
        x.append(image)
        ooo.update_progress(name,len(x),nb_images)
    # ---- Return
    #
    return np.array(x),np.array(y)
```
%% Cell type:markdown id: tags:
### 2.2/ Read the data
We will read the following datasets:
- **x_train, y_train** : Learning data
- **x_test, y_test** : Validation or test data
- x_meta, y_meta : Illustration data
The learning data will be randomly shuffled and the illustration data sorted.
Will take about 2-3'
%% Cell type:code id: tags:
``` python
%%time
# ---- Read datasets
(x_train,y_train) = read_dataset_dir('./data/origine/Train.csv')
(x_test ,y_test) = read_dataset_dir('./data/origine/Test.csv')
(x_meta ,y_meta) = read_dataset_dir('./data/origine/Meta.csv')
# ---- Shuffle train set
combined = list(zip(x_train,y_train))
random.shuffle(combined)
x_train,y_train = zip(*combined)
# ---- Sort Meta
combined = list(zip(x_meta,y_meta))
combined.sort(key=lambda x: x[1])
x_meta,y_meta = zip(*combined)
```
%% Cell type:markdown id: tags:
## 3/ Few statistics about train dataset
We want to know if our images are homogeneous in terms of size, ratio, width or height.
### 3.1/ Do statistics
%% Cell type:code id: tags:
``` python
train_size = []
train_ratio = []
train_lx = []
train_ly = []
test_size = []
test_ratio = []
test_lx = []
test_ly = []
for image in x_train:
    (lx,ly,lz) = image.shape
    train_size.append(lx*ly/1024)
    train_ratio.append(lx/ly)
    train_lx.append(lx)
    train_ly.append(ly)
for image in x_test:
    (lx,ly,lz) = image.shape
    test_size.append(lx*ly/1024)
    test_ratio.append(lx/ly)
    test_lx.append(lx)
    test_ly.append(ly)
```
%% Cell type:markdown id: tags:
### 3.2/ Show statistics
%% Cell type:code id: tags:
``` python
# ------ Global stuff
print("x_train size : ",len(x_train))
print("y_train size : ",len(y_train))
print("x_test size : ",len(x_test))
print("y_test size : ",len(y_test))
# ------ Statistics / sizes
plt.figure(figsize=(16,6))
plt.hist([train_size,test_size], bins=100)
plt.gca().set(title='Sizes in Kpixels - Train=[{:5.2f}, {:5.2f}]'.format(min(train_size),max(train_size)),
ylabel='Population',
xlim=[0,30])
plt.legend(['Train','Test'])
plt.show()
# ------ Statistics / ratio lx/ly
plt.figure(figsize=(16,6))
plt.hist([train_ratio,test_ratio], bins=100)
plt.gca().set(title='Ratio lx/ly - Train=[{:5.2f}, {:5.2f}]'.format(min(train_ratio),max(train_ratio)),
ylabel='Population',
xlim=[0.8,1.2])
plt.legend(['Train','Test'])
plt.show()
# ------ Statistics / lx
plt.figure(figsize=(16,6))
plt.hist([train_lx,test_lx], bins=100)
plt.gca().set(title='Images lx - Train=[{:5.2f}, {:5.2f}]'.format(min(train_lx),max(train_lx)),
ylabel='Population',
xlim=[20,150])
plt.legend(['Train','Test'])
plt.show()
# ------ Statistics / ly
plt.figure(figsize=(16,6))
plt.hist([train_ly,test_ly], bins=100)
plt.gca().set(title='Images ly - Train=[{:5.2f}, {:5.2f}]'.format(min(train_ly),max(train_ly)),
ylabel='Population',
xlim=[20,150])
plt.legend(['Train','Test'])
plt.show()
# ------ Statistics / classId
plt.figure(figsize=(16,6))
plt.hist([y_train,y_test], bins=43)
plt.gca().set(title='ClassesId',
ylabel='Population',
xlim=[0,43])
plt.legend(['Train','Test'])
plt.show()
```
%% Cell type:markdown id: tags:
## 4/ List of classes
What are the 43 classes of our images...
%% Cell type:code id: tags:
``` python
ooo.plot_images(x_meta,y_meta, range(43), columns=8, x_size=2, y_size=2,
colorbar=False, y_pred=None, cm='binary')
```
%% Cell type:markdown id: tags:
## 5/ What does it really look like
%% Cell type:code id: tags:
``` python
# ---- Get and show few images
samples = [ random.randint(0,len(x_train)-1) for i in range(32)]
ooo.plot_images(x_train,y_train, samples, columns=8, x_size=2, y_size=2, colorbar=False, y_pred=None, cm='binary')
```
%% Cell type:markdown id: tags:
## 6/ dataset cooking...
Images must have the **same size** to match the size of the network.
It is possible to work on **rgb** or **monochrome** images and **equalize** the histograms.
The data must be **normalized**.
See : [Exposure with scikit-image](https://scikit-image.org/docs/dev/api/skimage.exposure.html)
See : [Local histogram equalization](https://scikit-image.org/docs/dev/api/skimage.filters.rank.html#skimage.filters.rank.equalize)
See : [Histogram equalization](https://scikit-image.org/docs/dev/api/skimage.exposure.html#skimage.exposure.equalize_hist)
### 6.1/ Enhancement cook
%% Cell type:code id: tags:
``` python
def images_enhancement(images, width=25, height=25, mode='RGB'):
    '''
    Resize and convert images - doesn't change originals.
    input images must be RGBA or RGB.
    args:
        images :       images list
        width,height : new images size (25,25)
        mode :         RGB | RGB-HE | L | L-HE | L-LHE | L-CLAHE
    return:
        numpy array of enhanced images
    '''
    modes = { 'RGB':3, 'RGB-HE':3, 'L':1, 'L-HE':1, 'L-LHE':1, 'L-CLAHE':1}
    lz=modes[mode]
    out=[]
    for img in images:
        # ---- if RGBA, convert to RGB
        if img.shape[2]==4:
            img=color.rgba2rgb(img)
        # ---- Resize
        img = transform.resize(img, (width,height))
        # ---- RGB / Histogram Equalization
        if mode=='RGB-HE':
            hsv = color.rgb2hsv(img.reshape(width,height,3))
            hsv[:, :, 2] = exposure.equalize_hist(hsv[:, :, 2])
            img = color.hsv2rgb(hsv)
        # ---- Grayscale
        if mode=='L':
            img=color.rgb2gray(img)
        # ---- Grayscale / Histogram Equalization
        if mode=='L-HE':
            img=color.rgb2gray(img)
            img=exposure.equalize_hist(img)
        # ---- Grayscale / Local Histogram Equalization
        if mode=='L-LHE':
            img=color.rgb2gray(img)
            img=rank.equalize(img, disk(10))/255.
        # ---- Grayscale / Contrast Limited Adaptive Histogram Equalization (CLAHE)
        if mode=='L-CLAHE':
            img=color.rgb2gray(img)
            img=exposure.equalize_adapthist(img)
        # ---- Add image in list of list
        out.append(img)
        ooo.update_progress('Enhancement: ',len(out),len(images))
    # ---- Reshape images
    #      (-1, width,height,1) for L
    #      (-1, width,height,3) for RGB
    #
    out = np.array(out,dtype='float64')
    out = out.reshape(-1,width,height,lz)
    return out
```
%% Cell type:markdown id: tags:
### 6.2/ To get an idea of the different recipes
%% Cell type:code id: tags:
``` python
i=random.randint(0,len(x_train)-16)
x_samples = x_train[i:i+16]
y_samples = y_train[i:i+16]
datasets = {}
datasets['RGB'] = images_enhancement( x_samples, width=25, height=25, mode='RGB' )
datasets['RGB-HE'] = images_enhancement( x_samples, width=25, height=25, mode='RGB-HE' )
datasets['L'] = images_enhancement( x_samples, width=25, height=25, mode='L' )
datasets['L-HE'] = images_enhancement( x_samples, width=25, height=25, mode='L-HE' )
datasets['L-LHE'] = images_enhancement( x_samples, width=25, height=25, mode='L-LHE' )
datasets['L-CLAHE'] = images_enhancement( x_samples, width=25, height=25, mode='L-CLAHE' )
print('\nEXPECTED (Meta) :\n')
x_expected=[ x_meta[i] for i in y_samples]
ooo.plot_images(x_expected, y_samples, range(16), columns=16, x_size=1, y_size=1, colorbar=False, y_pred=None, cm='binary')
print('\nORIGINAL IMAGES :\n')
ooo.plot_images(x_samples, y_samples, range(16), columns=16, x_size=1, y_size=1, colorbar=False, y_pred=None, cm='binary')
print('\nENHANCED :\n')
for k,d in datasets.items():
    print("dataset : {} min,max=[{:.3f},{:.3f}] shape={}".format(k,d.min(),d.max(), d.shape))
    ooo.plot_images(d, y_samples, range(16), columns=16, x_size=1, y_size=1, colorbar=False, y_pred=None, cm='binary')
```
%% Cell type:markdown id: tags:
### 6.3/ Cook and save
A function to save a dataset
%% Cell type:code id: tags:
``` python
def save_h5_dataset(x_train, y_train, x_test, y_test, x_meta,y_meta, h5name):
    # ---- Filename
    filename='./data/'+h5name
    # ---- Create h5 file
    with h5py.File(filename, "w") as f:
        f.create_dataset("x_train", data=x_train)
        f.create_dataset("y_train", data=y_train)
        f.create_dataset("x_test",  data=x_test)
        f.create_dataset("y_test",  data=y_test)
        f.create_dataset("x_meta",  data=x_meta)
        f.create_dataset("y_meta",  data=y_meta)
    # ---- done
    size=os.path.getsize(filename)/(1024*1024)
    print('Dataset : {:24s}  shape : {:22s} size : {:6.1f} Mo   (saved)\n'.format(filename, str(x_train.shape),size))
```
%% Cell type:markdown id: tags:
Create enhanced datasets, and save them...
Will take about 7-8'
%% Cell type:code id: tags:
``` python
%%time
for s in [24, 48]:
    for m in ['RGB', 'RGB-HE', 'L', 'L-LHE']:
        # ---- A nice dataset name
        name='set-{}x{}-{}.h5'.format(s,s,m)
        print("\nDataset : ",name)
        # ---- Enhancement
        x_train_new = images_enhancement( x_train, width=s, height=s, mode=m )
        x_test_new  = images_enhancement( x_test,  width=s, height=s, mode=m )
        x_meta_new  = images_enhancement( x_meta,  width=s, height=s, mode='RGB' )
        # ---- Save
        save_h5_dataset( x_train_new, y_train, x_test_new, y_test, x_meta_new,y_meta, name)
x_train_new,x_test_new=0,0
```
%% Cell type:markdown id: tags:
## 7/ Reload data to be sure ;-)
%% Cell type:code id: tags:
``` python
%%time
dataset='set-48x48-L'
samples=range(24)
with h5py.File('./data/'+dataset+'.h5','r') as f:
    x_tmp = f['x_train'][:]
    y_tmp = f['y_train'][:]
print("dataset loaded from h5 file.")
ooo.plot_images(x_tmp,y_tmp, samples, columns=8, x_size=2, y_size=2, colorbar=False, y_pred=None, cm='binary')
x_tmp,y_tmp=0,0
```
%% Cell type:markdown id: tags:
----
That's all folks !
%% Cell type:markdown id: tags:
German Traffic Sign Recognition Benchmark (GTSRB)
=================================================
---
Introduction au Deep Learning (IDLE) - S. Arias, E. Maldonado, JL. Parouty - CNRS/SARI/DEVLOG - 2020
## Episode 2 : First Convolutions
Our main steps:
- Read H5 dataset
- Build a model
- Train the model
- Evaluate the model
## 1/ Import and init
%% Cell type:code id: tags:
``` python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.callbacks import TensorBoard
import numpy as np
import matplotlib.pyplot as plt
import h5py
import os,time,sys
from importlib import reload
sys.path.append('..')
import fidle.pwk as ooo
ooo.init()
```
%% Cell type:markdown id: tags:
## 2/ Load dataset
We're going to retrieve a previously recorded dataset.
For example: set-24x24-L
%% Cell type:code id: tags:
``` python
%%time
def read_dataset(name):
    '''Reads h5 dataset from ./data
    Arguments: dataset name, without .h5
    Returns:   x_train,y_train,x_test,y_test data'''
    # ---- Read dataset
    filename='./data/'+name+'.h5'
    with h5py.File(filename,'r') as f:
        x_train = f['x_train'][:]
        y_train = f['y_train'][:]
        x_test  = f['x_test'][:]
        y_test  = f['y_test'][:]
    # ---- done
    print('Dataset "{}" is loaded. ({:.1f} Mo)\n'.format(name,os.path.getsize(filename)/(1024*1024)))
    return x_train,y_train,x_test,y_test

x_train,y_train,x_test,y_test = read_dataset('set-24x24-L')
```
%% Cell type:markdown id: tags:
## 3/ Have a look at the dataset
We take a quick look as we go by...
%% Cell type:code id: tags:
``` python
print("x_train : ", x_train.shape)
print("y_train : ", y_train.shape)
print("x_test : ", x_test.shape)
print("y_test : ", y_test.shape)
ooo.plot_images(x_train, y_train, range(12), columns=6, x_size=2, y_size=2)
ooo.plot_images(x_train, y_train, range(36), columns=12, x_size=1, y_size=1)
```
%% Cell type:markdown id: tags:
## 4/ Create model
We will now build a model and train it...
Some models :
%% Cell type:code id: tags:
``` python
# A basic model
#
def get_model_v1(lx,ly,lz):
    model = keras.models.Sequential()
    model.add( keras.layers.Conv2D(96, (3,3), activation='relu', input_shape=(lx,ly,lz)))
    model.add( keras.layers.MaxPooling2D((2, 2)))
    model.add( keras.layers.Dropout(0.2))
    model.add( keras.layers.Conv2D(192, (3, 3), activation='relu'))
    model.add( keras.layers.MaxPooling2D((2, 2)))
    model.add( keras.layers.Dropout(0.2))
    model.add( keras.layers.Flatten())
    model.add( keras.layers.Dense(1500, activation='relu'))
    model.add( keras.layers.Dropout(0.5))
    model.add( keras.layers.Dense(43, activation='softmax'))
    return model

# A more sophisticated model
#
def get_model_v2(lx,ly,lz):
    model = keras.models.Sequential()
    model.add( keras.layers.Conv2D(64, (3, 3), padding='same', input_shape=(lx,ly,lz), activation='relu'))
    model.add( keras.layers.Conv2D(64, (3, 3), activation='relu'))
    model.add( keras.layers.MaxPooling2D(pool_size=(2, 2)))
    model.add( keras.layers.Dropout(0.2))
    model.add( keras.layers.Conv2D(128, (3, 3), padding='same', activation='relu'))
    model.add( keras.layers.Conv2D(128, (3, 3), activation='relu'))
    model.add( keras.layers.MaxPooling2D(pool_size=(2, 2)))
    model.add( keras.layers.Dropout(0.2))
    model.add( keras.layers.Conv2D(256, (3, 3), padding='same',activation='relu'))
    model.add( keras.layers.Conv2D(256, (3, 3), activation='relu'))
    model.add( keras.layers.MaxPooling2D(pool_size=(2, 2)))
    model.add( keras.layers.Dropout(0.2))
    model.add( keras.layers.Flatten())
    model.add( keras.layers.Dense(512, activation='relu'))
    model.add( keras.layers.Dropout(0.5))
    model.add( keras.layers.Dense(43, activation='softmax'))
    return model

# My sophisticated model, but small and fast
#
def get_model_v3(lx,ly,lz):
    model = keras.models.Sequential()
    model.add( keras.layers.Conv2D(32, (3,3), activation='relu', input_shape=(lx,ly,lz)))
    model.add( keras.layers.MaxPooling2D((2, 2)))
    model.add( keras.layers.Dropout(0.5))
    model.add( keras.layers.Conv2D(64, (3, 3), activation='relu'))
    model.add( keras.layers.MaxPooling2D((2, 2)))
    model.add( keras.layers.Dropout(0.5))
    model.add( keras.layers.Conv2D(128, (3, 3), activation='relu'))
    model.add( keras.layers.MaxPooling2D((2, 2)))
    model.add( keras.layers.Dropout(0.5))
    model.add( keras.layers.Conv2D(256, (3, 3), activation='relu'))
    model.add( keras.layers.MaxPooling2D((2, 2)))
    model.add( keras.layers.Dropout(0.5))
    model.add( keras.layers.Flatten())
    model.add( keras.layers.Dense(1152, activation='relu'))
    model.add( keras.layers.Dropout(0.5))
    model.add( keras.layers.Dense(43, activation='softmax'))
    return model
```
%% Cell type:markdown id: tags:
## 5/ Train the model
**Get the shape of my data :**
%% Cell type:code id: tags:
``` python
(n,lx,ly,lz) = x_train.shape
print("Images of the dataset have this folowing shape : ",(lx,ly,lz))
```
%% Cell type:markdown id: tags:
**Get and compile a model, with the data shape :**
%% Cell type:code id: tags:
``` python
model = get_model_v1(lx,ly,lz)
model.summary()
model.compile(optimizer = 'adam',
loss = 'sparse_categorical_crossentropy',
metrics = ['accuracy'])
```
%% Cell type:markdown id: tags:
**Train it :**
%% Cell type:code id: tags:
``` python
%%time
batch_size = 64
epochs = 5
# ---- Shuffle train data
x_train,y_train=ooo.shuffle_np_dataset(x_train,y_train)
# ---- Train
history = model.fit( x_train, y_train,
batch_size = batch_size,
epochs = epochs,
verbose = 1,
validation_data = (x_test, y_test))
```
%% Cell type:markdown id: tags:
**Evaluate it :**
%% Cell type:code id: tags:
``` python
max_val_accuracy = max(history.history["val_accuracy"])
print("Max validation accuracy is : {:.4f}".format(max_val_accuracy))
```
%% Cell type:code id: tags:
``` python
score = model.evaluate(x_test, y_test, verbose=0)
print('Test loss : {:5.4f}'.format(score[0]))
print('Test accuracy : {:5.4f}'.format(score[1]))
```
%% Cell type:markdown id: tags:
German Traffic Sign Recognition Benchmark (GTSRB)
=================================================
---
Introduction au Deep Learning (IDLE) - S. Arias, E. Maldonado, JL. Parouty - CNRS/SARI/DEVLOG - 2020
## Episode 3 : Tracking, visualizing and save models
Our main steps:
- Monitoring and understanding our model training
- Add recovery points
- Analyze the results
- Restore and run a recovery point
## 1/ Import and init
%% Cell type:code id: tags:
``` python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.callbacks import TensorBoard
import numpy as np
import h5py
from sklearn.metrics import confusion_matrix
import matplotlib.pyplot as plt
import seaborn as sn
import os, sys, time, random
from importlib import reload
sys.path.append('..')
import fidle.pwk as ooo
ooo.init()
```
%% Cell type:markdown id: tags:
## 2/ Load dataset
The dataset is one of the previously saved datasets: RGB25, RGB35, L25, L35, etc.
First of all, we're going to use a smart dataset : **set-24x24-L**
(with a GPU, it only takes 35'' compared to more than 5' with a CPU !)
%% Cell type:code id: tags:
``` python
%%time
def read_dataset(name):
    '''Reads h5 dataset from ./data
    Arguments: dataset name, without .h5
    Returns:   x_train,y_train,x_test,y_test data'''
    # ---- Read dataset
    filename='./data/'+name+'.h5'
    with h5py.File(filename,'r') as f:
        x_train = f['x_train'][:]
        y_train = f['y_train'][:]
        x_test  = f['x_test'][:]
        y_test  = f['y_test'][:]
        x_meta  = f['x_meta'][:]
        y_meta  = f['y_meta'][:]
    # ---- done
    print('Dataset "{}" is loaded. ({:.1f} Mo)\n'.format(name,os.path.getsize(filename)/(1024*1024)))
    return x_train,y_train,x_test,y_test,x_meta,y_meta

x_train,y_train,x_test,y_test,x_meta,y_meta = read_dataset('set-24x24-L')
```
%% Cell type:markdown id: tags:
## 3/ Have a look at the dataset
Note: Data must be reshaped for matplotlib
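For example, a grayscale image stored as (24,24,1) has to be squeezed to (24,24) before `plt.imshow`. A minimal sketch, assuming the `x_train`/`y_train` loaded above:
``` python
# ---- Show one grayscale image with plain matplotlib (illustration only)
img = x_train[0]                             # shape (lx,ly,1)
plt.imshow(img.squeeze(), cmap='binary')     # squeeze -> (lx,ly)
plt.title('Class : {}'.format(y_train[0]))
plt.show()
```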
%% Cell type:code id: tags:
``` python
print("x_train : ", x_train.shape)
print("y_train : ", y_train.shape)
print("x_test : ", x_test.shape)
print("y_test : ", y_test.shape)
ooo.plot_images(x_train, y_train, range(12), columns=6, x_size=2, y_size=2)
ooo.plot_images(x_train, y_train, range(36), columns=12, x_size=1, y_size=1)
```
%% Cell type:markdown id: tags:
## 4/ Create model
We will now build a model and train it...
Some models...
%% Cell type:code id: tags:
``` python
# A basic model
#
def get_model_v1(lx,ly,lz):
    model = keras.models.Sequential()
    model.add( keras.layers.Conv2D(96, (3,3), activation='relu', input_shape=(lx,ly,lz)))
    model.add( keras.layers.MaxPooling2D((2, 2)))
    model.add( keras.layers.Dropout(0.2))
    model.add( keras.layers.Conv2D(192, (3, 3), activation='relu'))
    model.add( keras.layers.MaxPooling2D((2, 2)))
    model.add( keras.layers.Dropout(0.2))
    model.add( keras.layers.Flatten())
    model.add( keras.layers.Dense(1500, activation='relu'))
    model.add( keras.layers.Dropout(0.5))
    model.add( keras.layers.Dense(43, activation='softmax'))
    return model
```
%% Cell type:markdown id: tags:
## 5/ Prepare callbacks
We will add 2 callbacks :
- **TensorBoard**
Training logs, which can be visualised with Tensorboard.
`#tensorboard --logdir ./run/logs`
IMPORTANT: restart TensorBoard for each run.
- **Model backup**
It is possible to save the model every n epochs or at each improvement.
The model can be saved completely or partially (weights only).
For the full format, we can use the HDF5 format.
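As a reminder, a minimal sketch of the two flavours (full HDF5 model vs weights only), outside of any callback and assuming a compiled `model`; the file names are illustrative:
``` python
# ---- Full model : architecture + weights + optimizer state, in HDF5 format
model.save('./run/models/my_model.h5')
# ---- Weights only : the architecture must be rebuilt in code before reloading them
model.save_weights('./run/models/my_weights.h5')
# ---- Reload a full model later
# model = tf.keras.models.load_model('./run/models/my_model.h5')
```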
%% Cell type:raw id: tags:
%%bash
# To clean old logs and saved model, run this cell
#
/bin/rm -r ./run/logs 2>/dev/null
/bin/rm -r ./run/models 2>/dev/null
/bin/mkdir -p -m 755 ./run/logs
/bin/mkdir -p -m 755 ./run/models
echo -e "Reset directories : ./run/logs and ./run/models ."
%% Cell type:code id: tags:
``` python
ooo.mkdir('./run/models')
ooo.mkdir('./run/logs')
# ---- Callback tensorboard
log_dir = "./run/logs/tb_" + ooo.tag_now()
tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir=log_dir, histogram_freq=1)
# ---- Callback ModelCheckpoint - Save best model
save_dir = "./run/models/best-model.h5"
bestmodel_callback = tf.keras.callbacks.ModelCheckpoint(filepath=save_dir, verbose=0, monitor='accuracy', save_best_only=True)
# ---- Callback ModelCheckpoint - Save model each epochs
save_dir = "./run/models/model-{epoch:04d}.h5"
savemodel_callback = tf.keras.callbacks.ModelCheckpoint(filepath=save_dir, verbose=0, save_freq=2000*5)
```
%% Cell type:markdown id: tags:
## 5/ Train the model
**Get the shape of my data :**
%% Cell type:code id: tags:
``` python
(n,lx,ly,lz) = x_train.shape
print("Images of the dataset have this folowing shape : ",(lx,ly,lz))
```
%% Cell type:markdown id: tags:
**Get and compile a model, with the data shape :**
%% Cell type:code id: tags:
``` python
model = get_model_v1(lx,ly,lz)
# model.summary()
model.compile(optimizer='adam',
loss='sparse_categorical_crossentropy',
metrics=['accuracy'])
```
%% Cell type:markdown id: tags:
**Train it :**
Note: The training curve is visible in real time with Tensorboard :
`#tensorboard --logdir ./run/logs`
%% Cell type:code id: tags:
``` python
%%time
batch_size = 64
epochs = 30
# ---- Shuffle train data
x_train,y_train=ooo.shuffle_np_dataset(x_train,y_train)
# ---- Train
# Note: To be faster in our example, we can take only 2000 values
#
history = model.fit( x_train, y_train,
batch_size=batch_size,
epochs=epochs,
verbose=1,
validation_data=(x_test, y_test),
callbacks=[tensorboard_callback, bestmodel_callback, savemodel_callback] )
model.save('./run/models/last-model.h5')
```
%% Cell type:markdown id: tags:
**Evaluate it :**
%% Cell type:code id: tags:
``` python
max_val_accuracy = max(history.history["val_accuracy"])
print("Max validation accuracy is : {:.4f}".format(max_val_accuracy))
```
%% Cell type:code id: tags:
``` python
score = model.evaluate(x_test, y_test, verbose=0)
print('Test loss : {:5.4f}'.format(score[0]))
print('Test accuracy : {:5.4f}'.format(score[1]))
```
%% Cell type:markdown id: tags:
## 6/ History
model.fit() returns the training history
%% Cell type:code id: tags:
``` python
ooo.plot_history(history)
```
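If `ooo.plot_history` is not available, the same curves can be drawn with plain matplotlib. A minimal sketch, assuming the `history` object returned by `model.fit()` above:
``` python
# ---- Plot accuracy curves by hand (illustration only)
plt.figure(figsize=(8,5))
plt.plot(history.history['accuracy'],     label='Train')
plt.plot(history.history['val_accuracy'], label='Validation')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()
plt.show()
```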
%% Cell type:markdown id: tags:
## 7/ Evaluation and confusion
%% Cell type:code id: tags:
``` python
y_pred = model.predict_classes(x_test)
conf_mat = confusion_matrix(y_test,y_pred, normalize="true", labels=range(43))
ooo.plot_confusion_matrix(conf_mat)
```
%% Cell type:markdown id: tags:
## 8/ Restore and evaluate
### 8.1/ List saved models :
%% Cell type:code id: tags:
``` python
!find ./run/models/
```
%% Cell type:markdown id: tags:
### 8.2/ Restore a model :
%% Cell type:code id: tags:
``` python
loaded_model = tf.keras.models.load_model('./run/models/best-model.h5')
# loaded_model.summary()
print("Loaded.")
```
%% Cell type:markdown id: tags:
### 8.3/ Evaluate it :
%% Cell type:code id: tags:
``` python
score = loaded_model.evaluate(x_test, y_test, verbose=0)
print('Test loss : {:5.4f}'.format(score[0]))
print('Test accuracy : {:5.4f}'.format(score[1]))
```
%% Cell type:markdown id: tags:
### 8.4/ Make a prediction :
%% Cell type:code id: tags:
``` python
# ---- Get a random image
#
i = random.randint(1,len(x_test))
x,y = x_test[i], y_test[i]
# ---- Do prediction
#
predictions = loaded_model.predict( np.array([x]) )
# ---- A prediction is just the output layer
#
print("\nOutput layer from model is (x100) :\n")
with np.printoptions(precision=2, suppress=True, linewidth=95):
    print(predictions*100)
# ---- Graphic visualisation
#
print("\nGraphically :\n")
plt.figure(figsize=(12,2))
plt.bar(range(43), predictions[0], align='center', alpha=0.5)
plt.ylabel('Probability')
plt.ylim((0,1))
plt.xlabel('Class')
plt.title('Traffic sign prediction')
plt.show()
# ---- Predict class
#
p = np.argmax(predictions)
# ---- Show result
#
print("\nPrediction on the left, real stuff on the right :\n")
ooo.plot_images([x,x_meta[y]], [p,y], range(2), columns=3, x_size=3, y_size=2)
if p==y:
    print("YEEES ! that's right!")
else:
    print("oups, that's wrong ;-(")
```
%% Cell type:markdown id: tags:
---
That's all folks !
%% Cell type:code id: tags:
``` python
```
%% Cell type:markdown id: tags:
German Traffic Sign Recognition Benchmark (GTSRB)
=================================================
---
Introduction au Deep Learning (IDLE) - S. Arias, E. Maldonado, JL. Parouty - CNRS/SARI/DEVLOG - 2020
## Episode 4 : Data augmentation
Our main steps:
- Increase and improve the learning dataset
## 1/ Import and init
%% Cell type:code id: tags:
``` python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.callbacks import TensorBoard
import numpy as np
import h5py
from sklearn.metrics import confusion_matrix
import matplotlib.pyplot as plt
import seaborn as sn
import os, sys, time, random
from importlib import reload
sys.path.append('..')
import fidle.pwk as ooo
ooo.init()
```
%% Cell type:markdown id: tags:
## 2/ Dataset loader
The dataset is one of the previously saved datasets: RGB25, RGB35, L25, L35, etc.
First of all, we're going to use a smart dataset : **set-24x24-L**
(with a GPU, it only takes 35'' compared to more than 5' with a CPU !)
%% Cell type:code id: tags:
``` python
%%time
def read_dataset(name):
    '''Reads h5 dataset from ./data
    Arguments: dataset name, without .h5
    Returns:   x_train,y_train,x_test,y_test data'''
    # ---- Read dataset
    filename='./data/'+name+'.h5'
    with h5py.File(filename,'r') as f:
        x_train = f['x_train'][:]
        y_train = f['y_train'][:]
        x_test  = f['x_test'][:]
        y_test  = f['y_test'][:]
    # ---- done
    print('Dataset "{}" is loaded. ({:.1f} Mo)\n'.format(name,os.path.getsize(filename)/(1024*1024)))
    return x_train,y_train,x_test,y_test
```
%% Cell type:markdown id: tags:
## 3/ Models
We will now build a model and train it...
This is my model ;-)
%% Cell type:code id: tags:
``` python
# A basic model
#
def get_model_v1(lx,ly,lz):
    model = keras.models.Sequential()
    model.add( keras.layers.Conv2D(96, (3,3), activation='relu', input_shape=(lx,ly,lz)))
    model.add( keras.layers.MaxPooling2D((2, 2)))
    model.add( keras.layers.Dropout(0.2))
    model.add( keras.layers.Conv2D(192, (3, 3), activation='relu'))
    model.add( keras.layers.MaxPooling2D((2, 2)))
    model.add( keras.layers.Dropout(0.2))
    model.add( keras.layers.Flatten())
    model.add( keras.layers.Dense(1500, activation='relu'))
    model.add( keras.layers.Dropout(0.5))
    model.add( keras.layers.Dense(43, activation='softmax'))
    return model
```
%% Cell type:markdown id: tags:
## 4/ Callbacks
We prepare two kinds of callbacks: TensorBoard and model backup
%% Cell type:code id: tags:
``` python
%%bash
# To clean old logs and saved model, run this cell
#
/bin/rm -r ./run/logs 2>/dev/null
/bin/rm -r ./run/models 2>/dev/null
/bin/mkdir -p -m 755 ./run/logs
/bin/mkdir -p -m 755 ./run/models
echo -e "Reset directories : ./run/logs and ./run/models ."
```
%% Cell type:code id: tags:
``` python
# ---- Callback tensorboard
log_dir = "./run/logs/tb_" + ooo.tag_now()
tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir=log_dir, histogram_freq=1)
# ---- Callback ModelCheckpoint - Save best model
save_dir = "./run/models/best-model.h5"
bestmodel_callback = tf.keras.callbacks.ModelCheckpoint(filepath=save_dir, verbose=0, monitor='accuracy', save_best_only=True)
# ---- Callback ModelCheckpoint - Save model each epochs
save_dir = "./run/models/model-{epoch:04d}.h5"
savemodel_callback = tf.keras.callbacks.ModelCheckpoint(filepath=save_dir, verbose=0, save_freq=2000*5)
```
%% Cell type:markdown id: tags:
## 5/ Load and prepare dataset
### 5.1/ Load
%% Cell type:code id: tags:
``` python
x_train,y_train,x_test,y_test = read_dataset('set-48x48-L-LHE')
```
%% Cell type:markdown id: tags:
### 5.2/ Data augmentation
%% Cell type:code id: tags:
``` python
datagen = keras.preprocessing.image.ImageDataGenerator(featurewise_center=False,
featurewise_std_normalization=False,
width_shift_range=0.1,
height_shift_range=0.1,
zoom_range=0.2,
shear_range=0.1,
rotation_range=10.)
datagen.fit(x_train)
```
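To get a feel for what the generator produces, one augmented batch can be displayed. A minimal sketch, assuming the `datagen`, `x_train` and `y_train` defined above:
``` python
# ---- Preview a few augmented images (illustration only)
x_aug, y_aug = next(datagen.flow(x_train, y_train, batch_size=8, shuffle=False))
ooo.plot_images(x_aug, y_aug, range(8), columns=8, x_size=1, y_size=1, colorbar=False, y_pred=None, cm='binary')
```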
%% Cell type:markdown id: tags:
## 6/ Train the model
**Get the shape of my data :**
%% Cell type:code id: tags:
``` python
(n,lx,ly,lz) = x_train.shape
print("Images of the dataset have this folowing shape : ",(lx,ly,lz))
```
%% Cell type:markdown id: tags:
**Get and compile a model, with the data shape :**
%% Cell type:code id: tags:
``` python
model = get_model_v3(lx,ly,lz)
# model.summary()
model.compile(optimizer='adam',
loss='sparse_categorical_crossentropy',
metrics=['accuracy'])
```
%% Cell type:markdown id: tags:
**Train it :**
Note: the training curve can be viewed in real time with TensorBoard:
`#tensorboard --logdir ./run/logs`
%% Cell type:code id: tags:
``` python
%%time
batch_size = 64
epochs = 30
# ---- Shuffle train data
#x_train,y_train=ooo.shuffle_np_dataset(x_train,y_train)
# ---- Train
#
history = model.fit( datagen.flow(x_train, y_train, batch_size=batch_size),
steps_per_epoch = int(x_train.shape[0]/batch_size),
epochs=epochs,
verbose=1,
validation_data=(x_test, y_test),
callbacks=[tensorboard_callback, bestmodel_callback, savemodel_callback] )
model.save('./run/models/last-model.h5')
```
%% Cell type:markdown id: tags:
**Evaluate it :**
%% Cell type:code id: tags:
``` python
max_val_accuracy = max(history.history["val_accuracy"])
print("Max validation accuracy is : {:.4f}".format(max_val_accuracy))
```
%% Cell type:code id: tags:
``` python
score = model.evaluate(x_test, y_test, verbose=0)
print('Test loss : {:5.4f}'.format(score[0]))
print('Test accuracy : {:5.4f}'.format(score[1]))
```
%% Cell type:markdown id: tags:
## 7/ History
model.fit() returns the training history
%% Cell type:code id: tags:
``` python
ooo.plot_history(history)
```
%% Cell type:markdown id: tags:
## 8/ Evaluate best model
%% Cell type:markdown id: tags:
### 8.1/ Restore best model :
%% Cell type:code id: tags:
``` python
loaded_model = tf.keras.models.load_model('./run/models/best-model.h5')
# best_model.summary()
print("Loaded.")
```
%% Cell type:markdown id: tags:
### 8.2/ Evaluate it :
%% Cell type:code id: tags:
``` python
score = loaded_model.evaluate(x_test, y_test, verbose=0)
print('Test loss : {:5.4f}'.format(score[0]))
print('Test accuracy : {:5.4f}'.format(score[1]))
```
%% Cell type:markdown id: tags:
**Plot confusion matrix**
%% Cell type:code id: tags:
``` python
y_pred = model.predict_classes(x_test)
conf_mat = confusion_matrix(y_test,y_pred, normalize="true", labels=range(43))
ooo.plot_confusion_matrix(conf_mat)
```
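Since seaborn is imported as `sn`, the same matrix can also be drawn as a heatmap. A minimal sketch, assuming the `conf_mat` computed above:
``` python
# ---- Confusion matrix as a seaborn heatmap (illustration only)
plt.figure(figsize=(12,10))
sn.heatmap(conf_mat, cmap='viridis')
plt.xlabel('Predicted class')
plt.ylabel('True class')
plt.show()
```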
%% Cell type:markdown id: tags:
---
That's all folks !
%% Cell type:code id: tags:
``` python
```
%% Cell type:markdown id: tags:
German Traffic Sign Recognition Benchmark (GTSRB)
=================================================
---
Introduction au Deep Learning (IDLE) - S. Arias, E. Maldonado, JL. Parouty - CNRS/SARI/DEVLOG - 2020
## Episode 5 : Full Convolutions
Our main steps:
- Try n models with n datasets
- Save a Pandas/h5 report
- Write to be run in batch mode
## 1/ Import
%% Cell type:code id: tags:
``` python
import tensorflow as tf
from tensorflow import keras
import numpy as np
import h5py
import os,time,json
import random
from IPython.display import display
VERSION='1.6'
```
%% Cell type:markdown id: tags:
## 2/ Init and start
%% Cell type:code id: tags:
``` python
# ---- Where I am ?
now = time.strftime("%A %d %B %Y - %Hh%Mm%Ss")
here = os.getcwd()
random.seed(time.time())
tag_id = '{:06}'.format(random.randint(0,99999))
# ---- Who I am ?
if 'OAR_JOB_ID' in os.environ:
    oar_id=os.environ['OAR_JOB_ID']
else:
    oar_id='???'
print('\nFull Convolutions Notebook')
print(' Version : {}'.format(VERSION))
print(' Now is : {}'.format(now))
print(' OAR id : {}'.format(oar_id))
print(' Tag id : {}'.format(tag_id))
print(' Working directory : {}'.format(here))
print(' TensorFlow version :',tf.__version__)
print(' Keras version :',tf.keras.__version__)
print(' for tensorboard : --logdir {}/run/logs_{}'.format(here,tag_id))
```
%% Cell type:markdown id: tags:
## 3/ Dataset loading
%% Cell type:code id: tags:
``` python
def read_dataset(name):
    '''Reads h5 dataset from ./data
    Arguments: dataset name, without .h5
    Returns:   x_train,y_train,x_test,y_test data'''
    # ---- Read dataset
    filename='./data/'+name+'.h5'
    with h5py.File(filename,'r') as f:
        x_train = f['x_train'][:]
        y_train = f['y_train'][:]
        x_test  = f['x_test'][:]
        y_test  = f['y_test'][:]
    return x_train,y_train,x_test,y_test
```
%% Cell type:markdown id: tags:
## 4/ Models collection
%% Cell type:code id: tags:
``` python
# A basic model
#
def get_model_v1(lx,ly,lz):
    model = keras.models.Sequential()
    model.add( keras.layers.Conv2D(96, (3,3), activation='relu', input_shape=(lx,ly,lz)))
    model.add( keras.layers.MaxPooling2D((2, 2)))
    model.add( keras.layers.Dropout(0.2))
    model.add( keras.layers.Conv2D(192, (3, 3), activation='relu'))
    model.add( keras.layers.MaxPooling2D((2, 2)))
    model.add( keras.layers.Dropout(0.2))
    model.add( keras.layers.Flatten())
    model.add( keras.layers.Dense(1500, activation='relu'))
    model.add( keras.layers.Dropout(0.5))
    model.add( keras.layers.Dense(43, activation='softmax'))
    return model

# A more sophisticated model
#
def get_model_v2(lx,ly,lz):
    model = keras.models.Sequential()
    model.add( keras.layers.Conv2D(64, (3, 3), padding='same', input_shape=(lx,ly,lz), activation='relu'))
    model.add( keras.layers.Conv2D(64, (3, 3), activation='relu'))
    model.add( keras.layers.MaxPooling2D(pool_size=(2, 2)))
    model.add( keras.layers.Dropout(0.2))
    model.add( keras.layers.Conv2D(128, (3, 3), padding='same', activation='relu'))
    model.add( keras.layers.Conv2D(128, (3, 3), activation='relu'))
    model.add( keras.layers.MaxPooling2D(pool_size=(2, 2)))
    model.add( keras.layers.Dropout(0.2))
    model.add( keras.layers.Conv2D(256, (3, 3), padding='same',activation='relu'))
    model.add( keras.layers.Conv2D(256, (3, 3), activation='relu'))
    model.add( keras.layers.MaxPooling2D(pool_size=(2, 2)))
    model.add( keras.layers.Dropout(0.2))
    model.add( keras.layers.Flatten())
    model.add( keras.layers.Dense(512, activation='relu'))
    model.add( keras.layers.Dropout(0.5))
    model.add( keras.layers.Dense(43, activation='softmax'))
    return model

def get_model_v3(lx,ly,lz):
    model = keras.models.Sequential()
    model.add(tf.keras.layers.Conv2D(32, (5, 5), padding='same', activation='relu', input_shape=(lx,ly,lz)))
    model.add(tf.keras.layers.BatchNormalization(axis=-1))
    model.add(tf.keras.layers.MaxPooling2D(pool_size=(2, 2)))
    model.add(tf.keras.layers.Dropout(0.2))
    model.add(tf.keras.layers.Conv2D(64, (5, 5), padding='same', activation='relu'))
    model.add(tf.keras.layers.BatchNormalization(axis=-1))
    model.add(tf.keras.layers.Conv2D(128, (5, 5), padding='same', activation='relu'))
    model.add(tf.keras.layers.BatchNormalization(axis=-1))
    model.add(tf.keras.layers.MaxPooling2D(pool_size=(2, 2)))
    model.add(tf.keras.layers.Dropout(0.2))
    model.add(tf.keras.layers.Flatten())
    model.add(tf.keras.layers.Dense(512, activation='relu'))
    model.add(tf.keras.layers.BatchNormalization())
    model.add(tf.keras.layers.Dropout(0.4))
    model.add(tf.keras.layers.Dense(43, activation='softmax'))
    return model
```
%% Cell type:markdown id: tags:
## 5/ Multiple datasets, multiple models ;-)
%% Cell type:code id: tags:
``` python
def multi_run(datasets, models, datagen=None,
              train_size=1, test_size=1, batch_size=64, epochs=16,
              verbose=0, extension_dir='last'):
    # ---- Logs and models dir
    #
    os.makedirs('./run/logs_{}'.format(extension_dir),   mode=0o750, exist_ok=True)
    os.makedirs('./run/models_{}'.format(extension_dir), mode=0o750, exist_ok=True)
    # ---- Columns of output
    #
    output={}
    output['Dataset']=[]
    output['Size']   =[]
    for m in models:
        output[m+'_Accuracy'] = []
        output[m+'_Duration'] = []
    # ---- Let's go
    #
    for d_name in datasets:
        print("\nDataset : ",d_name)
        # ---- Read dataset
        x_train,y_train,x_test,y_test = read_dataset(d_name)
        d_size=os.path.getsize('./data/'+d_name+'.h5')/(1024*1024)
        output['Dataset'].append(d_name)
        output['Size'].append(d_size)
        # ---- Get the shape
        (n,lx,ly,lz) = x_train.shape
        n_train = int(x_train.shape[0]*train_size)
        n_test  = int(x_test.shape[0]*test_size)
        # ---- For each model
        for m_name,m_function in models.items():
            print("    Run model {} : ".format(m_name), end='')
            # ---- get model
            try:
                model=m_function(lx,ly,lz)
                # ---- Compile it
                model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
                # ---- Callbacks tensorboard
                log_dir = "./run/logs_{}/tb_{}_{}".format(extension_dir, d_name, m_name)
                tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir=log_dir, histogram_freq=1)
                # ---- Callbacks bestmodel
                save_dir = "./run/models_{}/model_{}_{}.h5".format(extension_dir, d_name, m_name)
                bestmodel_callback = tf.keras.callbacks.ModelCheckpoint(filepath=save_dir, verbose=0, monitor='accuracy', save_best_only=True)
                # ---- Train
                start_time = time.time()
                if datagen is None:
                    # ---- No data augmentation (datagen=None) --------------------------------------
                    history = model.fit(x_train[:n_train], y_train[:n_train],
                                        batch_size      = batch_size,
                                        epochs          = epochs,
                                        verbose         = verbose,
                                        validation_data = (x_test[:n_test], y_test[:n_test]),
                                        callbacks       = [tensorboard_callback, bestmodel_callback])
                else:
                    # ---- Data augmentation (datagen given) ----------------------------------------
                    datagen.fit(x_train)
                    history = model.fit(datagen.flow(x_train, y_train, batch_size=batch_size),
                                        steps_per_epoch = int(n_train/batch_size),
                                        epochs          = epochs,
                                        verbose         = verbose,
                                        validation_data = (x_test[:n_test], y_test[:n_test]),
                                        callbacks       = [tensorboard_callback, bestmodel_callback])
                # ---- Result
                end_time = time.time()
                duration = end_time-start_time
                accuracy = max(history.history["val_accuracy"])*100
                #
                output[m_name+'_Accuracy'].append(accuracy)
                output[m_name+'_Duration'].append(duration)
                print("Accuracy={:.2f} and Duration={:.2f}".format(accuracy,duration))
            except:
                output[m_name+'_Accuracy'].append('0')
                output[m_name+'_Duration'].append('999')
                print('-')
    return output
```
%% Cell type:markdown id: tags:
## 6/ Run !
%% Cell type:code id: tags:
``` python
start_time = time.time()
print('\n---- Run','-'*50)
# --------- Datasets, models, and more.. -----------------------------------
#
# ---- For tests
# datasets = ['set-24x24-L', 'set-24x24-RGB']
# models = {'v1':get_model_v1, 'v4':get_model_v2}
# batch_size = 64
# epochs = 2
# train_size = 0.1
# test_size = 0.1
# with_datagen = False
# verbose = 0
#
# ---- All possibilities -> Run A
# datasets = ['set-24x24-L', 'set-24x24-RGB', 'set-48x48-L', 'set-48x48-RGB', 'set-24x24-L-LHE', 'set-24x24-RGB-HE', 'set-48x48-L-LHE', 'set-48x48-RGB-HE']
# models = {'v1':get_model_v1, 'v2':get_model_v2, 'v3':get_model_v3}
# batch_size = 64
# epochs = 16
# train_size = 1
# test_size = 1
# with_datagen = False
# verbose = 0
#
# ---- Data augmentation -> Run B
datasets = ['set-48x48-RGB']
models = {'v2':get_model_v2}
batch_size = 64
epochs = 20
train_size = 1
test_size = 1
with_datagen = True
verbose = 0
#
# ---------------------------------------------------------------------------
# ---- Data augmentation
#
if with_datagen :
    datagen = keras.preprocessing.image.ImageDataGenerator(featurewise_center=False,
                                                            featurewise_std_normalization=False,
                                                            width_shift_range=0.1,
                                                            height_shift_range=0.1,
                                                            zoom_range=0.2,
                                                            shear_range=0.1,
                                                            rotation_range=10.)
else:
    datagen=None
# ---- Run
#
output = multi_run(datasets, models,
datagen=datagen,
train_size=train_size, test_size=test_size,
batch_size=batch_size, epochs=epochs,
verbose=verbose,
extension_dir=tag_id)
# ---- Save report
#
report={}
report['output']=output
report['description']='train_size={} test_size={} batch_size={} epochs={} data_aug={}'.format(train_size,test_size,batch_size,epochs,with_datagen)
report_name='./run/report_{}.json'.format(tag_id)
with open(report_name, 'w') as file:
    json.dump(report, file)
print('\nReport saved as ',report_name)
end_time = time.time()
duration = end_time-start_time
print('Duration : {} s'.format(duration))
print('-'*59)
```
%% Cell type:markdown id: tags:
## 7/ That's all folks..
%% Cell type:code id: tags:
``` python
print('\n{}'.format(time.strftime("%A %-d %B %Y, %H:%M:%S")))
print("The work is done.\n")
```
%% Cell type:code id: tags:
``` python
```
%% Cell type:markdown id: tags:
German Traffic Sign Recognition Benchmark (GTSRB)
=================================================
---
Introduction au Deep Learning (IDLE) - S. Arias, E. Maldonado, JL. Parouty - CNRS/SARI/DEVLOG - 2020
## Episode 5.1 : Full Convolutions / run
Our main steps:
- Run Full-convolution.ipynb as a batch :
- Notebook mode
- Script mode
- Tensorboard follow up
## 1/ Run a notebook as a batch
To run a notebook :
```jupyter nbconvert --to notebook --execute <notebook>```
%% Cell type:raw id: tags:
%%bash
# ---- This will execute and save a notebook
#
jupyter nbconvert --ExecutePreprocessor.timeout=-1 --to notebook --output='./run/full_convolutions' --execute '05-Full-convolutions.ipynb'
%% Cell type:markdown id: tags:
## 2/ Export as a script (better choice)
To export a notebook as a script :
```jupyter nbconvert --to script <notebook>```
To run the script :
```ipython <script>```
%% Cell type:code id: tags:
``` python
%%bash
# ---- This will convert a notebook to a notebook.py script
#
jupyter nbconvert --to script --output='./run/full_convolutions_B' '05-Full-convolutions.ipynb'
```
%% Output
[NbConvertApp] Converting notebook 05-Full-convolutions.ipynb to script
[NbConvertApp] Writing 11305 bytes to ./run/full_convolutions_B.py
%% Cell type:code id: tags:
``` python
!ls -l ./run/*.py
```
%% Output
-rw-r--r-- 1 pjluc pjluc 11305 Jan 21 00:13 ./run/full_convolutions_B.py
%% Cell type:markdown id: tags:
## 3/ Batch submission
Create batch script :
%% Cell type:code id: tags:
``` python
%%writefile "./run/batch_full_convolutions_B.sh"
#!/bin/bash
#OAR -n Full convolutions
#OAR -t gpu
#OAR -l /nodes=1/gpudevice=1,walltime=01:00:00
#OAR --stdout _batch/full_convolutions_%jobid%.out
#OAR --stderr _batch/full_convolutions_%jobid%.err
#OAR --project deeplearningshs
#---- For cpu
# use :
# OAR -l /nodes=1/core=32,walltime=01:00:00
# and add a 2>/dev/null to ipython xxx
# ----------------------------------
# _ _ _
# | |__ __ _| |_ ___| |__
# | '_ \ / _` | __/ __| '_ \
# | |_) | (_| | || (__| | | |
# |_.__/ \__,_|\__\___|_| |_|
# Full convolutions
# ----------------------------------
#
CONDA_ENV=deeplearning2
RUN_DIR=~/fidle/GTSRB
RUN_SCRIPT=./run/full_convolutions_B.py
# ---- Cuda Conda initialization
#
echo '------------------------------------------------------------'
echo "Start : $0"
echo '------------------------------------------------------------'
#
source /applis/environments/cuda_env.sh dahu 10.0
source /applis/environments/conda.sh
#
conda activate "$CONDA_ENV"
# ---- Run it...
#
cd $RUN_DIR
ipython $RUN_SCRIPT
```
%% Output
Writing ./run/batch_full_convolutions_B.sh
%% Cell type:code id: tags:
``` python
%%bash
chmod 755 ./run/*.sh
chmod 755 ./run/*.py
ls -l ./run/*full_convolutions*
```
%% Output
-rwxr-xr-x 1 pjluc pjluc 1045 Jan 21 00:15 ./run/batch_full_convolutions_B.sh
-rwxr-xr-x 1 pjluc pjluc 611 Jan 19 15:53 ./run/batch_full_convolutions.sh
-rwxr-xr-x 1 pjluc pjluc 11305 Jan 21 00:13 ./run/full_convolutions_B.py
%% Cell type:raw id: tags:
%%bash
./run/batch_full_convolutions.sh
%% Cell type:code id: tags:
``` python
```
%% Cell type:markdown id: tags:
Running Tensorboard from Jupyter lab
====================================
---
Introduction au Deep Learning (IDLE) - S. Arias, E. Maldonado, JL. Parouty - CNRS/SARI/DEVLOG - 2020
Version : 1.0
%% Cell type:markdown id: tags:
## 1/ Method 1 : Shell command
%% Cell type:code id: tags:
``` python
%%bash
tensorboard_start --logdir ./run/logs
```
%% Cell type:code id: tags:
``` python
%%bash
tensorboard_status
```
%% Cell type:code id: tags:
``` python
%%bash
tensorboard_stop
```
%% Cell type:markdown id: tags:
## 2/ Method 2 : Magic command
**Start**
%% Cell type:code id: tags:
``` python
%load_ext tensorboard
```
%% Cell type:code id: tags:
``` python
%tensorboard --port 21277 --host 0.0.0.0 --logdir ./run/logs
```
%% Cell type:markdown id: tags:
**Stop**
No way... use bash method
## 3/ Method 3 : Tensorboard module
**Start**
%% Cell type:code id: tags:
``` python
import tensorboard.notebook as tsb
```
%% Cell type:code id: tags:
``` python
tsb.start('--port 21277 --host 0.0.0.0 --logdir ./run/logs')
```
%% Cell type:markdown id: tags:
**Check**
%% Cell type:code id: tags:
``` python
a=tsb.list()
```
%% Cell type:markdown id: tags:
**Stop**
No way... use bash method
%% Cell type:code id: tags:
``` python
!kill 214798
```
%% Cell type:code id: tags:
``` python
```
%% Cell type:markdown id: tags:
German Traffic Sign Recognition Benchmark (GTSRB)
=================================================
---
Introduction au Deep Learning (IDLE)
S. Aria, E. Maldonado, JL. Parouty
CNRS/SARI/DEVLOG - 2020
Objectives of this practical work
---------------------------------
Traffic sign classification with **CNN**, using Tensorflow and **Keras**
About the dataset
-----------------
Name : [German Traffic Sign Recognition Benchmark (GTSRB)](http://benchmark.ini.rub.de/?section=gtsrb)
Available [here](https://sid.erda.dk/public/archives/daaeac0d7ce1152aea9b61d9f1e19370/published-archive.html)
or on **[kaggle](https://www.kaggle.com/meowmeowmeowmeowmeow/gtsrb-german-traffic-sign)**
A nice example from : [Alex Staravoitau](https://navoshta.com/traffic-signs-classification/)
In a few words :
- Images : Variable dimensions, rgb
- Train set : 39209 images
- Test set : 12630 images
- Classes : 0 to 42
Episodes
--------
**[01 - Preparation of data](01-Preparation-of-data.ipynb)**
- Understanding the dataset
- Preparing and formatting data
- Organize and backup data
**[02 - First convolutions](02-First-convolutions.ipynb)**
- Read dataset
- Build a model
- Train the model
- Model evaluation
%% Cell type:code id: tags:
``` python
```
%% Cell type:markdown id: tags:
Text Embedding - IMDB dataset
=============================
---
Introduction au Deep Learning (IDLE) - S. Arias, E. Maldonado, JL. Parouty - CNRS/SARI/DEVLOG - 2020
## Reviews analysis :
The objective is to guess whether our new, personal film reviews are **positive or negative**.
For this, we will use our previously saved model.
What we're going to do:
- Preparing the data
- Retrieve our saved model
- Evaluate the result
%% Cell type:markdown id: tags:
## Step 1 - Init python stuff
%% Cell type:code id: tags:
``` python
import numpy as np
import tensorflow as tf
import tensorflow.keras as keras
import tensorflow.keras.datasets.imdb as imdb
import matplotlib.pyplot as plt
import matplotlib
import seaborn as sns
import pandas as pd
import os,sys,h5py,json,re
from importlib import reload
sys.path.append('..')
import fidle.pwk as ooo
ooo.init()
```
%% Cell type:markdown id: tags:
## Step 2 : Preparing the data
### 2.1 - Our reviews :
%% Cell type:code id: tags:
``` python
reviews = [ "This film is particularly nice, a must see.",
"Some films are classics and cannot be ignored.",
"This movie is just abominable and doesn't deserve to be seen!"]
```
%% Cell type:markdown id: tags:
### 2.2 - Retrieve dictionaries
%% Cell type:code id: tags:
``` python
with open('./data/word_index.json', 'r') as fp:
    word_index = json.load(fp)
index_word = {index:word for word,index in word_index.items()}
```
%% Cell type:markdown id: tags:
### 2.3 - Clean, index and pad
%% Cell type:code id: tags:
``` python
max_len    = 256
vocab_size = 10000
nb_reviews = len(reviews)
x_data     = []
# ---- For all reviews
for review in reviews:
    # ---- First index must be <start>
    index_review=[1]
    # ---- For all words
    for w in review.split(' '):
        # ---- Clean it
        w_clean = re.sub(r"[^a-zA-Z0-9]", "", w)
        # ---- Not empty ?
        if len(w_clean)>0:
            # ---- Get the index
            w_index = word_index.get(w,2)
            if w_index>vocab_size : w_index=2
            # ---- Add the index if < vocab_size
            index_review.append(w_index)
    # ---- Add the indexed review
    x_data.append(index_review)
# ---- Padding
x_data = keras.preprocessing.sequence.pad_sequences(x_data, value = 0, padding = 'post', maxlen = max_len)
```
%% Cell type:markdown id: tags:
### 2.4 - Have a look
%% Cell type:code id: tags:
``` python
def translate(x):
    return ' '.join( [index_word.get(i,'?') for i in x] )

for i in range(nb_reviews):
    imax=np.where(x_data[i]==0)[0][0]+5
    print(f'\nText review :', reviews[i])
    print( f'x_train[{i:}] :', list(x_data[i][:imax]), '(...)')
    print( 'Translation :', translate(x_data[i][:imax]), '(...)')
```
%% Cell type:markdown id: tags:
## Step 3 - Bring back the model
%% Cell type:code id: tags:
``` python
model = keras.models.load_model('./run/models/best_model.h5')
```
%% Cell type:markdown id: tags:
## Step 4 - Predict
%% Cell type:code id: tags:
``` python
y_pred = model.predict(x_data)
```
%% Cell type:markdown id: tags:
#### And the winner is :
%% Cell type:code id: tags:
``` python
for i in range(nb_reviews):
    print(f'\n{reviews[i]:<70} =>',('NEGATIVE' if y_pred[i][0]<0.5 else 'POSITIVE'),f'({y_pred[i][0]:.2f})')
```
%% Cell type:code id: tags:
``` python
a=[1]+[i for i in range(3)]
a
```
%% Cell type:code id: tags:
``` python
```
FIDLE - Formation Introduction au Deep Learning
===============================================
---
S. Arias, E. Maldonado, JL. Parouty - CNRS/SARI/DEVLOG - 2020
## 1/ Environment
To run these examples, you need an environment with the following packages :
- Python 3.6
- numpy
- Tensorflow 2.0
- scikit-image
- scikit-learn
- Matplotlib
- seaborn
- pyplot
You can install such a predefined environment :
```
conda env create -f environment.yml
```
To manage conda environment see [there](https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html#)
## 4/ Misc
To update an existing environment :
```
conda env update --name=deeplearning2 --file=environment.yml
```
\ No newline at end of file