%% Cell type:markdown id: tags:
Text Embedding - IMDB dataset
=============================
---
Introduction au Deep Learning (IDLE) - S. Arias, E. Maldonado, JL. Parouty - CNRS/SARI/DEVLOG - 2020
## Reviews analysis :
The objective is to guess whether our own new film reviews are **positive or negative**.
For this, we will use our previously saved model.
What we're going to do:
- Preparing the data
- Retrieve our saved model
- Evaluate the result
%% Cell type:markdown id: tags:
## Step 1 - Init python stuff
%% Cell type:code id: tags:
``` python
import numpy as np
import tensorflow as tf
import tensorflow.keras as keras
import tensorflow.keras.datasets.imdb as imdb
import matplotlib.pyplot as plt
import matplotlib
import pandas as pd
import os,sys,h5py,json,re
from importlib import reload
sys.path.append('..')
import fidle.pwk as ooo
ooo.init()
```
%% Cell type:markdown id: tags:
## Step 2 : Preparing the data
### 2.1 - Our reviews :
%% Cell type:code id: tags:
``` python
reviews = [ "This film is particularly nice, a must see.",
"Some films are classics and cannot be ignored.",
"This movie is just abominable and doesn't deserve to be seen!"]
```
%% Cell type:markdown id: tags:
### 2.2 - Retrieve dictionaries
%% Cell type:code id: tags:
``` python
with open('./data/word_index.json', 'r') as fp:
    word_index = json.load(fp)
index_word = {index:word for word,index in word_index.items()}
```
%% Cell type:markdown id: tags:
### 2.3 - Clean, index and pad
%% Cell type:code id: tags:
``` python
max_len = 256
vocab_size = 10000
nb_reviews = len(reviews)
x_data = []
# ---- For all reviews
for review in reviews:
# ---- First index must be <start>
index_review=[1]
# ---- For all words
for w in review.split(' '):
# ---- Clean it
w_clean = re.sub(r"[^a-zA-Z0-9]", "", w)
# ---- Not empty ?
if len(w_clean)>0:
# ---- Get the index
w_index = word_index.get(w,2)
if w_index>vocab_size : w_index=2
# ---- Add the index if < vocab_size
index_review.append(w_index)
# ---- Add the indexed review
x_data.append(index_review)
# ---- Padding
x_data = keras.preprocessing.sequence.pad_sequences(x_data, value = 0, padding = 'post', maxlen = max_len)
```
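The same clean / index / pad logic can be sketched as one standalone function, which makes it easy to test without the saved dictionary. This is a minimal sketch: `encode_review` is a hypothetical helper, and the toy dictionary below uses made-up indices, not the real IMDB `word_index` values. Truncation keeps the last `max_len` indices, which mirrors the default `truncating='pre'` of `pad_sequences`.

```python
import re
import numpy as np

def encode_review(text, word_index, vocab_size=10000, max_len=256,
                  start_char=1, oov_char=2, pad_value=0):
    """Hypothetical helper: clean, index, truncate and post-pad one review."""
    indices = [start_char]
    for w in text.split():
        # Keep only letters and digits; the IMDB dictionary keys are lowercase
        w_clean = re.sub(r"[^a-zA-Z0-9]", "", w).lower()
        if not w_clean:
            continue
        idx = word_index.get(w_clean, oov_char)
        # Indices >= vocab_size were never seen by the model: map them to <unknown>
        if idx >= vocab_size:
            idx = oov_char
        indices.append(idx)
    # Keep the last max_len indices (like truncating='pre'), then pad at the end
    indices = indices[-max_len:]
    indices += [pad_value] * (max_len - len(indices))
    return np.array(indices, dtype='int32')

# Toy dictionary: made-up indices, NOT the real IMDB word_index values
toy_index = {'this': 14, 'film': 22, 'is': 9, 'nice': 324, 'a': 6, 'must': 214, 'see': 67}
encoded = encode_review("This film is particularly nice, a must see.", toy_index, max_len=12)
print(encoded.tolist())   # [1, 14, 22, 9, 2, 324, 6, 214, 67, 0, 0, 0]
```

"particularly" is missing from the toy dictionary, so it becomes 2 (\<unknown\>); punctuation such as "nice," is stripped before the lookup.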
%% Cell type:markdown id: tags:
### 2.4 - Have a look
%% Cell type:code id: tags:
``` python
def translate(x):
    return ' '.join( [index_word.get(i,'?') for i in x] )
for i in range(nb_reviews):
    imax=np.where(x_data[i]==0)[0][0]+5
    print(f'\nText review :', reviews[i])
    print( f'x_data[{i}] :', list(x_data[i][:imax]), '(...)')
    print( 'Translation :', translate(x_data[i][:imax]), '(...)')
```
%% Cell type:markdown id: tags:
## Step 3 - Bring back the model
%% Cell type:code id: tags:
``` python
model = keras.models.load_model('./run/models/best_model.h5')
```
%% Cell type:markdown id: tags:
## Step 4 - Predict
%% Cell type:code id: tags:
``` python
y_pred = model.predict(x_data)
```
%% Cell type:markdown id: tags:
#### And the winner is :
%% Cell type:code id: tags:
``` python
for i in range(nb_reviews):
    print(f'\n{reviews[i]:<70} =>',('NEGATIVE' if y_pred[i][0]<0.5 else 'POSITIVE'),f'({y_pred[i][0]:.2f})')
```
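The decision rule above turns the sigmoid output of each review into a label with a 0.5 threshold. A small vectorized sketch of the same rule, where `to_labels` is a hypothetical helper and the scores are made up (shaped like the `(n, 1)` output of `model.predict`):

```python
import numpy as np

def to_labels(y_pred, threshold=0.5):
    """Map sigmoid outputs of shape (n, 1) to 'NEGATIVE'/'POSITIVE' strings."""
    scores = np.asarray(y_pred, dtype=float).ravel()
    # Same rule as the cell above: score < threshold => NEGATIVE, otherwise POSITIVE
    return np.where(scores < threshold, 'NEGATIVE', 'POSITIVE')

# Made-up scores, shaped like the output of model.predict() for 3 reviews
fake_pred = np.array([[0.91], [0.62], [0.08]])
print(to_labels(fake_pred).tolist())   # ['POSITIVE', 'POSITIVE', 'NEGATIVE']
```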
%% Cell type:markdown id: tags:
Text Embedding - IMDB dataset
=============================
---
Introduction au Deep Learning (IDLE) - S. Arias, E. Maldonado, JL. Parouty - CNRS/SARI/DEVLOG - 2020
## Text classification using **Text embedding** :
The objective is to guess whether film reviews are **positive or negative** based on the analysis of the text.
Original dataset can be found **[here](http://ai.stanford.edu/~amaas/data/sentiment/)**
Note that [IMDb.com](https://imdb.com) offers several easy-to-use [datasets](https://www.imdb.com/interfaces/)
For simplicity's sake, we'll use the dataset directly [embedded in Keras](https://www.tensorflow.org/api_docs/python/tf/keras/datasets)
What we're going to do:
- Retrieve data
- Preparing the data
- Build a model
- Train the model
- Evaluate the result
%% Cell type:markdown id: tags:
## Step 1 - Init python stuff
%% Cell type:code id: tags:
``` python
import numpy as np
import tensorflow as tf
import tensorflow.keras as keras
import tensorflow.keras.datasets.imdb as imdb
import matplotlib.pyplot as plt
import matplotlib
import seaborn as sns
import os,sys,h5py,json
from importlib import reload
sys.path.append('..')
import fidle.pwk as ooo
ooo.init()
```
%% Cell type:markdown id: tags:
## Step 2 - Retrieve data
**From Keras :**
This IMDb dataset can be retrieved directly from [Keras datasets](https://www.tensorflow.org/api_docs/python/tf/keras/datasets)
Due to their nature, textual data can be somewhat complex.
### 2.1 - Data structure :
The dataset is composed of 2 parts: **reviews** and **opinions** (positive/negative), with a **dictionary**
- dataset = (reviews, opinions)
- reviews = \[ review_0, review_1, ...\]
- review_i = \[ int1, int2, ...\] where int_i is the index of the word in the dictionary.
- opinions = \[ int0, int1, ...\] where int_j == 0 if opinion is negative or 1 if opinion is positive.
- dictionary = \{ word1:int1, word2:int2, ... \}
%% Cell type:markdown id: tags:
### 2.2 - Get dataset
For simplicity, we will use a pre-formatted dataset.
See : https://www.tensorflow.org/api_docs/python/tf/keras/datasets/imdb/load_data
However, Keras offers some useful tools for formatting textual data.
See : https://www.tensorflow.org/api_docs/python/tf/keras/preprocessing/text
%% Cell type:code id: tags:
``` python
vocab_size = 10000
# ----- Retrieve x,y
#
(x_train, y_train), (x_test, y_test) = imdb.load_data( num_words = vocab_size,
skip_top = 0,
maxlen = None,
seed = 42,
start_char = 1,
oov_char = 2,
index_from = 3, )
```
%% Cell type:code id: tags:
``` python
print(" Max(x_train,x_test) : ", ooo.rmax([x_train,x_test]) )
print(" x_train : {} y_train : {}".format(x_train.shape, y_train.shape))
print(" x_test : {} y_test : {}".format(x_test.shape, y_test.shape))
print('\nReview example (x_train[12]) :\n\n',x_train[12])
```
%% Cell type:markdown id: tags:
### 2.3 - Have a look for humans (optional)
When we loaded the dataset, we asked for \<start\> to be encoded as 1 and \<unknown word\> as 2 (0 being kept for \<pad\>).
So, the word indices are shifted by 3, with the parameter index_from=3.
%% Cell type:code id: tags:
``` python
# ---- Retrieve dictionary {word:index}
word_index = imdb.get_word_index()
# ---- Shift the dictionary from +3
word_index = {w:(i+3) for w,i in word_index.items()}
# ---- Add <pad>, <start> and unknown tags
word_index.update( {'<pad>':0, '<start>':1, '<unknown>':2} )
# ---- Create a reverse dictionary : {index:word}
index_word = {index:word for word,index in word_index.items()}
# ---- Add a nice function to transpose :
#
def dataset2text(review):
    return ' '.join([index_word.get(i, '?') for i in review])
```
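The shift-and-reverse logic above can be checked on a tiny dictionary. In this sketch, `build_dictionaries` and `decode` are hypothetical helpers, and `raw` is a toy stand-in for `imdb.get_word_index()` (made-up content, with ranks starting at 1):

```python
def build_dictionaries(raw_word_index, index_from=3):
    """Shift a raw {word: rank} dictionary and add the reserved tokens."""
    word_index = {w: i + index_from for w, i in raw_word_index.items()}
    word_index.update({'<pad>': 0, '<start>': 1, '<unknown>': 2})
    index_word = {i: w for w, i in word_index.items()}
    return word_index, index_word

def decode(review, index_word):
    # With ranks starting at 1 and a shift of 3, index 3 is never used: it decodes as '?'
    return ' '.join(index_word.get(i, '?') for i in review)

# Toy raw dictionary (hypothetical ranks, standing in for imdb.get_word_index())
raw = {'the': 1, 'and': 2, 'a': 3}
word_index, index_word = build_dictionaries(raw)
print(decode([1, 4, 6, 2, 3, 0], index_word))   # <start> the a <unknown> ? <pad>
```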
%% Cell type:code id: tags:
``` python
print('\nDictionary size : ', len(word_index))
print('\nReview example (x_train[12]) :\n\n',x_train[12])
print('\nIn real words :\n\n', dataset2text(x_train[12]))
```
%% Cell type:markdown id: tags:
### 2.4 - Have a look for neurons
%% Cell type:code id: tags:
``` python
plt.figure(figsize=(12, 6))
ax=sns.distplot([len(i) for i in x_train],bins=60)
ax.set_title('Distribution of reviews by size')
plt.xlabel("Review's sizes")
plt.ylabel('Density')
ax.set_xlim(0, 1500)
plt.show()
```
%% Cell type:markdown id: tags:
## Step 3 - Preprocess the data
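In order to be processed by an NN, all entries must have the same length.
We chose a review length of **review_len**.
Shorter reviews are completed with padding (\<pad\>, i.e. 0) at the end (padding='post'), and longer ones are truncated (by default, `pad_sequences` removes tokens from the beginning of the sequence).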
%% Cell type:code id: tags:
``` python
review_len = 256
x_train = keras.preprocessing.sequence.pad_sequences(x_train,
value = 0,
padding = 'post',
maxlen = review_len)
x_test = keras.preprocessing.sequence.pad_sequences(x_test,
value = 0 ,
padding = 'post',
maxlen = review_len)
print('\nReview example (x_train[12]) :\n\n',x_train[12])
print('\nIn real words :\n\n', dataset2text(x_train[12]))
```
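What `pad_sequences(..., padding='post')` does with its default `truncating='pre'` can be sketched in a few lines of NumPy. `pad_post` is a hypothetical helper written for illustration, not the Keras implementation:

```python
import numpy as np

def pad_post(sequences, maxlen, value=0):
    """Sketch of pad_sequences(..., padding='post', truncating='pre')."""
    out = np.full((len(sequences), maxlen), value, dtype='int32')
    for row, seq in enumerate(sequences):
        kept = list(seq)[-maxlen:]      # too long: drop tokens at the beginning
        out[row, :len(kept)] = kept     # values first, padding value after
    return out

print(pad_post([[1, 5, 7], [1, 2, 3, 4, 5, 6]], maxlen=4).tolist())
# [[1, 5, 7, 0], [3, 4, 5, 6]]
```

Note that with `truncating='pre'`, a review longer than `review_len` loses its leading \<start\> token (index 1), as the second row shows.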
%% Cell type:markdown id: tags:
### Save dataset and dictionary (can be useful)
%% Cell type:code id: tags:
``` python
os.makedirs('./data', mode=0o750, exist_ok=True)
with h5py.File('./data/dataset_imdb.h5', 'w') as f:
    f.create_dataset("x_train", data=x_train)
    f.create_dataset("y_train", data=y_train)
    f.create_dataset("x_test", data=x_test)
    f.create_dataset("y_test", data=y_test)
with open('./data/word_index.json', 'w') as fp:
    json.dump(word_index, fp)
with open('./data/index_word.json', 'w') as fp:
    json.dump(index_word, fp)
print('Saved.')
```
%% Cell type:markdown id: tags:
## Step 4 - Build the model
A few remarks :
1. We'll choose a dense vector size for the embedding output with **dense_vector_size**
2. **GlobalAveragePooling1D** performs a pooling over the steps (sequence) dimension : (None, lx, ly) -> (None, ly)
In other words: we average the set of vectors/words of a sentence.
Note that the current version of `get_model` below uses an **LSTM** layer instead of this pooling.
3. The Keras Embedding layer is trained in a supervised way. It is a layer going from *vocab_size* neurons to *n_neurons* (here *dense_vector_size*), which maintains a table of vectors (the weights are the vectors). This layer does not compute an output the way normal layers do, but returns the values of the vectors: n words => n vectors (then stacked by the pooling).
See : https://stats.stackexchange.com/questions/324992/how-the-embedding-layer-is-trained-in-keras-embedding-layer
To follow up : https://www.liip.ch/en/blog/sentiment-detection-with-keras-word-embeddings-and-lstm-deep-learning-networks
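Remarks 2 and 3 can be illustrated with plain NumPy: an embedding is a row lookup in a weight table (equivalently, a one-hot input times that table), and global average pooling is a mean over the steps axis. Toy sizes and random weights are used here; this is an illustration, not the Keras code. Without masking, the padding vectors (index 0) are averaged in too; in recent TensorFlow versions, `mask_zero=True` on the Embedding layer lets downstream layers ignore them.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, dense_vector_size = 10, 4          # toy sizes, far smaller than the notebook's
W = rng.normal(size=(vocab_size, dense_vector_size))   # embedding table = the layer's weights

review = np.array([[1, 4, 7, 7, 0, 0]])        # one padded review: (batch=1, steps=6)

# Embedding: each index selects one row of W -> (batch, steps, dense_vector_size)
embedded = W[review]

# Equivalent view: a one-hot input multiplied by W (a dense layer without bias or activation)
one_hot = np.eye(vocab_size)[review]           # (1, 6, vocab_size)
assert np.allclose(one_hot @ W, embedded)

# GlobalAveragePooling1D: average over the steps axis -> (batch, dense_vector_size)
pooled = embedded.mean(axis=1)
print(embedded.shape, pooled.shape)            # (1, 6, 4) (1, 4)
```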
### 4.1 - Build
More documentation about :
- [Embedding](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Embedding)
- [GlobalAveragePooling1D](https://www.tensorflow.org/api_docs/python/tf/keras/layers/GlobalAveragePooling1D)
%% Cell type:code id: tags:
``` python
def get_model(dense_vector_size=128):
    model = keras.Sequential()
    model.add(keras.layers.Embedding(input_dim    = vocab_size,
                                     output_dim   = dense_vector_size,
                                     input_length = review_len))
    model.add(keras.layers.LSTM(128, dropout=0.2, recurrent_dropout=0.2))
    model.add(keras.layers.Dense(1, activation='sigmoid'))
    model.compile(optimizer = 'adam',
                  loss      = 'binary_crossentropy',
                  metrics   = ['accuracy'])
    return model
```
%% Cell type:markdown id: tags:
## Step 5 - Train the model
### 5.1 - Get it
%% Cell type:code id: tags:
``` python
model = get_model()
model.summary()
```
%% Cell type:markdown id: tags:
### 5.2 - Add callback
%% Cell type:code id: tags:
``` python
os.makedirs('./run/models', mode=0o750, exist_ok=True)
save_dir = "./run/models/best_model.h5"
savemodel_callback = tf.keras.callbacks.ModelCheckpoint(filepath=save_dir, verbose=0, save_best_only=True)
```
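With `save_best_only=True` and its default `monitor='val_loss'`, `ModelCheckpoint` only overwrites `best_model.h5` when the validation loss improves. The bookkeeping can be sketched in pure Python; `best_epochs` is a hypothetical helper and the loss values are made up:

```python
def best_epochs(val_losses):
    """Epochs (1-based) at which a save_best_only checkpoint on val_loss would write the model."""
    saved, best = [], float('inf')
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:            # lower validation loss is better
            best = loss
            saved.append(epoch)
    return saved

# Made-up validation losses for 5 epochs
print(best_epochs([0.45, 0.38, 0.40, 0.35, 0.36]))   # [1, 2, 4]
```

In this example, the file left on disk at the end would hold the epoch-4 model, which is what Step 6 reloads.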
%% Cell type:markdown id: tags:
### 5.3 - Train it
Reference run time on GPU with batch_size=512 : 305 s (note that the cell below uses batch_size=32).
%% Cell type:code id: tags:
``` python
%%time
n_epochs = 10
batch_size = 32
history = model.fit(x_train,
y_train,
epochs = n_epochs,
batch_size = batch_size,
validation_data = (x_test, y_test),
verbose = 1,
callbacks = [savemodel_callback])
```
%% Cell type:markdown id: tags:
## Step 6 - Evaluate
### 6.1 - Training history
%% Cell type:code id: tags:
``` python
ooo.plot_history(history)
```
%% Cell type:markdown id: tags:
### 6.2 - Reload and evaluate best model
%% Cell type:code id: tags:
``` python
model = keras.models.load_model('./run/models/best_model.h5')
# ---- Evaluate
score = model.evaluate(x_test, y_test, verbose=0)
print('x_test / loss      : {:5.4f}'.format(score[0]))
print('x_test / accuracy  : {:5.4f}'.format(score[1]))
values=[score[1], 1-score[1]]
ooo.plot_donut(values,["Accuracy","Errors"], title="#### Accuracy donut is :")
# ---- Confusion matrix (sigmoid output > 0.5 => positive)
y_pred = (model.predict(x_test) > 0.5).astype('int32').ravel()
ooo.display_confusion_matrix(y_test,y_pred,labels=range(2),color='orange',font_size='20pt')
```
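For a binary classifier, the confusion matrix displayed above simply counts (true label, predicted label) pairs after thresholding the sigmoid outputs. A NumPy sketch, where `binary_confusion_matrix` is a hypothetical helper (rows = true class, columns = predicted class) and the labels and scores are made up:

```python
import numpy as np

def binary_confusion_matrix(y_true, y_prob, threshold=0.5):
    """2x2 matrix: rows = true class (0, 1), columns = predicted class (0, 1)."""
    y_true = np.asarray(y_true).ravel().astype(int)
    y_pred = (np.asarray(y_prob).ravel() > threshold).astype(int)
    cm = np.zeros((2, 2), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

cm = binary_confusion_matrix([0, 1, 1, 0, 1], [0.2, 0.7, 0.4, 0.6, 0.9])
accuracy = np.trace(cm) / cm.sum()
print(cm.tolist(), accuracy)   # [[1, 1], [1, 2]] 0.6
```

The accuracy reported by `model.evaluate` corresponds to the diagonal of this matrix divided by the number of reviews.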
![](fidle/img/00-Fidle-titre-01_m.png)
## About
This repository contains all the documents and links of the **Fidle training course**.
The objectives of this training course, co-organized by the CNRS continuing education service and the SARI and DEVLOG networks, are to:
- Understand the **basics** of deep neural networks (Deep Learning)
- Gain a **first experience** through simple and representative examples
- Understand the different types of networks, their **architectures** and their **use cases**
- Get to grips with the **Tensorflow/Keras** and **Jupyter lab** technologies, on GPU
- Get to grips with the tier-2 (meso) and/or tier-1 (national) **academic computing environments**
## Available in this repository :
Here you will find:
- the presentation slides
- all the practical work, as Jupyter notebooks
- practical sheets and information :
    - **[SSH configuration](../-/wikis/howto-ssh)**
## Getting this repository and installation
To run these examples, you need an environment with the following packages :
- Python 3.6
- numpy
- Tensorflow 2.0
- scikit-image
- scikit-learn
- Matplotlib
- seaborn
- pyplot
You can create such a predefined environment with :
```
conda env create -f environment.yml
```
To manage conda environments, see [here](https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html#)
## Misc
...
%% Cell type:markdown id: tags:
Variational AutoEncoder (VAE) with MNIST
========================================
---
Formation Introduction au Deep Learning (FIDLE) - S. Arias, E. Maldonado, JL. Parouty - CNRS/SARI/DEVLOG - 2020
## Episode 1 - Train a model
- Defining a VAE model
- Build the model
- Train it
- Follow the learning process with Tensorboard
%% Cell type:markdown id: tags:
## Step 1 - Init python stuff
%% Cell type:code id: tags:
``` python
import numpy as np
import sys, importlib
import modules.vae
import modules.loader_MNIST
from modules.vae import VariationalAutoencoder
from modules.loader_MNIST import Loader_MNIST
VariationalAutoencoder.about()
```
%% Cell type:markdown id: tags:
## Step 2 - Get data
%% Cell type:code id: tags:
``` python
(x_train, y_train), (x_test, y_test) = Loader_MNIST.load()
```
%% Cell type:markdown id: tags:
## Step 3 - Get VAE model
%% Cell type:code id: tags:
``` python
tag = 'MNIST.001'
input_shape = (28,28,1)
z_dim = 2
verbose = 0
encoder= [ {'type':'Conv2D', 'filters':32, 'kernel_size':(3,3), 'strides':1, 'padding':'same', 'activation':'relu'},
{'type':'Conv2D', 'filters':64, 'kernel_size':(3,3), 'strides':2, 'padding':'same', 'activation':'relu'},
{'type':'Conv2D', 'filters':64, 'kernel_size':(3,3), 'strides':2, 'padding':'same', 'activation':'relu'},
{'type':'Conv2D', 'filters':64, 'kernel_size':(3,3), 'strides':1, 'padding':'same', 'activation':'relu'}
]
decoder= [ {'type':'Conv2DTranspose', 'filters':64, 'kernel_size':(3,3), 'strides':1, 'padding':'same', 'activation':'relu'},
{'type':'Conv2DTranspose', 'filters':64, 'kernel_size':(3,3), 'strides':2, 'padding':'same', 'activation':'relu'},
{'type':'Conv2DTranspose', 'filters':32, 'kernel_size':(3,3), 'strides':2, 'padding':'same', 'activation':'relu'},
{'type':'Conv2DTranspose', 'filters':1, 'kernel_size':(3,3), 'strides':1, 'padding':'same', 'activation':'sigmoid'}
]
vae = modules.vae.VariationalAutoencoder(input_shape = input_shape,
encoder_layers = encoder,
decoder_layers = decoder,
z_dim = z_dim,
verbose = verbose,
run_tag = tag)
vae.save(model=None)
```
%% Cell type:markdown id: tags:
## Step 4 - Compile it
%% Cell type:code id: tags:
``` python
r_loss_factor = 1000
vae.compile( optimizer='adam', r_loss_factor=r_loss_factor)
```
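The notebook does not show how `modules.vae` builds its loss from `r_loss_factor`. A common VAE formulation, used here only as an assumption, adds a reconstruction error scaled by `r_loss_factor` to the KL divergence between the encoder's Gaussian N(mu, exp(log_var)) and N(0, 1). A NumPy sketch with a hypothetical `vae_loss` helper:

```python
import numpy as np

def vae_loss(x, x_hat, mu, log_var, r_loss_factor=1000):
    """ASSUMED formulation: r_loss_factor * MSE + KL, averaged over the batch."""
    # Reconstruction term: mean squared error per image, scaled by r_loss_factor
    axes = tuple(range(1, x.ndim))
    r_loss = r_loss_factor * np.mean((x - x_hat) ** 2, axis=axes)
    # KL divergence between N(mu, exp(log_var)) and N(0, 1), summed over latent dims
    kl_loss = -0.5 * np.sum(1 + log_var - mu ** 2 - np.exp(log_var), axis=1)
    return np.mean(r_loss + kl_loss)

x = np.zeros((2, 28, 28, 1))
print(vae_loss(x, x, mu=np.ones((2, 1)), log_var=np.zeros((2, 1))))   # 0.5 (KL term only)
```

Under this assumption, a larger `r_loss_factor` pushes the model towards sharper reconstructions at the expense of a less regular latent space.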
%% Cell type:markdown id: tags:
## Step 5 - Train
%% Cell type:code id: tags:
``` python
batch_size = 100
epochs = 100
initial_epoch = 0
k_size = 1 # 1 means using 100% of the dataset
```
%% Cell type:code id: tags:
``` python
vae.train(x_train,
x_test,
batch_size = batch_size,
epochs = epochs,
initial_epoch = initial_epoch,
k_size = k_size
)
```
%% Cell type:markdown id: tags:
Celeb Faces Dataset (CelebA)
=================================================
---
Introduction au Deep Learning (IDLE) - S. Arias, E. Maldonado, JL. Parouty - CNRS/SARI/DEVLOG - 2020
We'll do the same thing again but with a more interesting dataset: CelebFaces
About this dataset : http://mmlab.ie.cuhk.edu.hk/projects/CelebA.html
## Episode 1 : Preparation of data - Batch mode
- Save the resized images as clusters of .npy files (with their attributes in .csv files)
%% Cell type:markdown id: tags:
## Step 1 - Import and init
### 1.1 - Import
%% Cell type:code id: tags:
``` python
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from skimage import io, transform
import os,time,sys,json,glob
import csv
import math, random
from importlib import reload
sys.path.append('..')
import fidle.pwk as ooo
ooo.init()
```
%% Output
FIDLE 2020 - Practical Work Module
Version : 0.2.8
Run time : Thursday 13 February 2020, 23:50:25
TensorFlow version : 2.0.0
Keras version : 2.2.4-tf
%% Cell type:markdown id: tags:
### 1.2 - Directories and files :
%% Cell type:code id: tags:
``` python
place, dataset_dir = ooo.good_place( { 'GRICAD' : f'{os.getenv("SCRATCH_DIR","")}/PROJECTS/pr-fidle/datasets/celeba',
'IDRIS' : f'{os.getenv("WORK","")}/datasets/celeba' } )
dataset_csv = f'{dataset_dir}/list_attr_celeba.csv'
dataset_img = f'{dataset_dir}/img_align_celeba'
```
%% Output
Well, we should be at IDRIS !
We are going to use: /gpfswork/rech/mlh/uja62cb/datasets/celeba
%% Cell type:markdown id: tags:
## Step 2 - Read filenames catalog
%% Cell type:code id: tags:
``` python
dataset_desc = pd.read_csv(dataset_csv, header=0)
dataset_desc = dataset_desc.reindex(np.random.permutation(dataset_desc.index))
```
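The catalog above is shuffled with `np.random.permutation` without a seed, so the train/test split differs from one run to the next. If a reproducible split is wanted, a seeded generator can be used. A minimal sketch, where `shuffle_catalog` is a hypothetical helper and the toy catalog only imitates the layout of `list_attr_celeba.csv` with made-up rows:

```python
import numpy as np
import pandas as pd

def shuffle_catalog(df, seed=None):
    """Return the rows of df in a random order; a fixed seed gives a reproducible order."""
    rng = np.random.default_rng(seed)
    return df.iloc[rng.permutation(len(df))]

# Toy catalog shaped like list_attr_celeba.csv (made-up rows, only two columns)
catalog = pd.DataFrame({'image_id': [f'{i:06d}.jpg' for i in range(1, 11)],
                        'Smiling':  [1, -1, 1, 1, -1, -1, 1, -1, 1, -1]})
shuffled = shuffle_catalog(catalog, seed=42)
print(shuffled['image_id'].tolist())
```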
%% Cell type:markdown id: tags:
## Step 3 - Save as clusters of n images
%% Cell type:markdown id: tags:
### 3.1 - Cooking function
%% Cell type:code id: tags:
``` python
def read_and_save( dataset_img, dataset_desc,
                   cluster_size=1000, cluster_dir='./dataset_cluster', cluster_name='images',
                   image_size=(128,128)):
    def save_cluster(imgs,desc,cols,id):
        file_img  = f'{cluster_dir}/{cluster_name}-{id:03d}.npy'
        file_desc = f'{cluster_dir}/{cluster_name}-{id:03d}.csv'
        np.save(file_img, np.array(imgs))
        df=pd.DataFrame(data=desc,columns=cols)
        df.to_csv(file_desc, index=False)
        return [],[],id+1
    start_time = time.time()
    cols = list(dataset_desc.columns)
    # ---- Check if cluster files exist
    #
    if os.path.isfile(f'{cluster_dir}/{cluster_name}-000.npy'):
        print('\n*** Oops. There are already clusters in the target folder!\n')
        return 0,0
    # ---- Create cluster_dir
    #
    os.makedirs(cluster_dir, mode=0o750, exist_ok=True)
    # ---- Read and save clusters
    #
    imgs, desc, cluster_id = [],[],0
    #
    for i,row in dataset_desc.iterrows():
        #
        filename = f'{dataset_img}/{row.image_id}'
        #
        # ---- Read image, resize (transform.resize also converts pixels to floats in [0,1])
        #
        img = io.imread(filename)
        img = transform.resize(img, image_size)
        #
        # ---- Add image and description
        #
        imgs.append( img )
        desc.append( row.values )
        #
        # ---- Progress bar
        #
        ooo.update_progress(f'Cluster {cluster_id:03d} :',len(imgs),cluster_size)
        #
        # ---- Save cluster if full
        #
        if len(imgs)==cluster_size:
            imgs,desc,cluster_id=save_cluster(imgs,desc,cols, cluster_id)
    # ---- Save incomplete cluster
    if len(imgs)>0 : imgs,desc,cluster_id=save_cluster(imgs,desc,cols,cluster_id)
    duration=time.time()-start_time
    return cluster_id,duration
```
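The core idea of `read_and_save` (cut a long sequence of images into fixed-size `.npy` files, the last one possibly incomplete) can be tested without CelebA. In this sketch, `save_in_clusters` is a hypothetical helper operating on an in-memory array of fake images, and a temporary directory replaces the real cluster folder:

```python
import os
import tempfile
import numpy as np

def save_in_clusters(images, cluster_size, cluster_dir, cluster_name='images'):
    """Hypothetical helper: write images as <cluster_name>-NNN.npy files of cluster_size images."""
    os.makedirs(cluster_dir, exist_ok=True)
    paths = []
    for cluster_id, start in enumerate(range(0, len(images), cluster_size)):
        chunk = np.asarray(images[start:start + cluster_size])
        path = os.path.join(cluster_dir, f'{cluster_name}-{cluster_id:03d}.npy')
        np.save(path, chunk)
        paths.append(path)
    return paths

# 25 fake 4x4 RGB "images", saved in clusters of 10 -> 10 + 10 + 5
fake_images = np.random.default_rng(0).random((25, 4, 4, 3))
with tempfile.TemporaryDirectory() as tmp:
    paths = save_in_clusters(fake_images, cluster_size=10, cluster_dir=tmp)
    names = [os.path.basename(p) for p in paths]
    sizes = [np.load(p).shape[0] for p in paths]
print(names, sizes)
```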
%% Cell type:markdown id: tags:
### 3.2 - Cluster building
%% Cell type:code id: tags:
``` python
# ---- Cluster size
cluster_size_train = 10000
cluster_size_test = 10000
image_size = (192,160)
# ---- Clusters location
train_dir = f'{dataset_dir}/clusters-M.train'
test_dir = f'{dataset_dir}/clusters-M.test'
# ---- x_train, x_test
#
n1,d1 = read_and_save(dataset_img, dataset_desc[:200000],
cluster_size = cluster_size_train,
cluster_dir = train_dir,
image_size = image_size )
n2,d2 = read_and_save(dataset_img, dataset_desc[200000:],
cluster_size = cluster_size_test,
cluster_dir = test_dir,
image_size = image_size )
print(f'\n\nDuration : {d1+d2:.2f} s or {ooo.hdelay(d1+d2)}')
print(f'Train clusters : {train_dir}')
print(f'Test clusters : {test_dir}')
```
%% Output
*** Oops. There are already clusters in the target folder!
*** Oops. There are already clusters in the target folder!
Duration : 0.00 s or 0:00:00
Train clusters : /gpfswork/rech/mlh/uja62cb/datasets/celeba/clusters-M.train
Test clusters : /gpfswork/rech/mlh/uja62cb/datasets/celeba/clusters-M.test
%% Cell type:markdown id: tags:
----
That's all folks !
%% Cell type:markdown id: tags:
Variational AutoEncoder (VAE) with CelebA
=========================================
---
Formation Introduction au Deep Learning (FIDLE) - S. Arias, E. Maldonado, JL. Parouty - CNRS/SARI/DEVLOG - 2020
## Episode 1 - Train a model
- Defining a VAE model
- Build the model
- Train it
- Follow the learning process with Tensorboard
%% Cell type:markdown id: tags:
## Step 1 - Setup environment
### 1.1 - Python stuff
%% Cell type:code id: tags:
``` python
import tensorflow as tf
import numpy as np
import os,sys
from importlib import reload
import modules.vae
import modules.data_generator
reload(modules.data_generator)
reload(modules.vae)
from modules.vae import VariationalAutoencoder
from modules.data_generator import DataGenerator
sys.path.append('..')
import fidle.pwk as ooo
reload(ooo)
ooo.init()
VariationalAutoencoder.about()
DataGenerator.about()
```
%% Output
FIDLE 2020 - Practical Work Module
Version : 0.2.8
Run time : Thursday 13 February 2020, 21:38:51
TensorFlow version : 2.0.0
Keras version : 2.2.4-tf
FIDLE 2020 - Variational AutoEncoder (VAE)
TensorFlow version : 2.0.0
VAE version : 1.27
FIDLE 2020 - DataGenerator
Version : 0.4.1
%% Cell type:markdown id: tags:
### 1.2 - The good place
%% Cell type:code id: tags:
``` python
place, dataset_dir = ooo.good_place( { 'GRICAD' : f'{os.getenv("SCRATCH_DIR","")}/PROJECTS/pr-fidle/datasets/celeba',
'IDRIS' : f'{os.getenv("WORK","")}/datasets/celeba' } )
# ---- train/test datasets
train_dir = f'{dataset_dir}/clusters.train'
test_dir = f'{dataset_dir}/clusters.test'
```
%% Output
Well, we should be at IDRIS !
We are going to use: /gpfswork/rech/mlh/uja62cb/datasets/celeba
%% Cell type:markdown id: tags:
## Step 2 - DataGenerator and validation data
Ok, everything's perfect, now let's instantiate our generator for the entire dataset.
%% Cell type:code id: tags:
``` python
data_gen = DataGenerator(train_dir, 32, k_size=1)
x_test = np.load(f'{test_dir}/images-000.npy')
print(f'Data generator : {len(data_gen)} batchs of {data_gen.batch_size} images, or {data_gen.dataset_size} images')
print(f'x_test : {len(x_test)} images')
```
%% Output
Data generator : 6250 batchs of 32 images, or 200000 images
x_test : 2599 images
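`DataGenerator` belongs to the course's own `modules` package and its internals are not shown here. The reported figures are consistent with a generator whose length is the number of batches: 200000 / 32 = 6250. A minimal in-memory sketch of that idea, where `MiniBatchGenerator` is hypothetical and the behaviours (returning `(x, x)` pairs for an autoencoder, `k_size` as the fraction of the dataset used, a shorter last batch) are assumptions:

```python
import math
import numpy as np

class MiniBatchGenerator:
    """Hypothetical, minimal stand-in for a Keras-style data generator over in-memory images."""
    def __init__(self, images, batch_size=32, k_size=1.0):
        # k_size: fraction of the dataset to use (1 = 100%), as in the MNIST notebook comment
        self.images = images[:int(len(images) * k_size)]
        self.batch_size = batch_size

    def __len__(self):
        # Number of batches per epoch (the last one may be smaller)
        return math.ceil(len(self.images) / self.batch_size)

    def __getitem__(self, idx):
        batch = self.images[idx * self.batch_size:(idx + 1) * self.batch_size]
        return batch, batch          # autoencoder: the input is also the target

gen = MiniBatchGenerator(np.zeros((100, 8, 8, 3)), batch_size=32)
print(len(gen), gen[3][0].shape)     # 4 (4, 8, 8, 3)
```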
%% Cell type:markdown id: tags:
## Step 3 - Get VAE model
%% Cell type:code id: tags:
``` python
tag = 'CelebA.test'
input_shape = (128, 128, 3)
z_dim = 200
verbose = 1
encoder= [ {'type':'Conv2D', 'filters':32, 'kernel_size':(3,3), 'strides':2, 'padding':'same', 'activation':'relu'},
{'type':'Dropout', 'rate':0.25},
{'type':'Conv2D', 'filters':64, 'kernel_size':(3,3), 'strides':2, 'padding':'same', 'activation':'relu'},
{'type':'Dropout', 'rate':0.25},
{'type':'Conv2D', 'filters':64, 'kernel_size':(3,3), 'strides':2, 'padding':'same', 'activation':'relu'},
{'type':'Dropout', 'rate':0.25},
{'type':'Conv2D', 'filters':64, 'kernel_size':(3,3), 'strides':2, 'padding':'same', 'activation':'relu'},
{'type':'Dropout', 'rate':0.25},
]
decoder= [ {'type':'Conv2DTranspose', 'filters':64, 'kernel_size':(3,3), 'strides':2, 'padding':'same', 'activation':'relu'},
{'type':'Dropout', 'rate':0.25},
{'type':'Conv2DTranspose', 'filters':64, 'kernel_size':(3,3), 'strides':2, 'padding':'same', 'activation':'relu'},
{'type':'Dropout', 'rate':0.25},
{'type':'Conv2DTranspose', 'filters':32, 'kernel_size':(3,3), 'strides':2, 'padding':'same', 'activation':'relu'},
{'type':'Dropout', 'rate':0.25},
{'type':'Conv2DTranspose', 'filters':3, 'kernel_size':(3,3), 'strides':2, 'padding':'same', 'activation':'sigmoid'}
]
vae = modules.vae.VariationalAutoencoder(input_shape = input_shape,
encoder_layers = encoder,
decoder_layers = decoder,
z_dim = z_dim,
verbose = verbose,
run_tag = tag)
vae.save(model=None)
```
%% Output
Model initialized.
Outputs will be in : ./run/CelebA.test
---------- Encoder --------------------------------------------------
Model: "model_1"
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
encoder_input (InputLayer) [(None, 128, 128, 3) 0
__________________________________________________________________________________________________
conv2d (Conv2D) (None, 64, 64, 32) 896 encoder_input[0][0]
__________________________________________________________________________________________________
dropout (Dropout) (None, 64, 64, 32) 0 conv2d[0][0]
__________________________________________________________________________________________________
conv2d_1 (Conv2D) (None, 32, 32, 64) 18496 dropout[0][0]
__________________________________________________________________________________________________
dropout_1 (Dropout) (None, 32, 32, 64) 0 conv2d_1[0][0]
__________________________________________________________________________________________________
conv2d_2 (Conv2D) (None, 16, 16, 64) 36928 dropout_1[0][0]
__________________________________________________________________________________________________
dropout_2 (Dropout) (None, 16, 16, 64) 0 conv2d_2[0][0]
__________________________________________________________________________________________________
conv2d_3 (Conv2D) (None, 8, 8, 64) 36928 dropout_2[0][0]
__________________________________________________________________________________________________
dropout_3 (Dropout) (None, 8, 8, 64) 0 conv2d_3[0][0]
__________________________________________________________________________________________________
flatten (Flatten) (None, 4096) 0 dropout_3[0][0]
__________________________________________________________________________________________________
mu (Dense) (None, 200) 819400 flatten[0][0]
__________________________________________________________________________________________________
log_var (Dense) (None, 200) 819400 flatten[0][0]
__________________________________________________________________________________________________
encoder_output (Lambda) (None, 200) 0 mu[0][0]
log_var[0][0]
==================================================================================================
Total params: 1,732,048
Trainable params: 1,732,048
Non-trainable params: 0
__________________________________________________________________________________________________
---------- Encoder --------------------------------------------------
Model: "model_2"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
decoder_input (InputLayer) [(None, 200)] 0
_________________________________________________________________
dense (Dense) (None, 4096) 823296
_________________________________________________________________
reshape (Reshape) (None, 8, 8, 64) 0
_________________________________________________________________
conv2d_transpose (Conv2DTran (None, 16, 16, 64) 36928
_________________________________________________________________
dropout_4 (Dropout) (None, 16, 16, 64) 0
_________________________________________________________________
conv2d_transpose_1 (Conv2DTr (None, 32, 32, 64) 36928
_________________________________________________________________
dropout_5 (Dropout) (None, 32, 32, 64) 0
_________________________________________________________________
conv2d_transpose_2 (Conv2DTr (None, 64, 64, 32) 18464
_________________________________________________________________
dropout_6 (Dropout) (None, 64, 64, 32) 0
_________________________________________________________________
conv2d_transpose_3 (Conv2DTr (None, 128, 128, 3) 867
=================================================================
Total params: 916,483
Trainable params: 916,483
Non-trainable params: 0
_________________________________________________________________
Failed to import pydot. You must install pydot and graphviz for `pydotprint` to work.
Failed to import pydot. You must install pydot and graphviz for `pydotprint` to work.
Failed to import pydot. You must install pydot and graphviz for `pydotprint` to work.
Config saved in : ./run/CelebA.test/models/vae_config.json
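You can recompute the parameter counts in the two summaries above by hand. A `Conv2D` or `Conv2DTranspose` layer holds kernel_h × kernel_w × input_channels × filters weights plus one bias per filter. A `Dense` layer holds inputs × units weights plus one bias per unit. The short sketch below reproduces the printed totals; the helper names are ours. It assumes the encoder's first layer is a 32-filter 3×3 convolution on the 3-channel input, as in the layer lists used in these notebooks.

```python
def conv_params(kernel, in_ch, filters):
    """Weights + biases of a Conv2D or Conv2DTranspose layer."""
    kh, kw = kernel
    return kh * kw * in_ch * filters + filters

def dense_params(n_in, n_out):
    """Weights + biases of a Dense layer."""
    return n_in * n_out + n_out

# Encoder: 128x128x3 input, four stride-2 convolutions, z_dim = 200
encoder = {
    'conv2d':   conv_params((3, 3), 3, 32),
    'conv2d_1': conv_params((3, 3), 32, 64),
    'conv2d_2': conv_params((3, 3), 64, 64),
    'conv2d_3': conv_params((3, 3), 64, 64),
    'mu':       dense_params(8 * 8 * 64, 200),   # flatten = 4096 values
    'log_var':  dense_params(8 * 8 * 64, 200),
}
# Decoder: mirror image of the encoder, ending with 3 channels
decoder = {
    'dense':              dense_params(200, 8 * 8 * 64),
    'conv2d_transpose':   conv_params((3, 3), 64, 64),
    'conv2d_transpose_1': conv_params((3, 3), 64, 64),
    'conv2d_transpose_2': conv_params((3, 3), 64, 32),
    'conv2d_transpose_3': conv_params((3, 3), 32, 3),
}
for name, n in {**encoder, **decoder}.items():
    print(f'{name:20s} {n:>9,}')
print('Encoder total :', f'{sum(encoder.values()):,}')
print('Decoder total :', f'{sum(decoder.values()):,}')
```

Almost all encoder parameters sit in the two `Dense` layers (`mu` and `log_var`). The convolutions themselves are comparatively light.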
%% Cell type:markdown id: tags:
## Step 4 - Compile it
%% Cell type:code id: tags:
``` python
optimizer = tf.keras.optimizers.Adam(1e-4)
# optimizer = 'adam'
r_loss_factor = 10000
vae.compile(optimizer, r_loss_factor)
```
%% Output
Compiled.
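`vae.compile(optimizer, r_loss_factor)` combines two terms, which the training logs report as `vae_r_loss` and `vae_kl_loss`. The code of `modules/vae.py` is not shown here, so the following NumPy sketch rests on an assumption. It uses the common formulation:

- the reconstruction term is a per-image mean squared error multiplied by `r_loss_factor`;
- the KL term is the closed-form divergence between N(mu, sigma²) and N(0, I).

A large `r_loss_factor` (10000 here) makes reconstruction dominate. A small one pushes the latent space towards the prior.

```python
import numpy as np

def vae_losses(x, x_hat, mu, log_var, r_loss_factor=10000):
    """Return (r_loss, kl_loss, total), each averaged over the batch.

    Assumed formulation (not checked against modules/vae.py):
      r_loss  = r_loss_factor * per-image mean squared error
      kl_loss = KL(N(mu, exp(log_var)) || N(0, I)), summed over z
    """
    axes = tuple(range(1, x.ndim))                       # every axis but the batch one
    r_loss = r_loss_factor * np.mean((x - x_hat) ** 2, axis=axes)
    kl_loss = -0.5 * np.sum(1 + log_var - mu**2 - np.exp(log_var), axis=1)
    return r_loss.mean(), kl_loss.mean(), (r_loss + kl_loss).mean()

rng = np.random.default_rng(0)
x = rng.random((4, 128, 128, 3))                         # a fake batch of images
x_hat = np.clip(x + rng.normal(0, 0.05, x.shape), 0, 1)  # a slightly noisy "reconstruction"
mu, log_var = np.zeros((4, 200)), np.zeros((4, 200))     # q(z|x) equal to the prior
r, kl, total = vae_losses(x, x_hat, mu, log_var)
print(r, kl, total)
```

In the Step 5 training log, `loss` is indeed the sum of the other two terms: for epoch 1, 301.2042 + 48.4450 ≈ 349.6490.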
%% Cell type:markdown id: tags:
## Step 5 - Train
For 10 epochs with the Adam optimizer:
- Run time at IDRIS: 1299.77 s (0:21:39)
- Run time at GRICAD: 2092.77 s (0:34:52)
%% Cell type:code id: tags:
``` python
epochs = 10
initial_epoch = 0
```
%% Cell type:code id: tags:
``` python
vae.train(data_generator = data_gen,
x_test = x_test,
epochs = epochs,
initial_epoch = initial_epoch
)
```
%% Output
Epoch 1/10
6250/6250 [==============================] - 175s 28ms/step - loss: 349.6490 - vae_r_loss: 301.2042 - vae_kl_loss: 48.4450 - val_loss: 236.8924 - val_vae_r_loss: 189.8669 - val_vae_kl_loss: 47.1441
Epoch 2/10
6250/6250 [==============================] - 128s 20ms/step - loss: 241.0380 - vae_r_loss: 187.2187 - vae_kl_loss: 53.8191 - val_loss: 215.4507 - val_vae_r_loss: 162.7656 - val_vae_kl_loss: 52.8317
Epoch 3/10
3141/6250 [==============>...............] - ETA: 1:02 - loss: 230.1868 - vae_r_loss: 175.1927 - vae_kl_loss: 54.9941
%% Cell type:code id: tags:
``` python
```
%% Cell type:code id: tags:
``` python
```
%% Cell type:markdown id: tags:
Variational AutoEncoder (VAE) with CelebA
=========================================
---
Formation Introduction au Deep Learning (FIDLE) - S. Arias, E. Maldonado, JL. Parouty - CNRS/SARI/DEVLOG - 2020
## Episode 1 - Train a model
- Define a VAE model
- Build the model
- Train it
- Follow the learning process with TensorBoard
%% Cell type:markdown id: tags:
## Step 1 - Setup environment
### 1.1 - Python stuff
%% Cell type:code id: tags:
``` python
import tensorflow as tf
import numpy as np
import os,sys
from importlib import reload
import modules.vae
import modules.data_generator
reload(modules.data_generator)
reload(modules.vae)
from modules.vae import VariationalAutoencoder
from modules.data_generator import DataGenerator
sys.path.append('..')
import fidle.pwk as ooo
reload(ooo)
ooo.init()
VariationalAutoencoder.about()
DataGenerator.about()
```
%% Cell type:markdown id: tags:
### 1.2 - The good place
%% Cell type:code id: tags:
``` python
place, dataset_dir = ooo.good_place( { 'GRICAD' : f'{os.getenv("SCRATCH_DIR","")}/PROJECTS/pr-fidle/datasets/celeba',
'IDRIS' : f'{os.getenv("WORK","")}/datasets/celeba' } )
# ---- train/test datasets
train_dir = f'{dataset_dir}/clusters.train'
test_dir = f'{dataset_dir}/clusters.test'
```
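`ooo.good_place()` belongs to the fidle helper module, which is not shown here. Judging from its output ("Well, we should be at IDRIS !"), it selects the dataset directory that matches the current computing center. The following is a hypothetical stand-in that simply returns the first entry whose directory exists. `pick_dataset_dir` is our own name, not part of fidle.

```python
import os
import tempfile

def pick_dataset_dir(places):
    """Hypothetical stand-in for ooo.good_place():
    return the first (place, directory) pair whose directory exists."""
    for place, directory in places.items():
        if directory and os.path.isdir(directory):
            return place, directory
    raise FileNotFoundError(f'None of these dataset dirs exist: {list(places.values())}')

# Usage with a temporary directory standing in for a real dataset location
with tempfile.TemporaryDirectory() as tmp:
    places = {'GRICAD': '/nonexistent/PROJECTS/pr-fidle/datasets/celeba',
              'LOCAL' : tmp}
    print(pick_dataset_dir(places))
```

Failing loudly when no directory is found avoids silently building `train_dir` and `test_dir` from an empty or wrong path.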
%% Cell type:markdown id: tags:
## Step 2 - DataGenerator and validation data
OK, everything is in place: now let's instantiate our generator for the entire dataset.
%% Cell type:code id: tags:
``` python
data_gen = DataGenerator(train_dir, 32, k_size=1)
x_test = np.load(f'{test_dir}/images-000.npy')
print(f'Data generator : {len(data_gen)} batches of {data_gen.batch_size} images, or {data_gen.dataset_size} images')
print(f'x_test : {len(x_test)} images')
```
%% Cell type:markdown id: tags:
## Step 3 - Get VAE model
%% Cell type:code id: tags:
``` python
tag = 'CelebA.004'
input_shape = (128, 128, 3)
z_dim = 200
verbose = 0
encoder= [ {'type':'Conv2D', 'filters':32, 'kernel_size':(3,3), 'strides':2, 'padding':'same', 'activation':'relu'},
{'type':'Dropout', 'rate':0.25},
{'type':'Conv2D', 'filters':64, 'kernel_size':(3,3), 'strides':2, 'padding':'same', 'activation':'relu'},
{'type':'Dropout', 'rate':0.25},
{'type':'Conv2D', 'filters':64, 'kernel_size':(3,3), 'strides':2, 'padding':'same', 'activation':'relu'},
{'type':'Dropout', 'rate':0.25},
{'type':'Conv2D', 'filters':64, 'kernel_size':(3,3), 'strides':2, 'padding':'same', 'activation':'relu'},
{'type':'Dropout', 'rate':0.25},
]
decoder= [ {'type':'Conv2DTranspose', 'filters':64, 'kernel_size':(3,3), 'strides':2, 'padding':'same', 'activation':'relu'},
{'type':'Dropout', 'rate':0.25},
{'type':'Conv2DTranspose', 'filters':64, 'kernel_size':(3,3), 'strides':2, 'padding':'same', 'activation':'relu'},
{'type':'Dropout', 'rate':0.25},
{'type':'Conv2DTranspose', 'filters':32, 'kernel_size':(3,3), 'strides':2, 'padding':'same', 'activation':'relu'},
{'type':'Dropout', 'rate':0.25},
{'type':'Conv2DTranspose', 'filters':3, 'kernel_size':(3,3), 'strides':2, 'padding':'same', 'activation':'sigmoid'}
]
vae = modules.vae.VariationalAutoencoder(input_shape = input_shape,
encoder_layers = encoder,
decoder_layers = decoder,
z_dim = z_dim,
verbose = verbose,
run_tag = tag)
vae.save(model=None)
```
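In the encoder summary, the `encoder_output` Lambda layer takes both `mu` and `log_var` as inputs. In a VAE this is normally the reparameterization trick. We have not seen the body of `modules/vae.py`, so this NumPy sketch shows the standard formulation rather than the module's own code. In Keras the same formula is typically written inside a `Lambda` layer with a backend random-normal call.

```python
import numpy as np

def sample_z(mu, log_var, rng=None):
    """Reparameterization trick: z = mu + sigma * epsilon, epsilon ~ N(0, I).

    sigma = exp(log_var / 2). All the randomness lives in epsilon,
    so z remains a differentiable function of mu and log_var.
    """
    if rng is None:
        rng = np.random.default_rng()
    epsilon = rng.standard_normal(mu.shape)
    return mu + np.exp(log_var / 2) * epsilon

# 10000 latent vectors of size z_dim = 200, with mean 2 and std 0.5
mu = np.full((10000, 200), 2.0)
log_var = np.full((10000, 200), np.log(0.25))
z = sample_z(mu, log_var, rng=np.random.default_rng(0))
print(z.shape, round(z.mean(), 2), round(z.std(), 2))
```

Predicting log-variance instead of sigma keeps the network output unconstrained, since `exp()` always yields a positive standard deviation.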
%% Cell type:markdown id: tags:
## Step 4 - Compile it
%% Cell type:code id: tags:
``` python
optimizer = tf.keras.optimizers.Adam(1e-4)
# optimizer = 'adam'
r_loss_factor = 10000
vae.compile(optimizer, r_loss_factor)
```
%% Cell type:markdown id: tags:
## Step 5 - Train
For 10 epochs with the Adam optimizer:
- Run time at IDRIS: 1299.77 s (0:21:39)
- Run time at GRICAD: 2092.77 s (0:34:52)
%% Cell type:code id: tags:
``` python
epochs = 10
initial_epoch = 0
```
%% Cell type:code id: tags:
``` python
vae.train(data_generator = data_gen,
x_test = x_test,
epochs = epochs,
initial_epoch = initial_epoch
)
```
%% Cell type:code id: tags:
``` python
```
%% Cell type:code id: tags:
``` python
```
%% Cell type:markdown id: tags:
Variational AutoEncoder (VAE) with CelebA
=========================================
---
Formation Introduction au Deep Learning (FIDLE) - S. Arias, E. Maldonado, JL. Parouty - CNRS/SARI/DEVLOG - 2020
## Episode 1 - Train a model
- Define a VAE model
- Build the model
- Train it
- Follow the learning process with TensorBoard
%% Cell type:markdown id: tags:
## Step 1 - Setup environment
### 1.1 - Python stuff
%% Cell type:code id: tags:
``` python
import tensorflow as tf
import numpy as np
import os,sys
from importlib import reload
import modules.vae
import modules.data_generator
reload(modules.data_generator)
reload(modules.vae)
from modules.vae import VariationalAutoencoder
from modules.data_generator import DataGenerator
sys.path.append('..')
import fidle.pwk as ooo
reload(ooo)
ooo.init()
VariationalAutoencoder.about()
DataGenerator.about()
```
%% Cell type:markdown id: tags:
### 1.2 - The good place
%% Cell type:code id: tags:
``` python
place, dataset_dir = ooo.good_place( { 'GRICAD' : f'{os.getenv("SCRATCH_DIR","")}/PROJECTS/pr-fidle/datasets/celeba',
'IDRIS' : f'{os.getenv("WORK","")}/datasets/celeba' } )
# ---- train/test datasets
train_dir = f'{dataset_dir}/clusters-M.train'
test_dir = f'{dataset_dir}/clusters-M.test'
```
%% Cell type:markdown id: tags:
## Step 2 - DataGenerator and validation data
OK, everything is in place: now let's instantiate our generator for the entire dataset.
%% Cell type:code id: tags:
``` python
data_gen = DataGenerator(train_dir, 32, k_size=1)
x_test = np.load(f'{test_dir}/images-000.npy')
print(f'Data generator : {len(data_gen)} batches of {data_gen.batch_size} images, or {data_gen.dataset_size} images')
print(f'x_test : {len(x_test)} images')
```
%% Cell type:markdown id: tags:
## Step 3 - Get VAE model
%% Cell type:code id: tags:
``` python
tag = f'CelebA.052-M.{os.getenv("SLURM_JOB_ID","unknown")}'
input_shape = (192, 160, 3)
z_dim = 200
verbose = 0
encoder= [ {'type':'Conv2D', 'filters':32, 'kernel_size':(3,3), 'strides':2, 'padding':'same', 'activation':'relu'},
{'type':'Dropout', 'rate':0.25},
{'type':'Conv2D', 'filters':64, 'kernel_size':(3,3), 'strides':2, 'padding':'same', 'activation':'relu'},
{'type':'Dropout', 'rate':0.25},
{'type':'Conv2D', 'filters':64, 'kernel_size':(3,3), 'strides':2, 'padding':'same', 'activation':'relu'},
{'type':'Dropout', 'rate':0.25},
{'type':'Conv2D', 'filters':64, 'kernel_size':(3,3), 'strides':2, 'padding':'same', 'activation':'relu'},
{'type':'Dropout', 'rate':0.25},
]
decoder= [ {'type':'Conv2DTranspose', 'filters':64, 'kernel_size':(3,3), 'strides':2, 'padding':'same', 'activation':'relu'},
{'type':'Dropout', 'rate':0.25},
{'type':'Conv2DTranspose', 'filters':64, 'kernel_size':(3,3), 'strides':2, 'padding':'same', 'activation':'relu'},
{'type':'Dropout', 'rate':0.25},
{'type':'Conv2DTranspose', 'filters':32, 'kernel_size':(3,3), 'strides':2, 'padding':'same', 'activation':'relu'},
{'type':'Dropout', 'rate':0.25},
{'type':'Conv2DTranspose', 'filters':3, 'kernel_size':(3,3), 'strides':2, 'padding':'same', 'activation':'sigmoid'}
]
vae = modules.vae.VariationalAutoencoder(input_shape = input_shape,
encoder_layers = encoder,
decoder_layers = decoder,
z_dim = z_dim,
verbose = verbose,
run_tag = tag)
vae.save(model=None)
```
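This variant switches `input_shape` from (128, 128, 3) to (192, 160, 3). Each stride-2 `Conv2D` with `padding='same'` produces ceil(n / 2) pixels per side. Each stride-2 `Conv2DTranspose` with `padding='same'` produces n × 2. The decoder therefore gives back the input size only when both sides are divisible by 2⁴ = 16. A quick check in plain Python follows. The 64 in the flatten size is the filter count of the last encoder layer, and (100, 100) is just an arbitrary counter-example.

```python
import math

def encoder_shapes(h, w, n_layers=4, stride=2):
    """Spatial size after each stride-2 'same' Conv2D: ceil(n / stride)."""
    shapes = [(h, w)]
    for _ in range(n_layers):
        h, w = math.ceil(h / stride), math.ceil(w / stride)
        shapes.append((h, w))
    return shapes

def decoder_shapes(h, w, n_layers=4, stride=2):
    """Spatial size after each stride-2 'same' Conv2DTranspose: n * stride."""
    shapes = [(h, w)]
    for _ in range(n_layers):
        h, w = h * stride, w * stride
        shapes.append((h, w))
    return shapes

for size in [(128, 128), (192, 160), (100, 100)]:
    enc = encoder_shapes(*size)
    dec = decoder_shapes(*enc[-1])
    h, w = enc[-1]
    print(f'input {size} -> latent map {enc[-1]} (flatten = {h * w * 64}) -> output {dec[-1]}')
```

For (192, 160) the flattened feature map holds 12 × 10 × 64 = 7680 values, instead of 4096 for the 128 × 128 images. This also enlarges the `mu`, `log_var` and decoder `Dense` layers.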
%% Cell type:markdown id: tags:
## Step 4 - Compile it
%% Cell type:code id: tags:
``` python
optimizer = tf.keras.optimizers.Adam(1e-4)
r_loss_factor = 10000
vae.compile(optimizer, r_loss_factor)
```
%% Cell type:markdown id: tags:
## Step 5 - Train
For 10 epochs with the Adam optimizer:
- Run time at IDRIS: 1299.77 s (0:21:39)
- Run time at GRICAD: 2092.77 s (0:34:52)
%% Cell type:code id: tags:
``` python
epochs = 20
initial_epoch = 0
```
%% Cell type:code id: tags:
``` python
vae.train(data_generator = data_gen,
x_test = x_test,
epochs = epochs,
initial_epoch = initial_epoch
)
```
%% Cell type:markdown id: tags:
----
That's all folks !
%% Cell type:markdown id: tags:
Variational AutoEncoder (VAE) with CelebA
=========================================
---
Formation Introduction au Deep Learning (FIDLE) - S. Arias, E. Maldonado, JL. Parouty - CNRS/SARI/DEVLOG - 2020
## Episode 1 - Train a model
- Define a VAE model
- Build the model
- Train it
- Follow the learning process with TensorBoard
%% Cell type:markdown id: tags:
## Step 1 - Setup environment
### 1.1 - Python stuff
%% Cell type:code id: tags:
``` python
import tensorflow as tf
import numpy as np
import os,sys
from importlib import reload
import modules.vae
import modules.data_generator
reload(modules.data_generator)
reload(modules.vae)
from modules.vae import VariationalAutoencoder
from modules.data_generator import DataGenerator
sys.path.append('..')
import fidle.pwk as ooo
reload(ooo)
ooo.init()
VariationalAutoencoder.about()
DataGenerator.about()
```
%% Output
FIDLE 2020 - Practical Work Module
Version : 0.2.8
Run time : Friday 14 February 2020, 00:07:28
TensorFlow version : 2.0.0
Keras version : 2.2.4-tf
FIDLE 2020 - Variational AutoEncoder (VAE)
TensorFlow version : 2.0.0
VAE version : 1.28
FIDLE 2020 - DataGenerator
Version : 0.4.1
%% Cell type:markdown id: tags:
### 1.2 - The good place
%% Cell type:code id: tags:
``` python
place, dataset_dir = ooo.good_place( { 'GRICAD' : f'{os.getenv("SCRATCH_DIR","")}/PROJECTS/pr-fidle/datasets/celeba',
'IDRIS' : f'{os.getenv("WORK","")}/datasets/celeba' } )
# ---- train/test datasets
train_dir = f'{dataset_dir}/clusters-M.train'
test_dir = f'{dataset_dir}/clusters-M.test'
```
%% Output
Well, we should be at IDRIS !
We are going to use: /gpfswork/rech/mlh/uja62cb/datasets/celeba
%% Cell type:markdown id: tags:
## Step 2 - DataGenerator and validation data
OK, everything is in place: now let's instantiate our generator for the entire dataset.
%% Cell type:code id: tags:
``` python
data_gen = DataGenerator(train_dir, 32, k_size=1)
x_test = np.load(f'{test_dir}/images-000.npy')
print(f'Data generator : {len(data_gen)} batches of {data_gen.batch_size} images, or {data_gen.dataset_size} images')
print(f'x_test : {len(x_test)} images')
```
%% Output
Data generator : 6250 batches of 32 images, or 200000 images
x_test : 2599 images
%% Cell type:markdown id: tags:
## Step 3 - Get VAE model
%% Cell type:code id: tags:
``` python
tag = f'CelebA.052-M.{os.getenv("SLURM_JOB_ID","unknown")}'
input_shape = (192, 160, 3)
z_dim = 200
verbose = 0
encoder= [ {'type':'Conv2D', 'filters':32, 'kernel_size':(3,3), 'strides':2, 'padding':'same', 'activation':'relu'},
{'type':'Dropout', 'rate':0.25},
{'type':'Conv2D', 'filters':64, 'kernel_size':(3,3), 'strides':2, 'padding':'same', 'activation':'relu'},
{'type':'Dropout', 'rate':0.25},
{'type':'Conv2D', 'filters':64, 'kernel_size':(3,3), 'strides':2, 'padding':'same', 'activation':'relu'},
{'type':'Dropout', 'rate':0.25},
{'type':'Conv2D', 'filters':64, 'kernel_size':(3,3), 'strides':2, 'padding':'same', 'activation':'relu'},
{'type':'Dropout', 'rate':0.25},
]
decoder= [ {'type':'Conv2DTranspose', 'filters':64, 'kernel_size':(3,3), 'strides':2, 'padding':'same', 'activation':'relu'},
{'type':'Dropout', 'rate':0.25},
{'type':'Conv2DTranspose', 'filters':64, 'kernel_size':(3,3), 'strides':2, 'padding':'same', 'activation':'relu'},
{'type':'Dropout', 'rate':0.25},
{'type':'Conv2DTranspose', 'filters':32, 'kernel_size':(3,3), 'strides':2, 'padding':'same', 'activation':'relu'},
{'type':'Dropout', 'rate':0.25},
{'type':'Conv2DTranspose', 'filters':3, 'kernel_size':(3,3), 'strides':2, 'padding':'same', 'activation':'sigmoid'}
]
vae = modules.vae.VariationalAutoencoder(input_shape = input_shape,
encoder_layers = encoder,
decoder_layers = decoder,
z_dim = z_dim,
verbose = verbose,
run_tag = tag)
vae.save(model=None)
```
%% Output
Model initialized.
Outputs will be in : ./run/CelebA.052-M.810156
Config saved in : ./run/CelebA.052-M.810156/models/vae_config.json
%% Cell type:markdown id: tags:
## Step 4 - Compile it
%% Cell type:code id: tags:
``` python
optimizer = tf.keras.optimizers.Adam(1e-4)
r_loss_factor = 10000
vae.compile(optimizer, r_loss_factor)
```
%% Output
Compiled.
%% Cell type:markdown id: tags:
## Step 5 - Train
For 10 epochs with the Adam optimizer:
- Run time at IDRIS: 1299.77 s (0:21:39)
- Run time at GRICAD: 2092.77 s (0:34:52)
%% Cell type:code id: tags:
``` python
epochs = 20
initial_epoch = 0
```
%% Cell type:code id: tags:
``` python
vae.train(data_generator = data_gen,
x_test = x_test,
epochs = epochs,
initial_epoch = initial_epoch
)
```
%% Output
Epoch 1/20
6250/6250 [==============================] - 342s 55ms/step - loss: 332.4792 - vae_r_loss: 282.9655 - vae_kl_loss: 49.5134 - val_loss: 235.4288 - val_vae_r_loss: 187.4334 - val_vae_kl_loss: 48.1192
Epoch 2/20
6250/6250 [==============================] - 337s 54ms/step - loss: 224.0962 - vae_r_loss: 166.4602 - vae_kl_loss: 57.6360 - val_loss: 210.9632 - val_vae_r_loss: 155.0925 - val_vae_kl_loss: 56.0114
Epoch 4/20
6250/6250 [==============================] - 333s 53ms/step - loss: 214.5463 - vae_r_loss: 155.7666 - vae_kl_loss: 58.7794 - val_loss: 203.8241 - val_vae_r_loss: 147.3248 - val_vae_kl_loss: 56.5778
Epoch 7/20
6250/6250 [==============================] - 327s 52ms/step - loss: 211.1459 - vae_r_loss: 152.4026 - vae_kl_loss: 58.7437 - val_loss: 201.1862 - val_vae_r_loss: 145.6906 - val_vae_kl_loss: 55.6326
Epoch 10/20
6250/6250 [==============================] - 335s 54ms/step - loss: 209.7628 - vae_r_loss: 151.0874 - vae_kl_loss: 58.6756 - val_loss: 202.3954 - val_vae_r_loss: 147.0956 - val_vae_kl_loss: 55.4047
Epoch 12/20
6250/6250 [==============================] - 333s 53ms/step - loss: 207.9830 - vae_r_loss: 149.3870 - vae_kl_loss: 58.5959 - val_loss: 198.5626 - val_vae_r_loss: 142.5848 - val_vae_kl_loss: 56.0871
Epoch 16/20
6250/6250 [==============================] - 330s 53ms/step - loss: 206.6382 - vae_r_loss: 148.0522 - vae_kl_loss: 58.5863 - val_loss: 197.5800 - val_vae_r_loss: 142.6799 - val_vae_kl_loss: 54.9832
Train duration : 6638.61 sec. - 1:50:38
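The progress-bar log above only shows some of the epochs, but it can still be checked numerically. The small parser below runs on three lines copied verbatim from that log. It confirms that `loss` equals `vae_r_loss + vae_kl_loss` up to rounding, and that `val_loss` keeps decreasing. This is only a sketch: when the Keras `History` object returned by `fit()` is available, it holds the same values without any parsing. We have not checked whether `vae.train()` returns it.

```python
import re

# Three epochs copied from the training log above
LOG = """\
Epoch 1/20
6250/6250 [==============================] - 342s 55ms/step - loss: 332.4792 - vae_r_loss: 282.9655 - vae_kl_loss: 49.5134 - val_loss: 235.4288 - val_vae_r_loss: 187.4334 - val_vae_kl_loss: 48.1192
Epoch 2/20
6250/6250 [==============================] - 337s 54ms/step - loss: 224.0962 - vae_r_loss: 166.4602 - vae_kl_loss: 57.6360 - val_loss: 210.9632 - val_vae_r_loss: 155.0925 - val_vae_kl_loss: 56.0114
Epoch 16/20
6250/6250 [==============================] - 330s 53ms/step - loss: 206.6382 - vae_r_loss: 148.0522 - vae_kl_loss: 58.5863 - val_loss: 197.5800 - val_vae_r_loss: 142.6799 - val_vae_kl_loss: 54.9832
"""

def parse_keras_log(text):
    """Return {epoch: {metric: value}} from a Keras progress-bar log."""
    history, epoch = {}, None
    for line in text.splitlines():
        m = re.match(r'Epoch (\d+)/\d+', line)
        if m:
            epoch = int(m.group(1))
        elif epoch is not None and 'loss:' in line:
            # "name: value" pairs, e.g. "vae_kl_loss: 49.5134"
            history[epoch] = {k: float(v) for k, v in re.findall(r'(\w+): ([0-9.]+)', line)}
    return history

for epoch, m in parse_keras_log(LOG).items():
    gap = m['loss'] - (m['vae_r_loss'] + m['vae_kl_loss'])
    print(f"epoch {epoch:2d}  loss={m['loss']:.2f}  val_loss={m['val_loss']:.2f}  loss-(r+kl)={gap:+.4f}")
```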
%% Cell type:markdown id: tags:
----
That's all folks !
#!/bin/bash
#OAR -n VAE with CelebA
#OAR -t gpu
#OAR -l /nodes=1/gpudevice=1,walltime=01:00:00
#OAR --stdout _batch/VAE_CelebA_%jobid%.out
#OAR --stderr _batch/VAE_CelebA_%jobid%.err
#OAR --project fidle
#---- For cpu
# use :
# OAR -l /nodes=1/core=32,walltime=01:00:00
# and add a 2>/dev/null to ipython xxx
# -----------------------------------------------
# _ _ _
# | |__ __ _| |_ ___| |__
# | '_ \ / _` | __/ __| '_ \
# | |_) | (_| | || (__| | | |
# |_.__/ \__,_|\__\___|_| |_|
# VAE CelebA at GRICAD
# -----------------------------------------------
#
CONDA_ENV=deeplearning2
RUN_DIR=~/fidle/VAE
RUN_IPYNB=05.1-Batch-01.ipynb
# ---- Cuda Conda initialization
#
echo '------------------------------------------------------------'
echo "Start : $0"
echo '------------------------------------------------------------'
#
source /applis/environments/cuda_env.sh dahu 10.0
source /applis/environments/conda.sh
#
conda activate "$CONDA_ENV"
# ---- Run it...
#
cd "$RUN_DIR"
jupyter nbconvert --ExecutePreprocessor.timeout=-1 --to notebook --execute "$RUN_IPYNB"
#!/bin/bash
#SBATCH --job-name="VAE_bizness" # job name
#SBATCH --ntasks=1 # number of tasks (a single process here)
#SBATCH --gres=gpu:1 # number of GPUs to reserve (a single GPU here)
#SBATCH --cpus-per-task=10 # number of cores to reserve (a quarter of the node)
#SBATCH --hint=nomultithread # reserve physical cores, not logical ones
#SBATCH --time=02:00:00 # maximum requested run time (HH:MM:SS)
#SBATCH --output="_batch/VAE_%j.out" # output file name
#SBATCH --error="_batch/VAE_%j.err" # error file name (separate from the output here)
#SBATCH --mail-user=Jean-Luc.Parouty@grenoble-inp.fr
#SBATCH --mail-type=ALL
# -----------------------------------------------
# _ _ _
# | |__ __ _| |_ ___| |__
# | '_ \ / _` | __/ __| '_ \
# | |_) | (_| | || (__| | | |
# |_.__/ \__,_|\__\___|_| |_|
# VAE CelebA at IDRIS
# -----------------------------------------------
#
MODULE_ENV="tensorflow-gpu/py3/2.0.0"
RUN_DIR="$WORK/fidle/VAE"
RUN_IPYNB="05.2-Variant.ipynb"
# ---- Welcome...
echo '------------------------------------------------------------'
echo "Start : $0"
echo '------------------------------------------------------------'
echo "Job id : $SLURM_JOB_ID"
echo "Job name : $SLURM_JOB_NAME"
echo "Job node list : $SLURM_JOB_NODELIST"
echo '------------------------------------------------------------'
echo "Notebook : $RUN_IPYNB"
echo "Run in : $RUN_DIR"
echo "With env. : $MODULE_ENV"
echo '------------------------------------------------------------'
# ---- Module
module load $MODULE_ENV
# ---- Run it...
cd "$RUN_DIR"
jupyter nbconvert --ExecutePreprocessor.timeout=-1 --to notebook --execute "$RUN_IPYNB"
from tensorflow.keras.callbacks import Callback
import numpy as np
import matplotlib.pyplot as plt

class ImagesCallback(Callback):
    '''
    At the end of each epoch, decode a few random latent points
    and save the resulting images.
    '''
    def __init__(self, filename='image-{epoch:03d}-{i:02d}.jpg', z_dim=0, decoder=None, nb_images=5):
        super().__init__()
        self.filename  = filename
        self.z_dim     = z_dim
        self.decoder   = decoder
        self.nb_images = nb_images

    def on_epoch_end(self, epoch, logs=None):
        # ---- Get random latent points
        z_new = np.random.normal(size=(self.nb_images, self.z_dim))
        # ---- Decode them into images
        images = self.decoder.predict(z_new)
        # ---- Save images
        for i, image in enumerate(images):
            # ---- Squeeze it if monochrome : (lx,ly,1) -> (lx,ly)
            image = image.squeeze()
            # ---- Save it
            filename = self.filename.format(epoch=epoch, i=i)
            if len(image.shape) == 2:
                plt.imsave(filename, image, cmap='gray_r')
            else:
                plt.imsave(filename, image)
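The callback's logic can be exercised without TensorFlow or a trained model. This sketch uses a stub decoder: `StubDecoder` and `planned_outputs` are hypothetical names, for illustration only. It mirrors what `on_epoch_end` does: filename templating, squeezing monochrome images, and choosing the `gray_r` colormap. It returns what would be written instead of saving files. Keras passes a 0-based `epoch` to `on_epoch_end`, so the first files are named `image-000-xx.jpg`.

```python
import numpy as np

class StubDecoder:
    """Hypothetical stand-in for a Keras decoder: maps z to images."""
    def __init__(self, image_shape, seed=0):
        self.image_shape = image_shape
        self.rng = np.random.default_rng(seed)

    def predict(self, z):
        # Ignore z and return values in [0, 1] with the right shape
        return self.rng.random((len(z), *self.image_shape))

def planned_outputs(decoder, z_dim, nb_images, epoch,
                    filename='image-{epoch:03d}-{i:02d}.jpg'):
    """Same steps as ImagesCallback.on_epoch_end, but returns
    (filename, image shape, colormap) instead of writing files."""
    z_new = np.random.normal(size=(nb_images, z_dim))
    out = []
    for i, image in enumerate(decoder.predict(z_new)):
        image = image.squeeze()                        # (28,28,1) -> (28,28)
        cmap = 'gray_r' if image.ndim == 2 else None   # RGB images need no colormap
        out.append((filename.format(epoch=epoch, i=i), image.shape, cmap))
    return out

print(planned_outputs(StubDecoder((28, 28, 1)), z_dim=2, nb_images=2, epoch=3))
# [('image-003-00.jpg', (28, 28), 'gray_r'), ('image-003-01.jpg', (28, 28), 'gray_r')]
```

`gray_r` is the reversed grey colormap: low values are drawn white and high values black.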
# ------------------------------------------------------------------
# _____ _ _ _
# | ___(_) __| | | ___
# | |_ | |/ _` | |/ _ \
# | _| | | (_| | | __/ DataGenerator
# |_| |_|\__,_|_|\___| for clustered CelebA dataset
# ------------------------------------------------------------------
# Formation Introduction au Deep Learning (FIDLE)
# CNRS/SARI/DEVLOG 2020 - S. Arias, E. Maldonado, JL. Parouty
# ------------------------------------------------------------------
# Initial version by JL Parouty, feb 2020
import numpy as np
import pandas as pd
import math
import os,glob
from tensorflow.keras.utils import Sequence
class DataGenerator(Sequence):

    version = '0.4.1'

    def __init__(self, clusters_dir='./data', batch_size=32, debug=False, k_size=1):
        '''
        Instantiation of the data generator
        args:
            clusters_dir : Directory of the clusters files
            batch_size   : Batch size (32)
            debug        : debug mode (False)
            k_size       : Fraction of the dataset to use (1)
        '''
        if debug : self.about()
        #
        # ---- Get the list of clusters
        #
        clusters_name = [ os.path.splitext(f)[0] for f in glob.glob( f'{clusters_dir}/*.npy') ]
        clusters_size = len(clusters_name)
        #
        # ---- Read each cluster description
        #      because we need the full dataset size
        #
        dataset_size = 0
        for c in clusters_name:
            df = pd.read_csv(c+'.csv', header=0)
            dataset_size+=len(df.index)
        #
        # ---- If we only want to use a part of the dataset...
        #
        dataset_size = int(dataset_size * k_size)
        #
        if debug:
            print(f'\nClusters nb : {len(clusters_name)} files')
            print(f'Dataset size : {dataset_size}')
            print(f'Batch size : {batch_size}')
        #
        # ---- Remember all of that
        #
        self.clusters_dir  = clusters_dir
        self.batch_size    = batch_size
        self.clusters_name = clusters_name
        self.clusters_size = clusters_size
        self.dataset_size  = dataset_size
        self.debug         = debug
        #
        # ---- Read a first cluster
        #
        self.cluster_i = clusters_size
        self.read_next_cluster()

    def __len__(self):
        return math.floor(self.dataset_size / self.batch_size)

    def __getitem__(self, idx):
        #
        # ---- Get the next item index
        #      (idx is ignored: batches are served sequentially)
        #
        i=self.data_i
        #
        # ---- Get a batch
        #
        batch = self.data[i:i+self.batch_size]
        #
        # ---- Cluster is large enough
        #
        if len(batch) == self.batch_size:
            self.data_i += self.batch_size
            if self.debug: print(f'({len(batch)}) ',end='')
            return batch,batch
        #
        # ---- Not enough: complete the batch with the next cluster
        #
        if self.debug: print(f'({len(batch)}..) ',end='')
        #
        self.read_next_cluster()
        batch2 = self.data[ 0:self.batch_size-len(batch) ]
        self.data_i = self.batch_size-len(batch)
        batch = np.concatenate( (batch,batch2) )
        #
        if self.debug: print(f'(..{len(batch2)}) ',end='')
        return batch, batch

    def on_epoch_end(self):
        self.cluster_i = self.clusters_size
        self.read_next_cluster()

    def read_next_cluster(self):
        #
        # ---- Get the next cluster name
        #      If we have reached the end of the list, we shuffle it
        #      and start again from the beginning.
        #
        i = self.cluster_i + 1
        if i >= self.clusters_size:
            np.random.shuffle(self.clusters_name)
            i = 0
            if self.debug : print(f'\n[shuffle!]')
        #
        # ---- Read it (images are already normalized)
        #
        data = np.load( self.clusters_name[i]+'.npy', mmap_mode='r' )
        #
        # ---- Remember all of that
        #
        self.data      = data
        self.data_i    = 0
        self.cluster_i = i
        #
        if self.debug: print(f'\n[Load {self.cluster_i:02d},s={len(self.data):3d}] ',end='')

    @classmethod
    def about(cls):
        print('\nFIDLE 2020 - DataGenerator')
        print('Version :', cls.version)
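`DataGenerator.__getitem__` ignores `idx` and keeps its own cursor. When the current cluster runs out, it tops the batch up with the first rows of the next cluster. With the full training set, `__len__` gives floor(200000 / 32) = 6250 batches per epoch, as printed in Step 2.

The in-memory simulation below replaces the `.npy` files with small arrays of (cluster_id, row) pairs. It shows how batches straddle cluster boundaries. `batches_across_clusters` is a simplified, hypothetical model, not part of the module. Since state is kept in the object, the real generator assumes batches are requested one at a time from a single process.

```python
import numpy as np

def batches_across_clusters(cluster_sizes, batch_size, n_batches, seed=0):
    """Simplified model of DataGenerator.__getitem__: take batch_size rows
    from the current cluster and, if it runs out, complete the batch with
    the next cluster (the cluster order is reshuffled when exhausted)."""
    rng = np.random.default_rng(seed)
    # Each cluster is an array of (cluster_id, row_index) pairs
    clusters = [np.array([(c, r) for r in range(n)]) for c, n in enumerate(cluster_sizes)]
    order, pos = list(rng.permutation(len(clusters))), 0
    data, data_i = clusters[order[0]], 0
    batches = []
    for _ in range(n_batches):
        batch = data[data_i:data_i + batch_size]
        if len(batch) == batch_size:                 # cluster is large enough
            data_i += batch_size
        else:                                        # not enough: use next cluster
            pos += 1
            if pos >= len(order):
                order, pos = list(rng.permutation(len(clusters))), 0
            data = clusters[order[pos]]
            need = batch_size - len(batch)
            batch = np.concatenate((batch, data[:need]))
            data_i = need
        batches.append(batch)
    return batches

sizes, batch_size = [5, 7, 4], 4
n_batches = sum(sizes) // batch_size                 # same rule as __len__
for b in batches_across_clusters(sizes, batch_size, n_batches):
    print('clusters in batch:', sorted(set(b[:, 0].tolist())))
```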
# ------------------------------------------------------------------
# _____ _ _ _
# | ___(_) __| | | ___
# | |_ | |/ _` | |/ _ \
# | _| | | (_| | | __/
# |_| |_|\__,_|_|\___|
# ------------------------------------------------------------------
# Formation Introduction au Deep Learning (FIDLE)
# CNRS/SARI/DEVLOG 2020 - S. Arias, E. Maldonado, JL. Parouty
# ------------------------------------------------------------------
# Initial version by JL Parouty, feb 2020
import numpy as np
import tensorflow as tf
import tensorflow.keras.datasets.mnist as mnist
class Loader_MNIST():

    version = '0.1'

    def __init__(self):
        pass

    @classmethod
    def about(cls):
        print('\nFIDLE 2020 - Very basic MNIST dataset loader')
        print('TensorFlow version :',tf.__version__)
        print('Loader version :', cls.version)

    @classmethod
    def load(cls, normalize=True, expand=True, verbose=1):
        # ---- Get data
        (x_train, y_train), (x_test, y_test) = mnist.load_data()
        if verbose>0: print('Dataset loaded.')
        # ---- Normalization
        if normalize:
            x_train = x_train.astype('float32') / 255.
            x_test  = x_test.astype( 'float32') / 255.
            if verbose>0: print('Normalized.')
        # ---- Reshape : (28,28) -> (28,28,1)
        if expand:
            x_train = np.expand_dims(x_train, axis=3)
            x_test  = np.expand_dims(x_test,  axis=3)
            if verbose>0: print(f'Reshaped to {x_train.shape}')
        return (x_train,y_train),(x_test,y_test)
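The two transformations applied by the loader can be checked offline, without downloading MNIST. The first scales uint8 pixels to float32 in [0, 1]. The second adds a trailing channel axis so that convolution layers receive (28, 28, 1) images. In the sketch below, `prepare` is a hypothetical helper and the random array only mimics the MNIST shape and dtype.

```python
import numpy as np

def prepare(x, normalize=True, expand=True):
    """Apply the loader's normalization and reshape to an in-memory array."""
    if normalize:
        x = x.astype('float32') / 255.          # uint8 [0,255] -> float32 [0,1]
    if expand:
        x = np.expand_dims(x, axis=-1)          # (n,28,28) -> (n,28,28,1)
    return x

fake = np.random.default_rng(0).integers(0, 256, size=(10, 28, 28), dtype=np.uint8)
x = prepare(fake)
print(x.shape, x.dtype, float(x.min()) >= 0.0, float(x.max()) <= 1.0)
# (10, 28, 28, 1) float32 True True
```

`axis=-1` is equivalent to `axis=3` for these 3-D arrays. It also keeps working if the input has a different number of leading dimensions.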