correct mispell word

2895200f · Soraya Arias · b5bc9e3c · 2895200f
Commit 2895200f authored 4 years ago by Soraya Arias
--- a/IMDB/01-Embedding-Keras.ipynb
+++ b/IMDB/01-Embedding-Keras.ipynb
@@ -6,7 +6,7 @@
   "source": [
    "<img width=\"800px\" src=\"../fidle/img/00-Fidle-header-01.svg\"></img>\n",
    "\n",
-    "# <!-- TITLE --> [IMDB1] - Sentiment alalysis with text embedding\n",
+    "# <!-- TITLE --> [IMDB1] - Sentiment analysis with text embedding\n",
    "<!-- DESC --> A very classical example of word embedding with a dataset from Internet Movie Database (IMDB)\n",
    "<!-- AUTHOR : Jean-Luc Parouty (CNRS/SIMaP) -->\n",
    "\n",

 %% Cell type:markdown id: tags:
 <img width="800px" src="../fidle/img/00-Fidle-header-01.svg"></img>
-# <!-- TITLE --> [IMDB1] - Sentiment alalysis with text embedding
+# <!-- TITLE --> [IMDB1] - Sentiment analysis with text embedding
 <!-- DESC --> A very classical example of word embedding with a dataset from Internet Movie Database (IMDB)
 <!-- AUTHOR : Jean-Luc Parouty (CNRS/SIMaP) -->
 ## Objectives :
 - The objective is to guess whether film reviews are **positive or negative** based on the analysis of the text.
 - Understand the management of **textual data** and **sentiment analysis**
 Original dataset can be find **[there](http://ai.stanford.edu/~amaas/data/sentiment/)**
 Note that [IMDb.com](https://imdb.com) offers several easy-to-use [datasets](https://www.imdb.com/interfaces/)
 For simplicity's sake, we'll use the dataset directly [embedded in Keras](https://www.tensorflow.org/api_docs/python/tf/keras/datasets)
 ## What we're going to do :
 - Retrieve data
 - Preparing the data
 - Build a model
 - Train the model
 - Evaluate the result
 %% Cell type:markdown id: tags:
 ## Step 1 - Init python stuff
 %% Cell type:code id: tags:
 ``` python
 import numpy as np
 import tensorflow as tf
 import tensorflow.keras as keras
 import tensorflow.keras.datasets.imdb as imdb
 import matplotlib.pyplot as plt
 import matplotlib
 import os,sys,h5py,json
 from importlib import reload
 sys.path.append('..')
 import fidle.pwk as pwk
 datasets_dir = pwk.init('IMDB1')
 ```
 %% Output
    **FIDLE 2020 - Practical Work Module**
    Version              : 0.6.1 DEV
    Notebook id          : IMDB1
    Run time             : Friday 18 December 2020, 17:52:06
    TensorFlow version   : 2.0.0
    Keras version        : 2.2.4-tf
    Datasets dir         : /home/pjluc/datasets/fidle
    Running mode         : full
    Update keras cache   : False
    Save figs            : True
    Path figs            : ./run/figs
 %% Cell type:markdown id: tags:
 ## Step 2 - Retrieve data
 IMDb dataset can bet get directly from Keras - see [documentation](https://www.tensorflow.org/api_docs/python/tf/keras/datasets)
 Note : Due to their nature, textual data can be somewhat complex.
 ### 2.1 - Data structure :
 The dataset is composed of 2 parts:
 - **reviews**, this will be our **x**
 - **opinions** (positive/negative), this will be our **y**
 There are also a **dictionary**, because words are indexed in reviews
 ```
 <dataset> = (<reviews>, <opinions>)
 with :  <reviews>  = [ <review1>, <review2>, ... ]
        <opinions> = [ <rate1>,   <rate2>,   ... ]   where <ratei>   = integer
 where : <reviewi> = [ <w1>, <w2>, ...]    <wi> are the index (int) of the word in the dictionary
        <ratei>   = int                   0 for negative opinion, 1 for positive
 <dictionary> = [ <word1>:<w1>, <word2>:<w2>, ... ]
 with :  <wordi>   = word
        <wi>      = int
 ```
 %% Cell type:markdown id: tags:
 ### 2.2 - Get dataset
 For simplicity, we will use a pre-formatted dataset - See [documentation](https://www.tensorflow.org/api_docs/python/tf/keras/datasets/imdb/load_data)
 However, Keras offers some usefull tools for formatting textual data - See [documentation](https://www.tensorflow.org/api_docs/python/tf/keras/preprocessing/text)
 **Load dataset :**
 %% Cell type:code id: tags:
 ``` python
 vocab_size = 10000
 # ----- Retrieve x,y
 # Uncomment this if you want to load dataset directly from keras (small size <20M)
 #
 (x_train, y_train), (x_test, y_test) = imdb.load_data( num_words  = vocab_size,
                                                       skip_top   = 0,
                                                       maxlen     = None,
                                                       seed       = 42,
                                                       start_char = 1,
                                                       oov_char   = 2,
                                                       index_from = 3, )
 # To load a h5 version of the dataset :
 #
 # with  h5py.File(f'{datasets_dir}/IMDB/origine/dataset_imdb.h5','r') as f:
 #        x_train = f['x_train'][:]
 #        y_train = f['y_train'][:]
 #        x_test  = f['x_test'][:]
 #        y_test  = f['y_test'][:]
 ```
 %% Output
    /home/pjluc/anaconda3/envs/fidle/lib/python3.7/site-packages/tensorflow_core/python/keras/datasets/imdb.py:129: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray
      x_train, y_train = np.array(xs[:idx]), np.array(labels[:idx])
    /home/pjluc/anaconda3/envs/fidle/lib/python3.7/site-packages/tensorflow_core/python/keras/datasets/imdb.py:130: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray
      x_test, y_test = np.array(xs[idx:]), np.array(labels[idx:])
 %% Cell type:markdown id: tags:
 **About this dataset :**
 %% Cell type:code id: tags:
 ``` python
 print("  Max(x_train,x_test)  : ", pwk.rmax([x_train,x_test]) )
 print("  x_train : {}  y_train : {}".format(x_train.shape, y_train.shape))
 print("  x_test  : {}  y_test  : {}".format(x_test.shape,  y_test.shape))
 print('\nReview example (x_train[12]) :\n\n',x_train[12])
 ```
 %% Output
      Max(x_train,x_test)  :  9999
      x_train : (25000,)  y_train : (25000,)
      x_test  : (25000,)  y_test  : (25000,)
    Review example (x_train[12]) :
     [1, 14, 22, 1367, 53, 206, 159, 4, 636, 898, 74, 26, 11, 436, 363, 108, 7, 14, 432, 14, 22, 9, 1055, 34, 8599, 2, 5, 381, 3705, 4509, 14, 768, 47, 839, 25, 111, 1517, 2579, 1991, 438, 2663, 587, 4, 280, 725, 6, 58, 11, 2714, 201, 4, 206, 16, 702, 5, 5176, 19, 480, 5920, 157, 13, 64, 219, 4, 2, 11, 107, 665, 1212, 39, 4, 206, 4, 65, 410, 16, 565, 5, 24, 43, 343, 17, 5602, 8, 169, 101, 85, 206, 108, 8, 3008, 14, 25, 215, 168, 18, 6, 2579, 1991, 438, 2, 11, 129, 1609, 36, 26, 66, 290, 3303, 46, 5, 633, 115, 4363]
 %% Cell type:markdown id: tags:
 ### 2.3 - Have a look for humans (optional)
 When we loaded the dataset, we asked for using \<start\> as 1, \<unknown word\> as 2
 So, we shifted the dataset by 3 with the parameter index_from=3
 **Load dictionary :**
 %% Cell type:code id: tags:
 ``` python
 # ---- Retrieve dictionary {word:index}, and encode it in ascii
 #
 word_index = imdb.get_word_index()
 # ---- Shift the dictionary from +3
 #
 word_index = {w:(i+3) for w,i in word_index.items()}
 # ---- Add <pad>, <start> and unknown tags
 #
 word_index.update( {'<pad>':0, '<start>':1, '<unknown>':2} )
 # ---- Create a reverse dictionary : {index:word}
 #
 index_word = {index:word for word,index in word_index.items()}
 # ---- Add a nice function to transpose :
 #
 def dataset2text(review):
    return ' '.join([index_word.get(i, '?') for i in review])
 ```
 %% Cell type:markdown id: tags:
 **Have a look :**
 %% Cell type:code id: tags:
 ``` python
 print('\nDictionary size     : ', len(word_index))
 for k in range(440,455):print(f'{k:2d} : {index_word[k]}' )
 pwk.subtitle('Review example :')
 print(x_train[12])
 pwk.subtitle('After translation :')
 print(dataset2text(x_train[12]))
 ```
 %% Output
    Dictionary size     :  88587
    440 : hope
    441 : entertaining
    442 : she's
    443 : mr
    444 : overall
    445 : evil
    446 : called
    447 : loved
    448 : based
    449 : oh
    450 : several
    451 : fans
    452 : mother
    453 : drama
    454 : beginning
    <br>**Review example :**
    [1, 14, 22, 1367, 53, 206, 159, 4, 636, 898, 74, 26, 11, 436, 363, 108, 7, 14, 432, 14, 22, 9, 1055, 34, 8599, 2, 5, 381, 3705, 4509, 14, 768, 47, 839, 25, 111, 1517, 2579, 1991, 438, 2663, 587, 4, 280, 725, 6, 58, 11, 2714, 201, 4, 206, 16, 702, 5, 5176, 19, 480, 5920, 157, 13, 64, 219, 4, 2, 11, 107, 665, 1212, 39, 4, 206, 4, 65, 410, 16, 565, 5, 24, 43, 343, 17, 5602, 8, 169, 101, 85, 206, 108, 8, 3008, 14, 25, 215, 168, 18, 6, 2579, 1991, 438, 2, 11, 129, 1609, 36, 26, 66, 290, 3303, 46, 5, 633, 115, 4363]
    <br>**After translation :**
    <start> this film contains more action before the opening credits than are in entire hollywood films of this sort this film is produced by tsui <unknown> and stars jet li this team has brought you many worthy hong kong cinema productions including the once upon a time in china series the action was fast and furious with amazing wire work i only saw the <unknown> in two shots aside from the action the story itself was strong and not just used as filler to find any other action films to rival this you must look for a hong kong cinema <unknown> in your area they are really worth checking out and usually never disappoint
 %% Cell type:markdown id: tags:
 ### 2.4 - Have a look for NN
 %% Cell type:code id: tags:
 ``` python
 sizes=[len(i) for i in x_train]
 plt.figure(figsize=(16,6))
 plt.hist(sizes, bins=400)
 plt.gca().set(title='Distribution of reviews by size - [{:5.2f}, {:5.2f}]'.format(min(sizes),max(sizes)),
              xlabel='Size', ylabel='Density', xlim=[0,1500])
 pwk.save_fig('01-stats-sizes')
 plt.show()
 ```
 %% Output
 %% Cell type:markdown id: tags:
 ## Step 3 - Preprocess the data (padding)
 In order to be processed by an NN, all entries must have the **same length.**
 We chose a review length of **review_len**
 We will therefore complete them with a padding (of \<pad\>\)
 %% Cell type:code id: tags:
 ``` python
 review_len = 256
 x_train = keras.preprocessing.sequence.pad_sequences(x_train,
                                                     value   = 0,
                                                     padding = 'post',
                                                     maxlen  = review_len)
 x_test  = keras.preprocessing.sequence.pad_sequences(x_test,
                                                     value   = 0 ,
                                                     padding = 'post',
                                                     maxlen  = review_len)
 pwk.subtitle('After padding :')
 print(x_train[12])
 pwk.subtitle('In real words :')
 print(dataset2text(x_train[12]))
 ```
 %% Output
    <br>**After padding :**
    [   1   14   22 1367   53  206  159    4  636  898   74   26   11  436
      363  108    7   14  432   14   22    9 1055   34 8599    2    5  381
     3705 4509   14  768   47  839   25  111 1517 2579 1991  438 2663  587
        4  280  725    6   58   11 2714  201    4  206   16  702    5 5176
       19  480 5920  157   13   64  219    4    2   11  107  665 1212   39
        4  206    4   65  410   16  565    5   24   43  343   17 5602    8
      169  101   85  206  108    8 3008   14   25  215  168   18    6 2579
     1991  438    2   11  129 1609   36   26   66  290 3303   46    5  633
      115 4363    0    0    0    0    0    0    0    0    0    0    0    0
        0    0    0    0    0    0    0    0    0    0    0    0    0    0
        0    0    0    0    0    0    0    0    0    0    0    0    0    0
        0    0    0    0    0    0    0    0    0    0    0    0    0    0
        0    0    0    0    0    0    0    0    0    0    0    0    0    0
        0    0    0    0    0    0    0    0    0    0    0    0    0    0
        0    0    0    0    0    0    0    0    0    0    0    0    0    0
        0    0    0    0    0    0    0    0    0    0    0    0    0    0
        0    0    0    0    0    0    0    0    0    0    0    0    0    0
        0    0    0    0    0    0    0    0    0    0    0    0    0    0
        0    0    0    0]
    <br>**In real words :**
    <start> this film contains more action before the opening credits than are in entire hollywood films of this sort this film is produced by tsui <unknown> and stars jet li this team has brought you many worthy hong kong cinema productions including the once upon a time in china series the action was fast and furious with amazing wire work i only saw the <unknown> in two shots aside from the action the story itself was strong and not just used as filler to find any other action films to rival this you must look for a hong kong cinema <unknown> in your area they are really worth checking out and usually never disappoint <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad>
 %% Cell type:markdown id: tags:
 **Save dataset and dictionary (For future use but not mandatory)**
 %% Cell type:code id: tags:
 ``` python
 # ---- Write dataset in a h5 file, could be usefull
 #
 output_dir = './data'
 pwk.mkdir(output_dir)
 with h5py.File(f'{output_dir}/dataset_imdb.h5', 'w') as f:
    f.create_dataset("x_train",    data=x_train)
    f.create_dataset("y_train",    data=y_train)
    f.create_dataset("x_test",     data=x_test)
    f.create_dataset("y_test",     data=y_test)
 with open(f'{output_dir}/word_index.json', 'w') as fp:
    json.dump(word_index, fp)
 with open(f'{output_dir}/index_word.json', 'w') as fp:
    json.dump(index_word, fp)
 print('Saved.')
 ```
 %% Output
    Saved.
 %% Cell type:markdown id: tags:
 ## Step 4 - Build the model
 Few remarks :
 - We'll choose a dense vector size for the embedding output with **dense_vector_size**
 - **GlobalAveragePooling1D** do a pooling on the last dimension : (None, lx, ly) -> (None, ly)
   In other words: we average the set of vectors/words of a sentence
 - L'embedding de Keras fonctionne de manière supervisée. Il s'agit d'une couche de *vocab_size* neurones vers *n_neurons* permettant de maintenir une table de vecteurs (les poids constituent les vecteurs). Cette couche ne calcule pas de sortie a la façon des couches normales, mais renvois la valeur des vecteurs. n mots => n vecteurs (ensuite empilés par le pooling)
 Voir : [Explication plus détaillée (en)](https://stats.stackexchange.com/questions/324992/how-the-embedding-layer-is-trained-in-keras-embedding-layer)
 ainsi que : [Sentiment detection with Keras](https://www.liip.ch/en/blog/sentiment-detection-with-keras-word-embeddings-and-lstm-deep-learning-networks)
 More documentation about this model functions :
 - [Embedding](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Embedding)
 - [GlobalAveragePooling1D](https://www.tensorflow.org/api_docs/python/tf/keras/layers/GlobalAveragePooling1D)
 %% Cell type:code id: tags:
 ``` python
 def get_model(dense_vector_size=32):
    model = keras.Sequential()
    model.add(keras.layers.Embedding(input_dim    = vocab_size,
                                     output_dim   = dense_vector_size,
                                     input_length = review_len))
    model.add(keras.layers.GlobalAveragePooling1D())
    model.add(keras.layers.Dense(dense_vector_size, activation='relu'))
    model.add(keras.layers.Dense(1,                 activation='sigmoid'))
    model.compile(optimizer = 'adam',
                  loss      = 'binary_crossentropy',
                  metrics   = ['accuracy'])
    return model
 ```
 %% Cell type:markdown id: tags:
 ## Step 5 - Train the model
 ### 5.1 - Get it
 %% Cell type:code id: tags:
 ``` python
 model = get_model(32)
 model.summary()
 ```
 %% Output
    Model: "sequential"
    _________________________________________________________________
    Layer (type)                 Output Shape              Param #
    =================================================================
    embedding (Embedding)        (None, 256, 32)           320000
    _________________________________________________________________
    global_average_pooling1d (Gl (None, 32)                0
    _________________________________________________________________
    dense (Dense)                (None, 32)                1056
    _________________________________________________________________
    dense_1 (Dense)              (None, 1)                 33
    =================================================================
    Total params: 321,089
    Trainable params: 321,089
    Non-trainable params: 0
    _________________________________________________________________
 %% Cell type:markdown id: tags:
 ### 5.2 - Add callback
 %% Cell type:code id: tags:
 ``` python
 os.makedirs('./run/models',   mode=0o750, exist_ok=True)
 save_dir = "./run/models/best_model.h5"
 savemodel_callback = tf.keras.callbacks.ModelCheckpoint(filepath=save_dir, verbose=0, save_best_only=True)
 ```
 %% Cell type:markdown id: tags:
 ### 5.1 - Train it
 %% Cell type:code id: tags:
 ``` python
 %%time
 n_epochs   = 30
 batch_size = 512
 history = model.fit(x_train,
                    y_train,
                    epochs          = n_epochs,
                    batch_size      = batch_size,
                    validation_data = (x_test, y_test),
                    verbose         = 1,
                    callbacks       = [savemodel_callback])
 ```
 %% Output
    Train on 25000 samples, validate on 25000 samples
    Epoch 1/30
    25000/25000 [==============================] - 1s 57us/sample - loss: 0.6881 - accuracy: 0.6431 - val_loss: 0.6782 - val_accuracy: 0.7408
    Epoch 2/30
    25000/25000 [==============================] - 1s 32us/sample - loss: 0.6506 - accuracy: 0.7618 - val_loss: 0.6168 - val_accuracy: 0.7650
    Epoch 3/30
    25000/25000 [==============================] - 1s 31us/sample - loss: 0.5590 - accuracy: 0.8060 - val_loss: 0.5135 - val_accuracy: 0.8203
    Epoch 4/30
    25000/25000 [==============================] - 1s 31us/sample - loss: 0.4460 - accuracy: 0.8512 - val_loss: 0.4198 - val_accuracy: 0.8487
    Epoch 5/30
    25000/25000 [==============================] - 1s 31us/sample - loss: 0.3606 - accuracy: 0.8758 - val_loss: 0.3636 - val_accuracy: 0.8609
    Epoch 6/30
    25000/25000 [==============================] - 1s 31us/sample - loss: 0.3074 - accuracy: 0.8894 - val_loss: 0.3322 - val_accuracy: 0.8676
    Epoch 7/30
    25000/25000 [==============================] - 1s 31us/sample - loss: 0.2719 - accuracy: 0.9016 - val_loss: 0.3127 - val_accuracy: 0.8734
    Epoch 8/30
    25000/25000 [==============================] - 1s 31us/sample - loss: 0.2457 - accuracy: 0.9103 - val_loss: 0.3007 - val_accuracy: 0.8765
    Epoch 9/30
    25000/25000 [==============================] - 1s 30us/sample - loss: 0.2257 - accuracy: 0.9178 - val_loss: 0.2933 - val_accuracy: 0.8795
    Epoch 10/30
    25000/25000 [==============================] - 1s 30us/sample - loss: 0.2083 - accuracy: 0.9249 - val_loss: 0.2888 - val_accuracy: 0.8818
    Epoch 11/30
    25000/25000 [==============================] - 1s 30us/sample - loss: 0.1938 - accuracy: 0.9310 - val_loss: 0.2874 - val_accuracy: 0.8824
    Epoch 12/30
    25000/25000 [==============================] - 1s 31us/sample - loss: 0.1818 - accuracy: 0.9352 - val_loss: 0.2867 - val_accuracy: 0.8826
    Epoch 13/30
    25000/25000 [==============================] - 1s 30us/sample - loss: 0.1703 - accuracy: 0.9406 - val_loss: 0.2879 - val_accuracy: 0.8828
    Epoch 14/30
    25000/25000 [==============================] - 1s 30us/sample - loss: 0.1605 - accuracy: 0.9451 - val_loss: 0.2922 - val_accuracy: 0.8815
    Epoch 15/30
    25000/25000 [==============================] - 1s 29us/sample - loss: 0.1520 - accuracy: 0.9480 - val_loss: 0.2945 - val_accuracy: 0.8828
    Epoch 16/30
    25000/25000 [==============================] - 1s 30us/sample - loss: 0.1435 - accuracy: 0.9524 - val_loss: 0.2986 - val_accuracy: 0.8821
    Epoch 17/30
    25000/25000 [==============================] - 1s 30us/sample - loss: 0.1359 - accuracy: 0.9551 - val_loss: 0.3042 - val_accuracy: 0.8803
    Epoch 18/30
    25000/25000 [==============================] - 1s 29us/sample - loss: 0.1290 - accuracy: 0.9581 - val_loss: 0.3100 - val_accuracy: 0.8783
    Epoch 19/30
    25000/25000 [==============================] - 1s 30us/sample - loss: 0.1223 - accuracy: 0.9609 - val_loss: 0.3169 - val_accuracy: 0.8772
    Epoch 20/30
    25000/25000 [==============================] - 1s 29us/sample - loss: 0.1167 - accuracy: 0.9632 - val_loss: 0.3245 - val_accuracy: 0.8747
    Epoch 21/30
    25000/25000 [==============================] - 1s 30us/sample - loss: 0.1116 - accuracy: 0.9654 - val_loss: 0.3318 - val_accuracy: 0.8756
    Epoch 22/30
    25000/25000 [==============================] - 1s 29us/sample - loss: 0.1064 - accuracy: 0.9670 - val_loss: 0.3409 - val_accuracy: 0.8743
    Epoch 23/30
    25000/25000 [==============================] - 1s 29us/sample - loss: 0.1007 - accuracy: 0.9704 - val_loss: 0.3478 - val_accuracy: 0.8728
    Epoch 24/30
    25000/25000 [==============================] - 1s 30us/sample - loss: 0.0963 - accuracy: 0.9718 - val_loss: 0.3577 - val_accuracy: 0.8718
    Epoch 25/30
    25000/25000 [==============================] - 1s 30us/sample - loss: 0.0914 - accuracy: 0.9743 - val_loss: 0.3728 - val_accuracy: 0.8670
    Epoch 26/30
    25000/25000 [==============================] - 1s 29us/sample - loss: 0.0887 - accuracy: 0.9744 - val_loss: 0.3746 - val_accuracy: 0.8697
    Epoch 27/30
    25000/25000 [==============================] - 1s 29us/sample - loss: 0.0842 - accuracy: 0.9777 - val_loss: 0.3835 - val_accuracy: 0.8683
    Epoch 28/30
    25000/25000 [==============================] - 1s 30us/sample - loss: 0.0796 - accuracy: 0.9788 - val_loss: 0.3937 - val_accuracy: 0.8670
    Epoch 29/30
    25000/25000 [==============================] - 1s 30us/sample - loss: 0.0763 - accuracy: 0.9804 - val_loss: 0.4032 - val_accuracy: 0.8658
    Epoch 30/30
    25000/25000 [==============================] - 1s 30us/sample - loss: 0.0729 - accuracy: 0.9819 - val_loss: 0.4156 - val_accuracy: 0.8648
    CPU times: user 1min 42s, sys: 7.41 s, total: 1min 50s
    Wall time: 23.4 s
 %% Cell type:markdown id: tags:
 ## Step 6 - Evaluate
 ### 6.1 - Training history
 %% Cell type:code id: tags:
 ``` python
 pwk.plot_history(history, save_as='02-history')
 ```
 %% Output
 %% Cell type:markdown id: tags:
 ### 6.2 - Reload and evaluate best model
 %% Cell type:code id: tags:
 ``` python
 model = keras.models.load_model('./run/models/best_model.h5')
 # ---- Evaluate
 score  = model.evaluate(x_test, y_test, verbose=0)
 print('x_test / loss      : {:5.4f}'.format(score[0]))
 print('x_test / accuracy  : {:5.4f}'.format(score[1]))
 values=[score[1], 1-score[1]]
 pwk.plot_donut(values,["Accuracy","Errors"], title="#### Accuracy donut is :", save_as='03-donut')
 # ---- Confusion matrix
 y_sigmoid = model.predict(x_test)
 y_pred = y_sigmoid.copy()
 y_pred[ y_sigmoid< 0.5 ] = 0
 y_pred[ y_sigmoid>=0.5 ] = 1
 pwk.display_confusion_matrix(y_test,y_pred,labels=range(2))
 pwk.plot_confusion_matrix(y_test,y_pred,range(2), figsize=(8, 8),normalize=False, save_as='04-confusion-matrix')
 ```
 %% Output
    x_test / loss      : 0.2867
    x_test / accuracy  : 0.8826
    #### Accuracy donut is :
    #### Confusion matrix is :
 %% Cell type:code id: tags:
 ``` python
 pwk.end()
 ```
 %% Output
    End time is : Friday 18 December 2020, 17:52:43
    Duration is : 00:00:37 585ms
    This notebook ends here
 %% Cell type:markdown id: tags:
 ---
 <img width="80px" src="../fidle/img/00-Fidle-logo-01.svg"></img>