{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<img width=\"800px\" src=\"../fidle/img/00-Fidle-header-01.svg\"></img>\n",
"# <!-- TITLE --> [IMDB3] - Text embedding/LSTM model with IMDB\n",
"<!-- DESC --> Still the same problem, but with a network combining embedding and LSTM\n",
"<!-- AUTHOR : Jean-Luc Parouty (CNRS/SIMaP) -->\n",
"## Objectives :\n",
" - The objective is to guess whether film reviews are **positive or negative** based on the analysis of the text. \n",
" - Use of a model combining embedding and LSTM\n",
"\n",
"Original dataset can be find **[there](http://ai.stanford.edu/~amaas/data/sentiment/)** \n",
"Note that [IMDb.com](https://imdb.com) offers several easy-to-use [datasets](https://www.imdb.com/interfaces/) \n",
"For simplicity's sake, we'll use the dataset directly [embedded in Keras](https://www.tensorflow.org/api_docs/python/tf/keras/datasets)\n",
"\n",
"## What we're going to do :\n",
"\n",
" - Retrieve data\n",
" - Preparing the data\n",
" - Build a Embedding/LSTM model\n",
" - Train the model\n",
" - Evaluate the result\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 1 - Init python stuff"
]
},
{
"cell_type": "code",
"outputs": [
{
"data": {
"text/html": [
"<style>\n",
"\n",
"div.warn { \n",
" background-color: #fcf2f2;\n",
" border-color: #dFb5b4;\n",
" border-left: 5px solid #dfb5b4;\n",
" padding: 0.5em;\n",
" font-weight: bold;\n",
" font-size: 1.1em;;\n",
" }\n",
"\n",
"\n",
"\n",
"div.nota { \n",
" background-color: #DAFFDE;\n",
" border-left: 5px solid #92CC99;\n",
" padding: 0.5em;\n",
" }\n",
"\n",
"div.todo:before { content:url();\n",
" float:left;\n",
" margin-right:20px;\n",
" margin-top:-20px;\n",
" margin-bottom:20px;\n",
"}\n",
"div.todo{\n",
" font-weight: bold;\n",
" font-size: 1.1em;\n",
" margin-top:40px;\n",
"}\n",
"div.todo ul{\n",
" margin: 0.2em;\n",
"}\n",
"div.todo li{\n",
" margin-left:60px;\n",
" margin-top:0;\n",
" margin-bottom:0;\n",
"}\n",
"\n",
"\n",
"</style>\n",
"\n"
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"FIDLE 2020 - Practical Work Module\n",
"Version : 0.57 DEV\n",
"Run time : Thursday 10 September 2020, 16:39:27\n",
"TensorFlow version : 2.2.0\n",
"Keras version : 2.3.0-tf\n",
"Current place : Fidle at IDRIS\n",
"Dataset dir : /gpfswork/rech/mlh/commun/datasets\n",
"Update keras cache : Done\n"
]
}
],
"source": [
"import numpy as np\n",
"\n",
"import tensorflow as tf\n",
"import tensorflow.keras as keras\n",
"import tensorflow.keras.datasets.imdb as imdb\n",
"\n",
"import matplotlib.pyplot as plt\n",
"import matplotlib\n",
"import seaborn as sns\n",
"\n",
"import os,sys,h5py,json\n",
"\n",
"from importlib import reload\n",
"\n",
"sys.path.append('..')\n",
"import fidle.pwk as ooo\n",
"\n",
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 2 - Retrieve data\n",
"\n",
"**From Keras :**\n",
"This IMDb dataset can bet get directly from [Keras datasets](https://www.tensorflow.org/api_docs/python/tf/keras/datasets) \n",
"\n",
"Due to their nature, textual data can be somewhat complex.\n",
"\n",
"### 2.1 - Data structure : \n",
"The dataset is composed of 2 parts: **reviews** and **opinions** (positive/negative), with a **dictionary**\n",
"\n",
" - dataset = (reviews, opinions)\n",
" - reviews = \\[ review_0, review_1, ...\\]\n",
" - review_i = [ int1, int2, ...] where int_i is the index of the word in the dictionary.\n",
" - opinions = \\[ int0, int1, ...\\] where int_j == 0 if opinion is negative or 1 if opinion is positive.\n",
" - dictionary = \\[ mot1:int1, mot2:int2, ... ]"
]
},
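{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make this structure concrete, here is a minimal sketch built on a tiny, made-up vocabulary. The names `toy_dictionary`, `toy_reviews` and `toy_opinions` are illustrative assumptions and are not part of the Keras dataset :\n",
"\n",
"```python\n",
"# Hypothetical toy data, following the structure described above\n",
"toy_dictionary = {'good': 1, 'film': 2, 'bad': 3}   # word -> index\n",
"toy_reviews = [[1, 2], [3, 2]]   # each review is a list of word indices\n",
"toy_opinions = [1, 0]   # 1 = positive, 0 = negative\n",
"\n",
"# Reverse dictionary (index -> word), used to read a review back\n",
"toy_index_word = {i: w for w, i in toy_dictionary.items()}\n",
"decoded = [' '.join(toy_index_word[i] for i in r) for r in toy_reviews]\n",
"print(decoded)   # ['good film', 'bad film']\n",
"```\n"
]
},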
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 2.2 - Get dataset\n",
"For simplicity, we will use a pre-formatted dataset. \n",
"See : https://www.tensorflow.org/api_docs/python/tf/keras/datasets/imdb/load_data \n",
"\n",
"However, Keras offers some usefull tools for formatting textual data. \n",
"See : https://www.tensorflow.org/api_docs/python/tf/keras/preprocessing/text"
]
},
{
"cell_type": "code",
"metadata": {},
"outputs": [],
"source": [
"vocab_size = 10000\n",
"\n",
"# ----- Retrieve x,y\n",
"#\n",
"# Choose if you want to load dataset directly from keras (small size <20M)\n",
"(x_train, y_train), (x_test, y_test) = imdb.load_data( num_words = vocab_size,\n",
" skip_top = 0,\n",
" maxlen = None,\n",
" seed = 42,\n",
" start_char = 1,\n",
" oov_char = 2,\n",
" index_from = 3, )\n",
"# Or you can use the same pre-loaded dataset if at GRICAD or IDRIS\n",
"#place, dataset_dir = ooo.good_place( { 'GRICAD' : f'{os.getenv(\"SCRATCH_DIR\",\"\")}/PROJECTS/pr-fidle/datasets/IMDB',\n",
"# 'IDRIS' : f'{os.getenv(\"WORK\",\"\")}/datasets/IMDB',\n",
"# 'HOME' : f'{os.getenv(\"HOME\",\"\")}/datasets/IMDB'} )\n",
"#with h5py.File(f'{dataset_dir}/dataset_imdb.h5','r') as f:\n",
"# x_train = f['x_train'][:]\n",
"# y_train = f['y_train'][:]\n",
"# x_test = f['x_test'][:]\n",
"# y_test = f['y_test'][:]\n"
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" Max(x_train,x_test) : 9999\n",
" x_train : (25000,) y_train : (25000,)\n",
" x_test : (25000,) y_test : (25000,)\n",
"\n",
"Review example (x_train[12]) :\n",
"\n",
" [1, 14, 22, 1367, 53, 206, 159, 4, 636, 898, 74, 26, 11, 436, 363, 108, 7, 14, 432, 14, 22, 9, 1055, 34, 8599, 2, 5, 381, 3705, 4509, 14, 768, 47, 839, 25, 111, 1517, 2579, 1991, 438, 2663, 587, 4, 280, 725, 6, 58, 11, 2714, 201, 4, 206, 16, 702, 5, 5176, 19, 480, 5920, 157, 13, 64, 219, 4, 2, 11, 107, 665, 1212, 39, 4, 206, 4, 65, 410, 16, 565, 5, 24, 43, 343, 17, 5602, 8, 169, 101, 85, 206, 108, 8, 3008, 14, 25, 215, 168, 18, 6, 2579, 1991, 438, 2, 11, 129, 1609, 36, 26, 66, 290, 3303, 46, 5, 633, 115, 4363]\n"
]
}
],
"source": [
"print(\" Max(x_train,x_test) : \", ooo.rmax([x_train,x_test]) )\n",
"print(\" x_train : {} y_train : {}\".format(x_train.shape, y_train.shape))\n",
"print(\" x_test : {} y_test : {}\".format(x_test.shape, y_test.shape))\n",
"\n",
"print('\\nReview example (x_train[12]) :\\n\\n',x_train[12])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 2.3 - Have a look for humans (optional)\n",
"When we loaded the dataset, we asked for using \\<start\\> as 1, \\<unknown word\\> as 2 \n",
"So, we shifted the dataset by 3 with the parameter index_from=3"
]
},
{
"cell_type": "code",
"metadata": {},
"outputs": [],
"source": [
"# ---- Retrieve dictionary {word:index}, and encode it in ascii\n",
"\n",
"word_index = imdb.get_word_index()\n",
"\n",
"# ---- Shift the dictionary from +3\n",
"\n",
"word_index = {w:(i+3) for w,i in word_index.items()}\n",
"\n",
"# ---- Add <pad>, <start> and unknown tags\n",
"\n",
"word_index.update( {'<pad>':0, '<start>':1, '<unknown>':2} )\n",
"\n",
"# ---- Create a reverse dictionary : {index:word}\n",
"\n",
"index_word = {index:word for word,index in word_index.items()} \n",
"\n",
"# ---- Add a nice function to transpose :\n",
"#\n",
"def dataset2text(review):\n",
" return ' '.join([index_word.get(i, '?') for i in review])"
]
},
{
"cell_type": "code",
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"Dictionary size : 88587\n",
"\n",
"Review example (x_train[12]) :\n",
"\n",
" [1, 14, 22, 1367, 53, 206, 159, 4, 636, 898, 74, 26, 11, 436, 363, 108, 7, 14, 432, 14, 22, 9, 1055, 34, 8599, 2, 5, 381, 3705, 4509, 14, 768, 47, 839, 25, 111, 1517, 2579, 1991, 438, 2663, 587, 4, 280, 725, 6, 58, 11, 2714, 201, 4, 206, 16, 702, 5, 5176, 19, 480, 5920, 157, 13, 64, 219, 4, 2, 11, 107, 665, 1212, 39, 4, 206, 4, 65, 410, 16, 565, 5, 24, 43, 343, 17, 5602, 8, 169, 101, 85, 206, 108, 8, 3008, 14, 25, 215, 168, 18, 6, 2579, 1991, 438, 2, 11, 129, 1609, 36, 26, 66, 290, 3303, 46, 5, 633, 115, 4363]\n",
"\n",
"In real words :\n",
"\n",
" <start> this film contains more action before the opening credits than are in entire hollywood films of this sort this film is produced by tsui <unknown> and stars jet li this team has brought you many worthy hong kong cinema productions including the once upon a time in china series the action was fast and furious with amazing wire work i only saw the <unknown> in two shots aside from the action the story itself was strong and not just used as filler to find any other action films to rival this you must look for a hong kong cinema <unknown> in your area they are really worth checking out and usually never disappoint\n"
]
}
],
"source": [
"print('\\nDictionary size : ', len(word_index))\n",
"print('\\nReview example (x_train[12]) :\\n\\n',x_train[12])\n",
"print('\\nIn real words :\\n\\n', dataset2text(x_train[12]))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 2.4 - Have a look for neurons"
]
},
{
"cell_type": "code",
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 864x432 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"plt.figure(figsize=(12, 6))\n",
"ax=sns.histplot([len(i) for i in x_train],bins=60)\n",
"ax.set_title('Distribution of reviews by size')\n",
"plt.xlabel(\"Review's sizes\")\n",
"plt.ylabel('Density')\n",
"ax.set_xlim(0, 1500)\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 3 - Preprocess the data\n",
"In order to be processed by an NN, all entries must have the same length. \n",
"We chose a review length of **review_len** \n",
"We will therefore complete them with a padding (of \\<pad\\>\\) "
]
},
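{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a rough illustration of what this padding does, the sketch below re-implements it in plain NumPy on a toy batch. The helper `pad_post` is a hypothetical name. It assumes the behaviour requested from `pad_sequences` in the next cell: padding at the end and, as per the Keras default, truncation at the beginning for reviews that are too long :\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"def pad_post(sequences, maxlen, value=0):\n",
"    # Hypothetical helper: pad at the end, truncate at the beginning\n",
"    out = np.full((len(sequences), maxlen), value, dtype=int)\n",
"    for row, seq in enumerate(sequences):\n",
"        kept = seq[-maxlen:]          # keep at most the last maxlen items\n",
"        out[row, :len(kept)] = kept   # left-aligned, padding on the right\n",
"    return out\n",
"\n",
"padded = pad_post([[1, 14, 22], [1, 5, 6, 7, 8, 9]], maxlen=4)\n",
"print(padded)\n",
"```\n",
"\n",
"The first toy review is completed with a 0, while the second one loses its first two indices."
]
},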
{
"cell_type": "code",
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"Review example (x_train[12]) :\n",
"\n",
" [ 1 14 22 1367 53 206 159 4 636 898 74 26 11 436\n",
" 363 108 7 14 432 14 22 9 1055 34 8599 2 5 381\n",
" 3705 4509 14 768 47 839 25 111 1517 2579 1991 438 2663 587\n",
" 4 280 725 6 58 11 2714 201 4 206 16 702 5 5176\n",
" 19 480 5920 157 13 64 219 4 2 11 107 665 1212 39\n",
" 4 206 4 65 410 16 565 5 24 43 343 17 5602 8\n",
" 169 101 85 206 108 8 3008 14 25 215 168 18 6 2579\n",
" 1991 438 2 11 129 1609 36 26 66 290 3303 46 5 633\n",
" 115 4363 0 0 0 0 0 0 0 0 0 0 0 0\n",
" 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n",
" 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n",
" 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n",
" 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n",
" 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n",
" 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n",
" 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n",
" 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n",
" 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n",
" 0 0 0 0]\n",
"\n",
"In real words :\n",
"\n",
" <start> this film contains more action before the opening credits than are in entire hollywood films of this sort this film is produced by tsui <unknown> and stars jet li this team has brought you many worthy hong kong cinema productions including the once upon a time in china series the action was fast and furious with amazing wire work i only saw the <unknown> in two shots aside from the action the story itself was strong and not just used as filler to find any other action films to rival this you must look for a hong kong cinema <unknown> in your area they are really worth checking out and usually never disappoint <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad> <pad>\n"
]
}
],
"source": [
"review_len = 256\n",
"\n",
"x_train = keras.preprocessing.sequence.pad_sequences(x_train,\n",
" value = 0,\n",
" padding = 'post',\n",
" maxlen = review_len)\n",
"\n",
"x_test = keras.preprocessing.sequence.pad_sequences(x_test,\n",
" value = 0 ,\n",
" padding = 'post',\n",
" maxlen = review_len)\n",
"\n",
"print('\\nReview example (x_train[12]) :\\n\\n',x_train[12])\n",
"print('\\nIn real words :\\n\\n', dataset2text(x_train[12]))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Save dataset and dictionary (can be usefull but not mandatory if at GRICAD or IDRIS)"
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Saved.\n"
]
}
],
"source": [
"os.makedirs('./data', mode=0o750, exist_ok=True)\n",
"\n",
"with h5py.File('./data/dataset_imdb.h5', 'w') as f:\n",
" f.create_dataset(\"x_train\", data=x_train)\n",
" f.create_dataset(\"y_train\", data=y_train)\n",
" f.create_dataset(\"x_test\", data=x_test)\n",
" f.create_dataset(\"y_test\", data=y_test)\n",
"\n",
"with open('./data/word_index.json', 'w') as fp:\n",
" json.dump(word_index, fp)\n",
"\n",
"with open('./data/index_word.json', 'w') as fp:\n",
" json.dump(index_word, fp)\n",
"\n",
"print('Saved.')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 4 - Build the model\n",
"Few remarks :\n",
"1. We'll choose a dense vector size for the embedding output with **dense_vector_size**\n",
"2. **GlobalAveragePooling1D** do a pooling on the last dimension : (None, lx, ly) -> (None, ly) \n",
"In other words: we average the set of vectors/words of a sentence\n",
"3. L'embedding de Keras fonctionne de manière supervisée. Il s'agit d'une couche de *vocab_size* neurones vers *n_neurons* permettant de maintenir une table de vecteurs (les poids constituent les vecteurs). Cette couche ne calcule pas de sortie a la façon des couches normales, mais renvois la valeur des vecteurs. n mots => n vecteurs (ensuite empilés par le pooling) \n",
"Voir : https://stats.stackexchange.com/questions/324992/how-the-embedding-layer-is-trained-in-keras-embedding-layer\n",
"\n",
"A SUIVRE : https://www.liip.ch/en/blog/sentiment-detection-with-keras-word-embeddings-and-lstm-deep-learning-networks\n",
"### 4.1 - Build\n",
"More documentation about :\n",
" - [Embedding](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Embedding)\n",
" - [GlobalAveragePooling1D](https://www.tensorflow.org/api_docs/python/tf/keras/layers/GlobalAveragePooling1D)"
]
},
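{
"cell_type": "markdown",
"metadata": {},
"source": [
"The lookup-table behaviour described in remark 3 can be sketched in plain NumPy. This is only an illustration under assumptions: the table is random here (in Keras it is learned during training), and the sizes `toy_vocab_size` and `toy_vector_size` are made up :\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"toy_vocab_size, toy_vector_size = 10, 4   # made-up sizes\n",
"rng = np.random.default_rng(0)\n",
"\n",
"# The embedding weights: one row (vector) per word index\n",
"table = rng.normal(size=(toy_vocab_size, toy_vector_size))\n",
"\n",
"review = np.array([1, 7, 3, 0, 0])   # 3 word indices followed by 2 <pad>\n",
"vectors = table[review]   # plain row lookup, no matrix product\n",
"print(vectors.shape)   # (5, 4)\n",
"```\n",
"\n",
"Each index simply selects one row of the table, so the two padding positions receive the same vector."
]
},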
{
"cell_type": "code",
"metadata": {},
"outputs": [],
"source": [
"def get_model(dense_vector_size=128):\n",
" \n",
" model = keras.Sequential()\n",
" model.add(keras.layers.Embedding(input_dim = vocab_size, \n",
" output_dim = dense_vector_size, \n",
" input_length = review_len))\n",
" model.add(keras.layers.LSTM(128, dropout=0.2, recurrent_dropout=0.2))\n",
" model.add(keras.layers.Dense(1, activation='sigmoid'))\n",
"\n",
" model.compile(optimizer = 'adam',\n",
" loss = 'binary_crossentropy',\n",
" metrics = ['accuracy'])\n",
" return model"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 5 - Train the model\n",
"### 5.1 - Get it"
]
},
{
"cell_type": "code",
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"WARNING:tensorflow:Layer lstm will not use cuDNN kernel since it doesn't meet the cuDNN kernel criteria. It will use generic GPU kernel as fallback when running on GPU\n",
"Model: \"sequential\"\n",
"_________________________________________________________________\n",
"Layer (type) Output Shape Param # \n",
"=================================================================\n",
"embedding (Embedding) (None, 256, 128) 1280000 \n",
"_________________________________________________________________\n",
"lstm (LSTM) (None, 128) 131584 \n",
"_________________________________________________________________\n",
"dense (Dense) (None, 1) 129 \n",
"=================================================================\n",
"Total params: 1,411,713\n",
"Trainable params: 1,411,713\n",
"Non-trainable params: 0\n",
"_________________________________________________________________\n"
]
}
],
"source": [
"model = get_model()\n",
"\n",
"model.summary()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 5.2 - Add callback"
]
},
{
"cell_type": "code",
"metadata": {},
"outputs": [],
"source": [
"os.makedirs('./run/models', mode=0o750, exist_ok=True)\n",
"save_dir = \"./run/models/best_model.h5\"\n",
"savemodel_callback = tf.keras.callbacks.ModelCheckpoint(filepath=save_dir, verbose=0, save_best_only=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 5.1 - Train it\n",
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Epoch 1/10\n",
"49/49 [==============================] - 37s 758ms/step - loss: 0.6913 - accuracy: 0.5230 - val_loss: 0.6871 - val_accuracy: 0.5364\n",
"Epoch 2/10\n",
"49/49 [==============================] - 37s 754ms/step - loss: 0.6752 - accuracy: 0.5610 - val_loss: 0.6785 - val_accuracy: 0.5394\n",
"Epoch 3/10\n",
"49/49 [==============================] - 37s 757ms/step - loss: 0.6375 - accuracy: 0.5919 - val_loss: 0.7197 - val_accuracy: 0.5734\n",
"Epoch 4/10\n",
"49/49 [==============================] - 38s 776ms/step - loss: 0.4910 - accuracy: 0.7900 - val_loss: 0.8804 - val_accuracy: 0.6802\n",
"Epoch 5/10\n",
"49/49 [==============================] - 37s 764ms/step - loss: 0.5418 - accuracy: 0.7373 - val_loss: 0.4988 - val_accuracy: 0.7914\n",
"Epoch 6/10\n",
"49/49 [==============================] - 38s 774ms/step - loss: 0.4422 - accuracy: 0.8261 - val_loss: 0.4817 - val_accuracy: 0.8059\n",
"Epoch 7/10\n",
"49/49 [==============================] - 38s 780ms/step - loss: 0.4258 - accuracy: 0.8367 - val_loss: 0.5204 - val_accuracy: 0.7898\n",
"Epoch 8/10\n",
"49/49 [==============================] - 38s 767ms/step - loss: 0.4136 - accuracy: 0.8440 - val_loss: 0.5543 - val_accuracy: 0.7444\n",
"Epoch 9/10\n",
"49/49 [==============================] - 39s 805ms/step - loss: 0.4160 - accuracy: 0.8400 - val_loss: 0.4742 - val_accuracy: 0.8111\n",
"Epoch 10/10\n",
"49/49 [==============================] - 39s 789ms/step - loss: 0.3917 - accuracy: 0.8554 - val_loss: 0.4772 - val_accuracy: 0.8118\n",
"CPU times: user 11min 43s, sys: 51.1 s, total: 12min 34s\n",
"Wall time: 6min 25s\n"
]
}
],
"source": [
"%%time\n",
"\n",
"n_epochs = 10\n",
"\n",
"history = model.fit(x_train,\n",
" y_train,\n",
" epochs = n_epochs,\n",
" batch_size = batch_size,\n",
" validation_data = (x_test, y_test),\n",
" verbose = 1,\n",
" callbacks = [savemodel_callback])\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 6 - Evaluate\n",
"### 6.1 - Training history"
]
},
{
"cell_type": "code",
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 576x432 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
},
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 576x432 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"ooo.plot_history(history)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 6.2 - Reload and evaluate best model"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"WARNING:tensorflow:Layer lstm will not use cuDNN kernel since it doesn't meet the cuDNN kernel criteria. It will use generic GPU kernel as fallback when running on GPU\n",
"x_test / loss : 0.4742\n",
"x_test / accuracy : 0.8111\n"
]
},
{
"data": {
"text/markdown": [
"#### Accuracy donut is :"
],
"text/plain": [
"<IPython.core.display.Markdown object>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x432 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"model = keras.models.load_model('./run/models/best_model.h5')\n",
"\n",
"# ---- Evaluate\n",
"score = model.evaluate(x_test, y_test, verbose=0)\n",
"\n",
"print('x_test / loss : {:5.4f}'.format(score[0]))\n",
"print('x_test / accuracy : {:5.4f}'.format(score[1]))\n",
"\n",
"values=[score[1], 1-score[1]]\n",
"ooo.plot_donut(values,[\"Accuracy\",\"Errors\"], title=\"#### Accuracy donut is :\")\n",
"\n",
"# ---- Confusion matrix\n",
"\n",
"#y_pred = model.predict_classes(x_test) Deprecated after 01/01/2021 !!\n",
"\n",
"y_sigmoid = model.predict(x_test)\n",
"y_pred = np.argmax(y_sigmoid, axis=-1)\n",
"\n",
"ooo.display_confusion_matrix(y_test,y_pred,labels=range(2),color='orange',font_size='20pt')\n"
]
},
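{
"cell_type": "markdown",
"metadata": {},
"source": [
"Since the model ends with a single sigmoid unit, class labels can be obtained by thresholding the predicted probability. Here is a minimal sketch with made-up predictions, where the 0.5 threshold and the sample values are assumptions chosen for illustration :\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"y_sigmoid_toy = np.array([[0.91], [0.12], [0.55], [0.40]])   # shape (n, 1)\n",
"y_true_toy = np.array([1, 0, 0, 1])\n",
"\n",
"# Probability above the threshold -> class 1, otherwise class 0\n",
"y_pred_toy = (y_sigmoid_toy > 0.5).astype(int).reshape(-1)\n",
"\n",
"# 2x2 confusion matrix: rows = true class, columns = predicted class\n",
"cm = np.zeros((2, 2), dtype=int)\n",
"for t, p in zip(y_true_toy, y_pred_toy):\n",
"    cm[t, p] += 1\n",
"print(cm)\n",
"```\n"
]
},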
{
"<img width=\"80px\" src=\"../fidle/img/00-Fidle-logo-01.svg\"></img>"
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
}
},
"nbformat": 4,
"nbformat_minor": 4
}