Commit c39243e6 authored by Jean-Luc Parouty

Minor correction in K3IMDB3

parent 0ad729fb
%% Cell type:markdown id: tags:
<img width="800px" src="../fidle/img/header.svg"></img>
# <!-- TITLE --> [K3IMDB3] - Reload and reuse a saved model
<!-- DESC --> Retrieving a saved model to perform a sentiment analysis (movie review), using Keras 3 and PyTorch
<!-- AUTHOR : Jean-Luc Parouty (CNRS/SIMaP) -->
## Objectives :
- The objective is to guess whether our personal film reviews are **positive or negative** based on the analysis of the text.
- For this, we will use our **previously saved model**.
## What we're going to do :
- Prepare our data
- Retrieve our saved model
- Evaluate the result
%% Cell type:markdown id: tags:
## Step 1 - Init python stuff
%% Cell type:code id: tags:
``` python
import os
os.environ['KERAS_BACKEND'] = 'torch'
import keras
import json,re
import numpy as np
import fidle
# Init Fidle environment
run_id, run_dir, datasets_dir = fidle.init('K3IMDB3')
```
%% Cell type:markdown id: tags:
### 1.2 - Parameters
The words in the vocabulary are ranked from the most frequent to the rarest.
- `vocab_size` is the number of words we keep in our vocabulary (all other words are treated as unknown).
- `review_len` is the length to which each review is padded or truncated.
- `saved_models` is the directory where our models were previously saved.
- `dictionaries_dir` is the directory where our dictionaries are stored (./data is a good choice).
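To make `review_len` concrete, here is a minimal NumPy sketch of fitting a token list to a fixed length. The helper name `fit_to_length` is hypothetical and not part of this notebook. It assumes padding with 0 at the end and, like the default `truncating='pre'` of the Keras `pad_sequences` function, drops tokens from the start when a sequence is too long.

```python
import numpy as np

def fit_to_length(tokens, length, pad_value=0):
    """Hypothetical helper: pad at the end, or keep only the last `length` tokens."""
    tokens = list(tokens)[-length:]             # truncate from the start if too long
    padded = np.full(length, pad_value, dtype="int32")
    padded[:len(tokens)] = tokens               # padding values stay at the end
    return padded

short = fit_to_length([1, 14, 22], length=5)               # shorter than 5: padded
long_ = fit_to_length([1, 14, 22, 9, 31, 7, 4], length=5)  # longer than 5: truncated
print(short)
print(long_)
```

With `review_len = 256`, every review ends up as a vector of exactly 256 indices, whatever its original length.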
%% Cell type:code id: tags:
``` python
vocab_size = 10000
review_len = 256
saved_models = './run/K3IMDB2'
dictionaries_dir = './data'
```
%% Cell type:markdown id: tags:
Override parameters (batch mode) - Just forget this cell
%% Cell type:code id: tags:
``` python
fidle.override('vocab_size', 'review_len', 'saved_models', 'dictionaries_dir')
```
%% Cell type:markdown id: tags:
## Step 2 : Preparing the data
### 2.1 - Our reviews :
%% Cell type:code id: tags:
``` python
reviews = [ "This film is particularly nice, a must see.",
            "This film is a great classic that cannot be ignored.",
            "I don't remember ever having seen such a movie...",
            "This movie is just abominable and doesn't deserve to be seen!"]
```
%% Cell type:markdown id: tags:
### 2.2 - Retrieve dictionaries
Note : This dictionary is generated by the [02-Embedding-Keras](02-Keras-embedding.ipynb) notebook.
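As a rough illustration of what the two dictionaries look like, the sketch below sends a toy word-to-index mapping through JSON and builds the reverse index-to-word mapping. The entries and the names `toy_word_index` and `toy_index_word` are invented for the example and are not the real IMDb dictionary.

```python
import json

# Toy dictionary, for illustration only (not the real IMDb word index)
toy_word_index = {"<pad>": 0, "<start>": 1, "<unknown>": 2, "film": 22, "great": 87}

# Round trip through JSON, as saving to and loading from a file would do
restored = json.loads(json.dumps(toy_word_index))

# Reverse mapping: index -> word
toy_index_word = {i: w for w, i in restored.items()}
print(toy_index_word[22])
```

Because the words are the JSON keys and the indices are the values, the indices come back as integers and can be used directly as token ids.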
%% Cell type:code id: tags:
``` python
with open(f'{dictionaries_dir}/word_index.json', 'r') as fp:
    word_index = json.load(fp)
    index_word = { i:w for w,i in word_index.items() }
    print('Dictionaries loaded. ', len(word_index), 'entries' )
```
%% Cell type:markdown id: tags:
### 2.3 - Clean, index and pad
Reviews are split into words, punctuation is removed, each word is replaced by its index, and each review is padded or truncated to `review_len`.
**Note** : 0 is padding, 1 is "start" and 2 is "unknown"
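The cleaning and indexing step can be sketched on a toy dictionary as follows. The dictionary content, `toy_vocab_size` and the `encode` helper are invented for this example. The sketch also assumes that dictionary keys are lowercase and that valid indices are strictly below the vocabulary size.

```python
import re

# Toy dictionary and vocabulary size, for illustration only
toy_word_index = {"this": 14, "film": 22, "is": 9, "great": 87, "abominable": 12000}
toy_vocab_size = 10000
start_char, oov_char = 1, 2

def encode(review):
    """Turn a sentence into a list of indices, starting with <start>."""
    indices = [start_char]
    for w in review.split(" "):
        # Remove punctuation; lowercase to match the (assumed lowercase) keys
        w_clean = re.sub(r"[^a-zA-Z0-9]", "", w).lower()
        if not w_clean:
            continue                        # the word was punctuation only
        idx = toy_word_index.get(w_clean, oov_char)
        if idx >= toy_vocab_size:           # too rare for our vocabulary
            idx = oov_char
        indices.append(idx)
    return indices

encoded = encode("This film is great, not abominable !")
print(encoded)
```

Here "not" is missing from the toy dictionary and "abominable" is ranked beyond `toy_vocab_size`, so both are mapped to the unknown index 2.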
%% Cell type:code id: tags:
``` python
start_char = 1      # Start of a sequence (padding is 0)
oov_char   = 2      # Out-of-vocabulary
index_from = 3      # First word id
nb_reviews = len(reviews)
x_data     = []
# ---- For all reviews
for review in reviews:
    print('Words are : ', end='')
    # ---- First index must be <start>
    index_review=[start_char]
    print(f'{start_char} ', end='')
    # ---- For all words
    for w in review.split(' '):
        # ---- Clean it (lowercasing assumes the dictionary keys are lowercase)
        w_clean = re.sub(r"[^a-zA-Z0-9]", "", w).lower()
        # ---- Not empty ?
        if len(w_clean)>0:
            # ---- Get the index of the cleaned word - must be inside dict or is out of vocab (oov)
            w_index = word_index.get(w_clean, oov_char)
            # ---- Indexes must stay below vocab_size, otherwise the word is oov
            if w_index>=vocab_size : w_index=oov_char
            # ---- Add the index
            index_review.append(w_index)
            print(f'{w_index} ', end='')
    # ---- Add the indexed review
    x_data.append(index_review)
    print()
# ---- Padding
x_data = keras.preprocessing.sequence.pad_sequences(x_data, value = 0, padding = 'post', maxlen = review_len)
```
%% Cell type:markdown id: tags:
### 2.4 - Have a look
%% Cell type:code id: tags:
``` python
def translate(x):
    return ' '.join( [index_word.get(i,'?') for i in x] )
for i in range(nb_reviews):
    imax=np.where(x_data[i]==0)[0][0]+5
    print(f'\nText review {i} :', reviews[i])
    print(f'tokens vector :', list(x_data[i][:imax]), '(...)')
    print('Translation :', translate(x_data[i][:imax]), '(...)')
```
%% Cell type:markdown id: tags:
## Step 3 - Bring back the model
%% Cell type:code id: tags:
``` python
model = keras.models.load_model(f'{saved_models}/models/best_model.keras')
```
%% Cell type:markdown id: tags:
## Step 4 - Predict
%% Cell type:code id: tags:
``` python
y_pred = model.predict(x_data, verbose=0)
```
%% Cell type:markdown id: tags:
#### And the winner is :
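The decision rule is a simple threshold on the model output. The sketch below uses invented scores in `toy_pred` and assumes, as the following cell does, that each prediction is a single value between 0 and 1 where values below 0.5 mean a negative review.

```python
import numpy as np

# Hypothetical model outputs, shape (n_reviews, 1), values in [0, 1]
toy_pred = np.array([[0.91], [0.48], [0.07]])

# Below 0.5 is read as negative, otherwise positive
labels = ["NEGATIVE" if p[0] < 0.5 else "POSITIVE" for p in toy_pred]
print(labels)
```

A score close to 0.5, such as 0.48 here, still gets a label, but it reflects a much less confident prediction than 0.07 or 0.91.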
%% Cell type:code id: tags:
``` python
for i,review in enumerate(reviews):
    rate = y_pred[i][0]
    opinion = 'NEGATIVE :-(' if rate<0.5 else 'POSITIVE :-)'
    print(f'{review:<70} => {rate:.2f} - {opinion}')
```
%% Cell type:code id: tags:
``` python
fidle.end()
```
%% Cell type:markdown id: tags:
---
<img width="80px" src="../fidle/img/logo-paysage.svg"></img>