{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "<img width=\"800px\" src=\"../fidle/img/00-Fidle-header-01.svg\"></img>\n", "\n", "# <!-- TITLE --> [IMDB3] - Reload and reuse a saved model\n", "<!-- DESC --> Retrieving a saved model to perform a sentiment analysis (movie review)\n", "<!-- AUTHOR : Jean-Luc Parouty (CNRS/SIMaP) -->\n", "\n", "## Objectives :\n", " - The objective is to guess whether our personal film reviews are **positive or negative**, based on an analysis of their text.\n", " - For this, we will use our **previously saved model**.\n", "\n", "## What we're going to do :\n", "\n", " - Prepare our data\n", " - Retrieve our saved model\n", " - Evaluate the result\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 1 - Init python stuff" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "\n", "import tensorflow as tf\n", "import tensorflow.keras as keras\n", "import tensorflow.keras.datasets.imdb as imdb\n", "\n", "import matplotlib.pyplot as plt\n", "import matplotlib\n", "import pandas as pd\n", "\n", "import os,sys,h5py,json,re\n", "\n", "from importlib import reload\n", "\n", "import fidle\n", "\n", "# Init Fidle environment\n", "run_id, run_dir, datasets_dir = fidle.init('IMDB3')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 1.2 - Parameters\n", "The words in the vocabulary are ranked from the most frequent to the rarest.\n", "\n", " - `vocab_size` is the number of words we keep in our vocabulary (all other words are treated as unknown).\n", " - `review_len` is the length to which each review is padded or truncated.\n", " - `saved_models` is where our models were previously saved.\n", " - `dictionaries_dir` is where our dictionaries are stored. 
(./data is a good choice)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "vocab_size = 10000\n", "review_len = 256\n", "\n", "saved_models = './run/IMDB2'\n", "dictionaries_dir = './data'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Override parameters (batch mode) - You can ignore this cell" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fidle.override('vocab_size', 'review_len', 'saved_models', 'dictionaries_dir')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 2 - Preparing the data\n", "### 2.1 - Our reviews :" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "reviews = [ \"This film is particularly nice, a must see.\",\n", "            \"This film is a great classic that cannot be ignored.\",\n", "            \"I don't remember ever having seen such a movie...\",\n", "            \"This movie is just abominable and doesn't deserve to be seen!\"]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2.2 - Retrieve dictionaries\n", "Note : This dictionary is generated by the [01-Embedding-Keras](01-Embedding-Keras.ipynb) notebook." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "with open(f'{dictionaries_dir}/word_index.json', 'r') as fp:\n", "    word_index = json.load(fp)\n", "    word_index = { w:int(i) for w,i in word_index.items() }\n", "    print('Loaded. ', len(word_index), 'entries in word_index' )\n", "    index_word = { i:w for w,i in word_index.items() }\n", "    print('Loaded. ', len(index_word), 'entries in index_word' )" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2.3 - Clean, index and pad\n", "Phrases are split into words, punctuation is removed, sentence length is limited and padding is added. 
\n", "**Note** : 0 is padding, 1 is \"Start\" and 2 is \"unknown\"" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "nb_reviews = len(reviews)\n", "x_data = []\n", "\n", "# ---- For all reviews\n", "for review in reviews:\n", "    print('Words are : ', end='')\n", "    # ---- First index must be <start>\n", "    index_review=[1]\n", "    print('1 ', end='')\n", "    # ---- For all words\n", "    for w in review.split(' '):\n", "        # ---- Clean it (lowercased, assuming the dictionary keys are lowercase)\n", "        w_clean = re.sub(r\"[^a-zA-Z0-9]\", \"\", w).lower()\n", "        # ---- Not empty ?\n", "        if len(w_clean)>0:\n", "            # ---- Get the index of the cleaned word (2 if unknown)\n", "            w_index = word_index.get(w_clean,2)\n", "            # ---- Indices outside the vocabulary become <unknown>\n", "            if w_index>=vocab_size : w_index=2\n", "            # ---- Add the index\n", "            index_review.append(w_index)\n", "            print(f'{w_index} ', end='')\n", "    # ---- Add the indexed review\n", "    x_data.append(index_review)\n", "    print()\n", "\n", "# ---- Padding\n", "x_data = keras.preprocessing.sequence.pad_sequences(x_data, value = 0, padding = 'post', maxlen = review_len)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2.4 - Have a look" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def translate(x):\n", "    return ' '.join( [index_word.get(i,'?') for i in x] )\n", "\n", "for i in range(nb_reviews):\n", "    # ---- Show the review up to a few padding values\n", "    imax=np.where(x_data[i]==0)[0][0]+5\n", "    print('\\nText review :', reviews[i])\n", "    print( f'x_data[{i}] :', list(x_data[i][:imax]), '(...)')\n", "    print( 'Translation :', translate(x_data[i][:imax]), '(...)')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 3 - Bring back the model" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "model = keras.models.load_model(f'{saved_models}/models/best_model.h5')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 4 - Predict" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "y_pred = 
model.predict(x_data)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### And the winner is :" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "for i,review in enumerate(reviews):\n", "    rate = y_pred[i][0]\n", "    opinion = 'NEGATIVE :-(' if rate<0.5 else 'POSITIVE :-)'\n", "    print(f'{review:<70} => {rate:.2f} - {opinion}')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fidle.end()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "<img width=\"80px\" src=\"../fidle/img/00-Fidle-logo-01.svg\"></img>" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3.9.2 ('fidle-env')", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.2" }, "vscode": { "interpreter": { "hash": "b3929042cc22c1274d74e3e946c52b845b57cb6d84f2d591ffe0519b38e4896d" } } }, "nbformat": 4, "nbformat_minor": 4 }