Skip to content
Snippets Groups Projects
02-Prediction.ipynb 15.2 KiB
Newer Older
Jean-Luc Parouty's avatar
Jean-Luc Parouty committed
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<img width=\"800px\" src=\"../fidle/img/00-Fidle-header-01.svg\"></img>\n",
    "# <!-- TITLE --> [IMDB2] - Reload and reuse a saved model\n",
    "<!-- DESC --> Retrieving a saved model to perform a sentiment analysis (movie review)\n",
    "<!-- AUTHOR : Jean-Luc Parouty (CNRS/SIMaP) -->\n",
Jean-Luc Parouty's avatar
Jean-Luc Parouty committed
    "\n",
    "## Objectives :\n",
    " - The objective is to guess whether film reviews are **positive or negative** based on the analysis of the text. \n",
    " - For this, we will use our **previously saved model**.\n",
Jean-Luc Parouty's avatar
Jean-Luc Parouty committed
    "\n",
    "Original dataset can be find **[there](http://ai.stanford.edu/~amaas/data/sentiment/)**  \n",
    "Note that [IMDb.com](https://imdb.com) offers several easy-to-use [datasets](https://www.imdb.com/interfaces/)  \n",
    "For simplicity's sake, we'll use the dataset directly [embedded in Keras](https://www.tensorflow.org/api_docs/python/tf/keras/datasets)\n",
Jean-Luc Parouty's avatar
Jean-Luc Parouty committed
    "\n",
    "## What we're going to do :\n",
Jean-Luc Parouty's avatar
Jean-Luc Parouty committed
    "\n",
    " - Preparing the data\n",
    " - Retrieve our saved model\n",
    " - Evaluate the result\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 1 - Init python stuff"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
Jean-Luc Parouty's avatar
Jean-Luc Parouty committed
   "metadata": {},
     "data": {
      "text/html": [
       "<style>\n",
       "\n",
       "div.warn {    \n",
       "    background-color: #fcf2f2;\n",
       "    border-color: #dFb5b4;\n",
       "    border-left: 5px solid #dfb5b4;\n",
       "    padding: 0.5em;\n",
       "    font-weight: bold;\n",
       "    font-size: 1.1em;;\n",
       "    }\n",
       "\n",
       "\n",
       "\n",
       "div.nota {    \n",
       "    background-color: #DAFFDE;\n",
       "    border-left: 5px solid #92CC99;\n",
       "    padding: 0.5em;\n",
       "    }\n",
       "\n",
       "div.todo:before { content:url();\n",
       "    float:left;\n",
       "    margin-right:20px;\n",
       "    margin-top:-20px;\n",
       "    margin-bottom:20px;\n",
       "}\n",
       "div.todo{\n",
       "    font-weight: bold;\n",
       "    font-size: 1.1em;\n",
       "    margin-top:40px;\n",
       "}\n",
       "div.todo ul{\n",
       "    margin: 0.2em;\n",
       "}\n",
       "div.todo li{\n",
       "    margin-left:60px;\n",
       "    margin-top:0;\n",
       "    margin-bottom:0;\n",
       "}\n",
       "div .comment{\n",
       "    font-size:0.8em;\n",
       "    color:#696969;\n",
       "}\n",
       "\n",
       "\n",
       "\n",
       "</style>\n",
       "\n"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/markdown": [
       "**FIDLE 2020 - Practical Work Module**"
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Version              : 0.6.1 DEV\n",
      "Notebook id          : IMDB2\n",
      "Run time             : Friday 18 December 2020, 18:21:49\n",
      "TensorFlow version   : 2.0.0\n",
      "Keras version        : 2.2.4-tf\n",
      "Datasets dir         : /home/pjluc/datasets/fidle\n",
      "Running mode         : full\n",
      "Update keras cache   : False\n",
      "Save figs            : True\n",
      "Path figs            : ./run/figs\n"
Jean-Luc Parouty's avatar
Jean-Luc Parouty committed
   "source": [
    "import numpy as np\n",
    "\n",
    "import tensorflow as tf\n",
    "import tensorflow.keras as keras\n",
    "import tensorflow.keras.datasets.imdb as imdb\n",
    "\n",
    "import matplotlib.pyplot as plt\n",
    "import matplotlib\n",
    "import pandas as pd\n",
    "\n",
    "import os,sys,h5py,json,re\n",
Jean-Luc Parouty's avatar
Jean-Luc Parouty committed
    "\n",
    "from importlib import reload\n",
    "\n",
    "sys.path.append('..')\n",
    "import fidle.pwk as pwk\n",
    "datasets_dir = pwk.init('IMDB2')"
Jean-Luc Parouty's avatar
Jean-Luc Parouty committed
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 2 : Preparing the data\n",
    "### 2.1 - Our reviews :"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
Jean-Luc Parouty's avatar
Jean-Luc Parouty committed
   "metadata": {},
   "outputs": [],
   "source": [
    "reviews = [ \"This film is particularly nice, a must see.\",\n",
    "             \"Some films are great classics and cannot be ignored.\",\n",
Jean-Luc Parouty's avatar
Jean-Luc Parouty committed
    "             \"This movie is just abominable and doesn't deserve to be seen!\"]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2.2 - Retrieve dictionaries\n",
    "Note : This dictionary is generated by [01-Embedding-Keras](01-Embedding-Keras.ipynb) notebook."
Jean-Luc Parouty's avatar
Jean-Luc Parouty committed
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
Jean-Luc Parouty's avatar
Jean-Luc Parouty committed
   "metadata": {},
   "outputs": [],
   "source": [
    "with open('./data/word_index.json', 'r') as fp:\n",
Jean-Luc Parouty's avatar
Jean-Luc Parouty committed
    "    word_index = json.load(fp)\n",
    "    index_word = {index:word for word,index in word_index.items()} "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2.3 - Clean, index and padd"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
Jean-Luc Parouty's avatar
Jean-Luc Parouty committed
   "metadata": {},
   "outputs": [],
   "source": [
    "max_len    = 256\n",
    "vocab_size = 10000\n",
    "\n",
    "\n",
    "nb_reviews = len(reviews)\n",
    "x_data     = []\n",
    "\n",
    "# ---- For all reviews\n",
    "for review in reviews:\n",
    "    # ---- First index must be <start>\n",
    "    index_review=[1]\n",
    "    # ---- For all words\n",
    "    for w in review.split(' '):\n",
    "        # ---- Clean it\n",
    "        w_clean = re.sub(r\"[^a-zA-Z0-9]\", \"\", w)\n",
    "        # ---- Not empty ?\n",
    "        if len(w_clean)>0:\n",
    "            # ---- Get the index\n",
    "            w_index = word_index.get(w,2)\n",
    "            if w_index>vocab_size : w_index=2\n",
    "            # ---- Add the index if < vocab_size\n",
    "            index_review.append(w_index)\n",
    "    # ---- Add the indexed review\n",
    "    x_data.append(index_review)    \n",
    "\n",
    "# ---- Padding\n",
    "x_data = keras.preprocessing.sequence.pad_sequences(x_data, value   = 0, padding = 'post', maxlen  = max_len)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2.4 - Have a look"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
Jean-Luc Parouty's avatar
Jean-Luc Parouty committed
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "Text review      : This film is particularly nice, a must see.\n",
      "x_train[0]       : [1, 2, 22, 9, 572, 2, 6, 215, 2, 0, 0, 0, 0, 0] (...)\n",
      "Translation      : <start> <unknown> film is particularly <unknown> a must <unknown> <pad> <pad> <pad> <pad> <pad> (...)\n",
      "\n",
      "Text review      : Some films are great classics and cannot be ignored.\n",
      "x_train[1]       : [1, 2, 108, 26, 87, 2239, 5, 566, 30, 2, 0, 0, 0, 0, 0] (...)\n",
      "Translation      : <start> <unknown> films are great classics and cannot be <unknown> <pad> <pad> <pad> <pad> <pad> (...)\n",
      "\n",
      "Text review      : This movie is just abominable and doesn't deserve to be seen!\n",
      "x_train[2]       : [1, 2, 20, 9, 43, 2, 5, 152, 1833, 8, 30, 2, 0, 0, 0, 0, 0] (...)\n",
      "Translation      : <start> <unknown> movie is just <unknown> and doesn't deserve to be <unknown> <pad> <pad> <pad> <pad> <pad> (...)\n"
     ]
    }
   ],
Jean-Luc Parouty's avatar
Jean-Luc Parouty committed
   "source": [
    "def translate(x):\n",
    "    return ' '.join( [index_word.get(i,'?') for i in x] )\n",
    "\n",
    "for i in range(nb_reviews):\n",
    "    imax=np.where(x_data[i]==0)[0][0]+5\n",
    "    print(f'\\nText review      :',    reviews[i])\n",
    "    print(  f'x_train[{i:}]       :', list(x_data[i][:imax]), '(...)')\n",
    "    print(  'Translation      :', translate(x_data[i][:imax]), '(...)')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 2 - Bring back the model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
Jean-Luc Parouty's avatar
Jean-Luc Parouty committed
   "metadata": {},
   "outputs": [],
   "source": [
    "model = keras.models.load_model('./run/models/best_model.h5')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 4 - Predict"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
Jean-Luc Parouty's avatar
Jean-Luc Parouty committed
   "metadata": {},
   "outputs": [],
   "source": [
    "y_pred   = model.predict(x_data)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### And the winner is :"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
Jean-Luc Parouty's avatar
Jean-Luc Parouty committed
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "This film is particularly nice, a must see.                            => POSITIVE (0.56)\n",
      "Some films are great classics and cannot be ignored.                   => POSITIVE (0.63)\n",
      "This movie is just abominable and doesn't deserve to be seen!          => NEGATIVE (0.35)\n"
Jean-Luc Parouty's avatar
Jean-Luc Parouty committed
   "source": [
    "for i in range(nb_reviews):\n",
    "    print(f'\\n{reviews[i]:<70} =>',('NEGATIVE' if y_pred[i][0]<0.5 else 'POSITIVE'),f'({y_pred[i][0]:.2f})')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "End time is : Friday 18 December 2020, 18:21:50\n",
      "Duration is : 00:00:01 555ms\n",
      "This notebook ends here\n"
     ]
    }
   ],
   "source": [
    "pwk.end()"
   ]
  },
   "cell_type": "markdown",
Jean-Luc Parouty's avatar
Jean-Luc Parouty committed
   "metadata": {},
   "source": [
    "<img width=\"80px\" src=\"../fidle/img/00-Fidle-logo-01.svg\"></img>"
Jean-Luc Parouty's avatar
Jean-Luc Parouty committed
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
Jean-Luc Parouty's avatar
Jean-Luc Parouty committed
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}