{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Deep Neural Network (DNN) - BHPD dataset\n",
"========================================\n",
"---\n",
"Introduction au Deep Learning (IDLE) - S. Arias, E. Maldonado, JL. Parouty - CNRS/SARI/DEVLOG - 2020 \n",
"\n",
"## A very simple example of **regression** :\n",
"\n",
"Objective is to predicts **housing prices** from a set of house features. \n",
"\n",
"The **[Boston Housing Dataset](https://www.cs.toronto.edu/~delve/data/boston/bostonDetail.html)** consists of price of houses in various places in Boston. \n",
"Alongside with price, the dataset also provide information such as Crime, areas of non-retail business in the town, \n",
"age of people who own the house and many other attributes...\n",
"\n",
"What we're going to do:\n",
"\n",
" - Retrieve data\n",
" - Preparing the data\n",
" - Build a model\n",
" - Train the model\n",
" - Evaluate the result\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1/ Init python stuff"
]
},
{
"cell_type": "code",
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"IDLE 2020 - Practical Work Module\n",
" Version : 0.2.4\n",
" Run time : Sunday 2 February 2020, 15:17:29\n",
" Matplotlib style : ../fidle/talk.mplstyle\n",
" TensorFlow version : 2.0.0\n",
" Keras version : 2.2.4-tf\n"
]
}
],
"source": [
"import tensorflow as tf\n",
"from tensorflow import keras\n",
"\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"import pandas as pd\n",
"\n",
"from IPython.display import display, Markdown\n",
"from importlib import reload\n",
"\n",
"sys.path.append('..')\n",
"import fidle.pwk as ooo\n",
"\n",
"ooo.init()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2/ Retrieve data\n",
"\n",
"**From Keras :**\n",
"Boston housing is a famous historic dataset, so we can get it directly from [Keras datasets](https://www.tensorflow.org/api_docs/python/tf/keras/datasets) "
]
},
{
"metadata": {},
"source": [
"(x_train, y_train), (x_test, y_test) = keras.datasets.boston_housing.load_data(test_split=0.2, seed=113)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**From a csv file :** \n",
"More fun !"
]
},
{
"cell_type": "code",
"source": [
"data = pd.read_csv('./data/BostonHousing.csv', header=0)\n",
"\n",
"display(data.head(5).style.format(\"{0:.2f}\"))\n",
"print('Données manquantes : ',data.isna().sum().sum(), ' Shape is : ', data.shape)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3/ Preparing the data\n",
"### 3.1/ Split data\n",
"We will use 70% of the data for training and 30% for validation. \n",
"x will be input data and y the expected output"
]
},
{
"cell_type": "code",
"source": [
"# ---- Split => train, test\n",
"#\n",
"data_train = data.sample(frac=0.7, axis=0)\n",
"data_test = data.drop(data_train.index)\n",
"\n",
"# ---- Split => x,y (medv is price)\n",
"#\n",
"x_train = data_train.drop('medv', axis=1)\n",
"y_train = data_train['medv']\n",
"x_test = data_test.drop('medv', axis=1)\n",
"y_test = data_test['medv']\n",
"\n",
"print('Original data shape was : ',data.shape)\n",
"print('x_train : ',x_train.shape, 'y_train : ',y_train.shape)\n",
"print('x_test : ',x_test.shape, 'y_test : ',y_test.shape)"
]
},
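{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Side note:** the same split could also be written with scikit-learn's `train_test_split`. This is a minimal sketch, assuming scikit-learn is installed; it is not used elsewhere in this notebook."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# ---- Equivalent split with scikit-learn (sketch only, not used below)\n",
"#\n",
"from sklearn.model_selection import train_test_split\n",
"\n",
"x_tr, x_te, y_tr, y_te = train_test_split(data.drop('medv', axis=1), data['medv'],\n",
"                                          train_size=0.7, random_state=42)\n",
"print('x_tr : ', x_tr.shape, '   x_te : ', x_te.shape)"
]
},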
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 3.2/ Data normalization\n",
"**Note :** \n",
" - All input data must be normalized, train and test. \n",
" - To do this we will subtract the mean and divide by the standard deviation. \n",
" - But test data should not be used in any way, even for normalization. \n",
" - The mean and the standard deviation will therefore only be calculated with the train data."
]
},
{
"cell_type": "code",
"source": [
"display(x_train.describe().style.format(\"{0:.2f}\").set_caption(\"Before normalization :\"))\n",
"\n",
"mean = x_train.mean()\n",
"std = x_train.std()\n",
"x_train = (x_train - mean) / std\n",
"x_test = (x_test - mean) / std\n",
"\n",
"display(x_train.describe().style.format(\"{0:.2f}\").set_caption(\"After normalization :\"))\n",
"\n",
"x_train, y_train = np.array(x_train), np.array(y_train)\n",
"x_test, y_test = np.array(x_test), np.array(y_test)\n"
]
},
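{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Side note:** the same train-only normalization can be expressed with scikit-learn's `StandardScaler`. A sketch, assuming scikit-learn is installed; the manual version above is what the rest of the notebook uses."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# ---- Equivalent normalization with StandardScaler (sketch only)\n",
"# It would replace the manual mean/std version above, applied to the raw data.\n",
"#\n",
"from sklearn.preprocessing import StandardScaler\n",
"\n",
"scaler    = StandardScaler().fit(data_train.drop('medv', axis=1))   # statistics from train data only\n",
"x_train_n = scaler.transform(data_train.drop('medv', axis=1))\n",
"x_test_n  = scaler.transform(data_test.drop('medv', axis=1))"
]
},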
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 4/ Build a model\n",
"About informations about : \n",
" - [Optimizer](https://www.tensorflow.org/api_docs/python/tf/keras/optimizers)\n",
" - [Activation](https://www.tensorflow.org/api_docs/python/tf/keras/activations)\n",
" - [Loss](https://www.tensorflow.org/api_docs/python/tf/keras/losses)\n",
" - [Metrics](https://www.tensorflow.org/api_docs/python/tf/keras/metrics)"
]
},
{
"cell_type": "code",
"metadata": {},
"outputs": [],
"source": [
" def get_model_v1(shape):\n",
" \n",
" model = keras.models.Sequential()\n",
" model.add(keras.layers.Dense(64, activation='relu', input_shape=shape))\n",
" model.add(keras.layers.Dense(64, activation='relu'))\n",
" model.add(keras.layers.Dense(1))\n",
" \n",
" model.compile(optimizer = 'rmsprop',\n",
" loss = 'mse',\n",
" metrics = ['mae', 'mse'] )\n",
" return model"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 5/ Train the model"
]
},
{
"cell_type": "code",
"source": [
"model=get_model_v1( (13,) )\n",
"\n",
"model.summary()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Let's go :**"
]
},
{
"cell_type": "code",
"source": [
"history = model.fit(x_train,\n",
" y_train,\n",
" epochs = 100,\n",
" batch_size = 10,\n",
" validation_data = (x_test, y_test))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 6/ Evaluate\n",
"### 6.1/ Model evaluation\n",
"MAE = Mean Absolute Error (between the labels and predictions) \n",
"A mae equal to 3 represents an average error in prediction of $3k."
]
},
{
"cell_type": "code",
"source": [
"score = model.evaluate(x_test, y_test, verbose=0)\n",
"\n",
"print('x_test / loss : {:5.4f}'.format(score[0]))\n",
"print('x_test / mae : {:5.4f}'.format(score[1]))\n",
"print('x_test / mse : {:5.4f}'.format(score[2]))"
]
},
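{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a sanity check, the MAE reported by `evaluate` can be recomputed by hand from the predictions (a small sketch, nothing more):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# ---- Recompute the MAE by hand (sanity check)\n",
"#\n",
"y_pred = model.predict(x_test).reshape(-1)\n",
"print('Recomputed MAE : {:5.4f}'.format( np.mean(np.abs(y_test - y_pred)) ))"
]
},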
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 6.2/ Training history\n",
"What was the best result during our training ?"
]
},
{
"cell_type": "code",
"source": [
"\n",
"df=pd.DataFrame(data=history.history)\n",
"df.describe()"
]
},
{
"cell_type": "code",
"source": [
"print(\"min( val_mae ) : {:.4f}\".format( min(history.history[\"val_mae\"]) ) )"
]
},
{
"cell_type": "code",
"source": [
"ooo.plot_history(history, plot={'MSE' :['mse', 'val_mse'],\n",
" 'MAE' :['mae', 'val_mae'],\n",
" 'LOSS':['loss','val_loss']})"
]
},
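{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Note:** the best epoch seen above is lost once training ends. Below is a minimal sketch of how the best weights could be kept; `ModelCheckpoint` is standard Keras, but the file path is just an example."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# ---- Keep the best weights during training (sketch only)\n",
"#\n",
"save_best = keras.callbacks.ModelCheckpoint('./run/best_model.h5',   # example path\n",
"                                            monitor='val_mae',\n",
"                                            save_best_only=True)\n",
"\n",
"# It would be passed to fit(), e.g.:\n",
"# model.fit(x_train, y_train, epochs=100, batch_size=10,\n",
"#           validation_data=(x_test, y_test), callbacks=[save_best])"
]
},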
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 7 - Make a prediction"
]
},
{
"cell_type": "code",
"metadata": {},
"outputs": [],
"source": [
"my_data = [ 1.26425925, -0.48522739, 1.0436489 , -0.23112788, 1.37120745,\n",
" -2.14308942, 1.13489104, -1.06802005, 1.71189006, 1.57042287,\n",
" 0.77859951, 0.14769795, 2.7585581 ]\n",
"real_price = 10.4\n",
"\n",
"my_data=np.array(my_data).reshape(1,13)"
]
},
{
"cell_type": "code",
"source": [
"\n",
"predictions = model.predict( my_data )\n",
"print(\"Prédiction : {:.2f} K$\".format(predictions[0][0]))\n",
"print(\"Reality : {:.2f} K$\".format(real_price))"
]
},
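{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Note:** `my_data` above is already normalized. Here is a minimal sketch, assuming the `mean` and `std` computed in 3.2 are still in scope, of how a raw (unnormalized) sample would be handled:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# ---- Predict from a raw sample (sketch only)\n",
"#\n",
"raw   = my_data * np.array(std) + np.array(mean)   # rebuild a raw-scale sample, just for illustration\n",
"x_new = (raw - np.array(mean)) / np.array(std)     # a real raw sample would be normalized like this\n",
"print('Prediction : {:.2f} K$'.format( model.predict(x_new)[0][0] ))"
]
},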
"source": [
"---\n",
"That's all folks !"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.5"
}
},
"nbformat": 4,
"nbformat_minor": 4
}