Commit 40119355 authored by Achille Mbogol Touye's avatar Achille Mbogol Touye
Replace 01-DNN-Wine-Regression-lightning.ipynb

%% Cell type:markdown id: tags:
<img width="800px" src="../fidle/img/header.svg"></img>
# <!-- TITLE --> [WINE1] - Wine quality prediction with a Dense Network (DNN) using Lightning
<!-- DESC --> Another example of regression, with a wine quality prediction!
<!-- AUTHOR : Achille Mbogol Touye (EFFILIA-MIAI/SIMaP) -->
## Objectives :
- Predict the **quality of wines**, based on their analysis
- Understand the principle and the architecture of a regression with a dense neural network, with backup and restore of the trained model.
The **[Wine Quality datasets](https://archive.ics.uci.edu/ml/datasets/wine+Quality)** are made up of analyses of a large number of wines, with an associated quality score (between 0 and 10).
This dataset is provided by:
Paulo Cortez, University of Minho, Guimarães, Portugal, http://www3.dsi.uminho.pt/pcortez
A. Cerdeira, F. Almeida, T. Matos and J. Reis, Viticulture Commission of the Vinho Verde Region (CVRVV), Porto, Portugal, @2009
This dataset can be retrieved at [University of California Irvine (UCI)](https://archive-beta.ics.uci.edu/ml/datasets/wine+quality)
Due to privacy and logistic issues, only physicochemical and sensory variables are available.
There is no data about grape types, wine brand, wine selling price, etc.
- fixed acidity
- volatile acidity
- citric acid
- residual sugar
- chlorides
- free sulfur dioxide
- total sulfur dioxide
- density
- pH
- sulphates
- alcohol
- quality (score between 0 and 10)
## What we're going to do :
- (Retrieve data)
- (Preparing the data)
- (Build a model)
- Train and save the model
- Restore saved model
- Evaluate the model
- Make some predictions
%% Cell type:markdown id: tags:
## Step 1 - Import and init
%% Cell type:code id: tags:
``` python
# Import some packages
import os
import sys
import numpy as np
import pandas as pd
import torch
import torch.nn as nn
import lightning.pytorch as pl
import torch.nn.functional as F
import torchvision.transforms as T

from importlib import reload
from IPython.display import Markdown
from torch.utils.data import Dataset, DataLoader, random_split
from modules.progressbar import CustomTrainProgressBar
from modules.data_load import WineQualityDataset, Normalize, ToTensor
from lightning.pytorch.loggers.tensorboard import TensorBoardLogger
from torchmetrics.functional.regression import mean_absolute_error, mean_squared_error

import fidle

# Init Fidle environment
run_id, run_dir, datasets_dir = fidle.init('WINE1-Lightning')
```
%% Cell type:markdown id: tags:
Verbosity during training :
- 0 = silent
- 1 = progress bar
- 2 = one line per epoch
%% Cell type:code id: tags:
``` python
fit_verbosity = 1
dataset_name  = 'winequality-red.csv'
```
%% Cell type:markdown id: tags:
Override parameters (batch mode) - Just forget this cell
%% Cell type:code id: tags:
``` python
fidle.override('fit_verbosity', 'dataset_name')
```
%% Cell type:markdown id: tags:
## Step 2 - Retrieve data
%% Cell type:code id: tags:
``` python
csv_file_path = f'{datasets_dir}/WineQuality/origine/{dataset_name}'
datasets      = WineQualityDataset(csv_file_path)

display(datasets.data.head(5).style.format("{0:.2f}"))
print('Missing Data : ', datasets.data.isna().sum().sum(), ' Shape is : ', datasets.data.shape)
```
%% Cell type:markdown id: tags:
## Step 3 - Preparing the data
%% Cell type:markdown id: tags:
### 3.1 - Data normalization
**Note :**
- All input features must be normalized.
- To do this we will subtract the mean and divide by the standard deviation for each input feature.
- Then we convert the numpy array features and target **(quality)** to torch tensors.
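%% Cell type:markdown id: tags:
This z-score normalization can be sketched in a few lines of numpy (an illustration with made-up values, not the wine data; the `Normalize` transform below applies the same idea to the CSV columns) :
%% Cell type:code id: tags:
``` python
import numpy as np

# Hypothetical mini-dataset: 3 samples, 2 features
x = np.array([[ 7.4, 0.70],
              [ 7.8, 0.88],
              [11.2, 0.28]])

# Subtract each column's mean, divide by its standard deviation
x_norm = (x - x.mean(axis=0)) / x.std(axis=0)

print(x_norm.mean(axis=0))   # each feature now has mean ~0
print(x_norm.std(axis=0))    # ...and standard deviation ~1
```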
%% Cell type:code id: tags:
``` python
transforms = T.Compose([Normalize(csv_file_path), ToTensor()])
dataset    = WineQualityDataset(csv_file_path, transform=transforms)
```
%% Cell type:code id: tags:
``` python
display(Markdown("Before normalization :"))
display(datasets[:]["features"])
print()
display(Markdown("After normalization :"))
display(dataset[:]["features"])
```
%% Cell type:markdown id: tags:
### 3.2 - Split data
We will use 80% of the data for training and 20% for validation.
x will be the features of the analysis and y the target (quality).
%% Cell type:code id: tags:
``` python
# ---- Split => train, test
#
data_train_len = int(len(dataset)*0.8)          # get 80 %
data_test_len  = len(dataset) - data_train_len  # test = all - train

# ---- Split => x,y with random_split
#
data_train_subset, data_test_subset = random_split(dataset, [data_train_len, data_test_len])

x_train = data_train_subset[:]["features"]
y_train = data_train_subset[:]["quality" ]

x_test  = data_test_subset [:]["features"]
y_test  = data_test_subset [:]["quality" ]

print('Original data shape was : ', dataset.data.shape)
print('x_train : ', x_train.shape, 'y_train : ', y_train.shape)
print('x_test  : ', x_test.shape,  'y_test  : ', y_test.shape)
```
%% Cell type:markdown id: tags:
### 3.3 - Use a DataLoader for training
The Dataset retrieves our dataset's features and labels one sample at a time. While training a model, we typically want to pass samples in minibatches and reshuffle the data at every epoch to reduce overfitting. DataLoader is an iterable that abstracts this complexity for us behind an easy API.
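%% Cell type:markdown id: tags:
What the DataLoader does for us can be sketched in plain Python (a simplified illustration on toy data; the real DataLoader also handles workers, tensor collation, etc.) :
%% Cell type:code id: tags:
``` python
import random

# Simplified sketch of DataLoader behaviour: shuffle the indices once
# per epoch, then yield fixed-size minibatches (last one may be smaller).
def batches(data, batch_size, shuffle=True, seed=0):
    idx = list(range(len(data)))
    if shuffle:
        random.Random(seed).shuffle(idx)
    for i in range(0, len(idx), batch_size):
        yield [data[j] for j in idx[i:i + batch_size]]

samples = list(range(10))
for batch in batches(samples, batch_size=4):
    print(batch)          # two batches of 4 samples, then one of 2
```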
%% Cell type:code id: tags:
``` python
# train batch data
train_loader = DataLoader(
    dataset=data_train_subset,
    shuffle=True,
    batch_size=20,
    num_workers=2
)

# test batch data
test_loader = DataLoader(
    dataset=data_test_subset,
    shuffle=False,
    batch_size=20,
    num_workers=2
)
```
%% Cell type:markdown id: tags:
## Step 4 - Build a model
More information about:
- [Optimizers](https://pytorch.org/docs/stable/optim.html)
- [Activations](https://pytorch.org/docs/stable/nn.html#non-linear-activations-weighted-sum-nonlinearity)
- [Losses](https://pytorch.org/docs/stable/nn.html#loss-functions)
- [Metrics](https://lightning.ai/docs/torchmetrics/stable/)
%% Cell type:code id: tags:
``` python
class LitRegression(pl.LightningModule):

    def __init__(self, in_features=11):
        super().__init__()
        self.model = nn.Sequential(
            nn.Linear(in_features, 128),   # hidden layer 1
            nn.ReLU(),                     # activation function
            nn.Linear(128, 128),           # hidden layer 2
            nn.ReLU(),                     # activation function
            nn.Linear(128, 1))             # output layer

    def forward(self, x):                  # forward pass
        x = self.model(x)
        return x

    # optimizer
    def configure_optimizers(self):
        optimizer = torch.optim.RMSprop(self.parameters(), lr=1e-4)
        return optimizer

    def training_step(self, batch, batch_idx):
        # defines the train loop
        x_features, y_target = batch["features"], batch["quality"]

        # forward pass
        y_pred = self.model(x_features)

        # loss function MSE
        loss = F.mse_loss(y_pred, y_target)

        # metrics mae
        mae = mean_absolute_error(y_pred, y_target)

        # metrics mse
        mse = mean_squared_error(y_pred, y_target)

        metrics = {"train_loss": loss,
                   "train_mae" : mae,
                   "train_mse" : mse}

        # logs metrics for each training_step
        self.log_dict(metrics,
                      on_step  = False,
                      on_epoch = True,
                      logger   = True,
                      prog_bar = True)
        return loss

    def validation_step(self, batch, batch_idx):
        # defines the val loop
        x_features, y_target = batch["features"], batch["quality"]

        # forward pass
        y_pred = self.model(x_features)

        # loss function MSE
        loss = F.mse_loss(y_pred, y_target)

        # metrics mae
        mae = mean_absolute_error(y_pred, y_target)

        # metrics mse
        mse = mean_squared_error(y_pred, y_target)

        metrics = {"val_loss": loss,
                   "val_mae" : mae,
                   "val_mse" : mse}

        # logs metrics for each validation_step
        self.log_dict(metrics,
                      on_step  = False,
                      on_epoch = True,
                      logger   = True,
                      prog_bar = True)
        return metrics
```
%% Cell type:markdown id: tags:
## Step 5 - Train the model
### 5.1 - Get it
%% Cell type:code id: tags:
``` python
reg = LitRegression(in_features=11)
print(reg)
```
%% Cell type:markdown id: tags:
### 5.2 - Add callback
%% Cell type:code id: tags:
``` python
os.makedirs('./run/models', exist_ok=True)
save_dir = "./run/models/"
filename = 'best-model-{epoch}-{val_loss:.2f}'

savemodel_callback = pl.callbacks.ModelCheckpoint(dirpath=save_dir,
                                                  filename=filename,
                                                  save_top_k=1,
                                                  verbose=False,
                                                  monitor="val_loss")
```
%% Cell type:markdown id: tags:
### 5.3 - Train it
%% Cell type:code id: tags:
``` python
# loggers data
logger = TensorBoardLogger(save_dir='Wine_logs', name="reg_logs")
```
%% Cell type:code id: tags:
``` python
# train model
trainer = pl.Trainer(accelerator='auto',
                     max_epochs=100,
                     logger=logger,
                     num_sanity_val_steps=0,
                     callbacks=[savemodel_callback, CustomTrainProgressBar()])

trainer.fit(model=reg, train_dataloaders=train_loader, val_dataloaders=test_loader)
```
%% Cell type:markdown id: tags:
## Step 6 - Evaluate
### 6.1 - Model evaluation
MAE = Mean Absolute Error (between the labels and predictions)
An MAE of 0.5 means the predicted quality score is, on average, within 0.5 points of the true score.
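%% Cell type:markdown id: tags:
As a quick sanity check on what these metrics measure (made-up quality scores, for intuition only) :
%% Cell type:code id: tags:
``` python
import numpy as np

# Hypothetical quality scores: true values vs model predictions
y_true = np.array([5.0, 6.0, 5.0, 7.0])
y_pred = np.array([5.5, 5.5, 4.0, 7.5])

mae = np.abs(y_pred - y_true).mean()     # mean absolute error
mse = ((y_pred - y_true) ** 2).mean()    # mean squared error

print(f'mae = {mae:.3f}')    # 0.625 : off by ~0.6 quality points on average
print(f'mse = {mse:.3f}')    # 0.438 : squaring penalizes the larger errors more
```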
%% Cell type:code id: tags:
``` python
score = trainer.validate(model=reg, dataloaders=test_loader, verbose=False)

print('x_test / loss : {:5.4f}'.format(score[0]['val_loss']))
print('x_test / mae  : {:5.4f}'.format(score[0]['val_mae']))
print('x_test / mse  : {:5.4f}'.format(score[0]['val_mse']))
```
%% Cell type:markdown id: tags:
### 6.2 - Training history
%% Cell type:code id: tags:
``` python
# launch Tensorboard
%reload_ext tensorboard
%tensorboard --logdir=Wine_logs/reg_logs/ --bind_all
```
%% Cell type:markdown id: tags:
## Step 7 - Restore a model :
%% Cell type:markdown id: tags:
### 7.1 - Reload model
%% Cell type:code id: tags:
``` python
# Load the model from a checkpoint
loaded_model = LitRegression.load_from_checkpoint(savemodel_callback.best_model_path)
print("Loaded:")
print(loaded_model)
```
%% Cell type:markdown id: tags:
### 7.2 - Evaluate it :
%% Cell type:code id: tags:
``` python
score = trainer.validate(model=loaded_model, dataloaders=test_loader, verbose=False)

print('x_test / loss : {:5.4f}'.format(score[0]['val_loss']))
print('x_test / mae  : {:5.4f}'.format(score[0]['val_mae']))
print('x_test / mse  : {:5.4f}'.format(score[0]['val_mse']))
```
%% Cell type:markdown id: tags:
### 7.3 - Make a prediction
%% Cell type:code id: tags:
``` python
# ---- Pick n entries from our test set
n = 200
ii = np.random.randint(0, len(x_test), n)

x_sample = x_test[ii]
y_sample = y_test[ii]
```
%% Cell type:code id: tags:
``` python
# ---- Make some predictions :
# Set the model in evaluation mode
loaded_model.eval()

# Perform inference using the loaded model, without tracking gradients
with torch.no_grad():
    y_pred = loaded_model( x_sample )
```
%% Cell type:code id: tags:
``` python
# ---- Show it
print('Wine  Prediction  Real  Delta')
for i in range(n):
    pred  = y_pred[i][0].item()
    real  = y_sample[i][0].item()
    delta = real - pred
    print(f'{i:03d}   {pred:.2f}        {real}   {delta:+.2f} ')
```
%% Cell type:code id: tags:
``` python
fidle.end()
```
%% Cell type:markdown id: tags:
---
<img width="80px" src="../fidle/img/logo-paysage.svg"></img>