Commit 3404be75 authored by Florent Chatelain's avatar Florent Chatelain
%% Cell type:markdown id: tags:
This notebook can be run on mybinder: [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/git/https%3A%2F%2Fgricad-gitlab.univ-grenoble-alpes.fr%2Fchatelaf%2Fml-sicom3a/master?filepath=notebooks%2F/1_introduction/N1_Linear_Classification.ipynb)
%% Cell type:code id: tags:
``` python
# Import modules
%matplotlib inline
import matplotlib
import numpy as np
import scipy as sp
import matplotlib.pyplot as plt
```
%% Cell type:code id: tags:
``` python
# Select random seed
random_state = 0
```
%% Cell type:markdown id: tags:
We use scikit-learn to generate a toy 2D data set (two features $x_1$ and $x_2$) for binary classification (two classes)
- each sample $(x_1,x_2)$ in the dataset is plotted as a 2D point where the two features $x_1$ and $x_2$ are displayed along the abscissa and ordinate axes respectively
- the corresponding class label $y$ is displayed as a color mark (e.g., yellow or purple)
%% Cell type:code id: tags:
``` python
from sklearn.datasets import make_classification
# X holds the features (aka inputs), y the labels (aka responses, targets, outputs)
X, y = make_classification(n_features=2, n_redundant=0, n_informative=2,
                           n_samples=150, random_state=random_state,
                           n_clusters_per_class=1)
# recode the class labels y_i as +1 or -1
y[y == 0] = -1
# display the dataset
plt.figure(figsize=(8,6))
plt.scatter(X[:,0], X[:,1], c=y)
plt.grid(True)
plt.xlabel('$x_1$')
plt.ylabel('$x_2$')
#plt.savefig("2d_binary_classif.pdf")
```
%% Output
Text(0, 0.5, '$x_2$')
%% Cell type:markdown id: tags:
Then, a linear model is used to learn the classification function/rule.
%% Cell type:code id: tags:
``` python
from sklearn import linear_model
# Train a linear classifier, namely RidgeClassifier; with alpha=0 this
# reduces to ordinary least-squares regression on the +/-1 labels
model = linear_model.RidgeClassifier(alpha=0)
model.fit(X,y)
```
%% Output
RidgeClassifier(alpha=0)
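%% Cell type:markdown id: tags:
As a quick sanity check (a sketch, not part of the original notebook), the mean accuracy of the fitted rule on the training points can be read off with `model.score`. The cell below regenerates the same toy data so it runs standalone, assuming the seed `random_state=0` used above.
%% Cell type:code id: tags:
``` python
from sklearn import linear_model
from sklearn.datasets import make_classification

# Rebuild the same toy dataset (seed 0, as above) so this cell is self-contained
X, y = make_classification(n_features=2, n_redundant=0, n_informative=2,
                           n_samples=150, random_state=0,
                           n_clusters_per_class=1)
y[y == 0] = -1

model = linear_model.RidgeClassifier(alpha=0).fit(X, y)
acc = model.score(X, y)  # fraction of training points classified correctly
print('training accuracy: {:.2f}'.format(acc))
```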
%% Cell type:code id: tags:
``` python
# Plot the decision functions
XX, YY = np.meshgrid(np.linspace(X[:,0].min(), X[:,0].max(), 200),
                     np.linspace(X[:,1].min(), X[:,1].max(), 200))
XY = np.vstack([XX.flatten(), YY.flatten()]).T
yp = model.predict(XY)
plt.figure(figsize=(8,6))
plt.contour(XX,YY,yp.reshape(XX.shape),[0])
plt.scatter(X[:,0], X[:,1], c=y)
plt.grid(True)
plt.xlabel('$x_1$')
plt.ylabel('$x_2$')
```
%% Output
Text(0, 0.5, '$x_2$')
%% Cell type:code id: tags:
``` python
# What are the parameter values of the linear boundary equation x_2=a x_1 + b?
a = -model.coef_[0][0]/model.coef_[0][1]
b = -model.intercept_[0]/model.coef_[0][1]
print('boundary equation x_2={} x_1 + {}'.format(a,b))
```
%% Output
boundary equation x_2=-0.5596406428840415 x_1 + 0.6410047882764905
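%% Cell type:markdown id: tags:
The boundary parameters can be verified numerically (a sketch, assuming the same seed `random_state=0`): any point on the line $x_2 = a x_1 + b$ should give a decision function value of (numerically) zero, since the boundary is exactly the zero-level set of the linear score.
%% Cell type:code id: tags:
``` python
import numpy as np
from sklearn import linear_model
from sklearn.datasets import make_classification

# Rebuild the same toy dataset (seed 0) so this cell is self-contained
X, y = make_classification(n_features=2, n_redundant=0, n_informative=2,
                           n_samples=150, random_state=0,
                           n_clusters_per_class=1)
y[y == 0] = -1
model = linear_model.RidgeClassifier(alpha=0).fit(X, y)

# Boundary slope/intercept as in the cell above
a = -model.coef_[0][0] / model.coef_[0][1]
b = -model.intercept_[0] / model.coef_[0][1]

# Sample points on the line x_2 = a*x_1 + b and evaluate the score there
x1 = np.linspace(-2, 2, 5)
pts = np.column_stack([x1, a * x1 + b])
err = np.abs(model.decision_function(pts)).max()
print('max |score| on the boundary line:', err)
```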
%% Cell type:markdown id: tags:
### Exercise
Change the number of informative features from `n_informative=2` to `n_informative=1` in the `make_classification()` call, regenerate the data set, and refit the classification rule. Now interpret the new decision boundary: are the two variables of equal importance in predicting the class of the data?
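%% Cell type:markdown id: tags:
One possible sketch of the exercise setup (not the official solution): regenerate the data with a single informative feature and inspect the fitted weights. Note that `make_classification` places informative features in the first columns, so comparing the magnitudes of the two coefficients hints at each variable's importance.
%% Cell type:code id: tags:
``` python
from sklearn import linear_model
from sklearn.datasets import make_classification

# Same toy setup as above, but with only one informative feature
X1, y1 = make_classification(n_features=2, n_redundant=0, n_informative=1,
                             n_samples=150, random_state=0,
                             n_clusters_per_class=1)
y1[y1 == 0] = -1
model1 = linear_model.RidgeClassifier(alpha=0).fit(X1, y1)
print(model1.coef_)  # compare the magnitudes of the two weights
```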
%% Cell type:code id: tags:
``` python
# Get the documentation for the sklearn RidgeClassifier class
linear_model.RidgeClassifier?
```
%% Output
%% Cell type:code id: tags:
``` python
```