Newer
Older
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<img width=\"800px\" src=\"../fidle/img/header.svg\"></img>\n",
"# <!-- TITLE --> [LINR1] - Linear regression with direct resolution\n",
"<!-- DESC --> Low-level implementation, using numpy, of a direct resolution for a linear regression\n",
"<!-- AUTHOR : Jean-Luc Parouty (CNRS/SIMaP) -->\n",
"\n",
"## Objectives :\n",
" - Just one, the illustration of a direct resolution :-)\n",
"\n",
"## What we're going to do :\n",
"\n",
"Where N is a noise vector\n",
"and $\\theta = (a,b)$ a vector as y = a.x + b"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 1 - Import and init"
]
},
{
"cell_type": "code",
"source": [
"import numpy as np\n",
"import math\n",
"import matplotlib\n",
"import matplotlib.pyplot as plt\n",
"import sys\n",
"# Init Fidle environment\n",
"run_id, run_dir, datasets_dir = fidle.init('LINR1')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 2 - Retrieve a set of points"
]
},
{
"cell_type": "code",
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
"metadata": {},
"outputs": [],
"source": [
"# ---- Paramètres\n",
"nb = 100 # Nombre de points\n",
"xmin = 0 # Distribution / x\n",
"xmax = 10\n",
"a = 4 # Distribution / y\n",
"b = 2 # y= a.x + b (+ bruit)\n",
"noise = 7 # bruit\n",
"\n",
"theta = np.array([[a],[b]])\n",
"\n",
"# ---- Vecteur X (1,x) x nb\n",
"# la premiere colonne est a 1 afin que X.theta <=> 1.b + x.a\n",
"\n",
"Xc1 = np.ones((nb,1))\n",
"Xc2 = np.random.uniform(xmin,xmax,(nb,1))\n",
"X = np.c_[ Xc1, Xc2 ]\n",
"\n",
"# ---- Noise\n",
"# N = np.random.uniform(-noise,noise,(nb,1))\n",
"N = noise * np.random.normal(0,1,(nb,1))\n",
"\n",
"# ---- Vecteur Y\n",
"Y = (X @ theta) + N\n",
"\n",
"# print(\"X:\\n\",X,\"\\nY:\\n \",Y)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Show it"
]
},
{
"cell_type": "code",
"source": [
"width = 12\n",
"height = 6\n",
"\n",
"fig, ax = plt.subplots()\n",
"fig.set_size_inches(width,height)\n",
"ax.plot(X[:,1], Y, \".\")\n",
"ax.tick_params(axis='both', which='both', bottom=False, left=False, labelbottom=False, labelleft=False)\n",
"ax.set_xlabel('x axis')\n",
"ax.set_ylabel('y axis')\n",
"fidle.scrawler.save_fig('01-set_of_points')\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 3 - Direct calculation of the normal equation\n",
"\n",
"\n",
"We'll try to find an optimal value of $\\theta$, minimizing a cost function. \n",
"The cost function, classically used in the case of linear regressions, is the **root mean square error** (racine carré de l'erreur quadratique moyenne): \n",
"\n",
"$$RMSE(X,h_\\theta)=\\sqrt{\\frac1n\\sum_{i=1}^n\\left[h_\\theta(X^{(i)})-Y^{(i)}\\right]^2}$$\n",
"With the simplified variant : $$MSE(X,h_\\theta)=\\frac1n\\sum_{i=1}^n\\left[h_\\theta(X^{(i)})-Y^{(i)}\\right]^2$$\n",
"The optimal value of regression is : $$\\hat{ \\theta } =( X^{T} .X)^{-1}.X^{T}.Y$$\n",
"\n",
"Démontstration : https://eli.thegreenplace.net/2014/derivation-of-the-normal-equation-for-linear-regression"
]
},
{
"cell_type": "code",
"source": [
"theta_hat = np.linalg.inv(X.T @ X) @ X.T @ Y\n",
"\n",
"print(\"Theta :\\n\",theta,\"\\n\\ntheta hat :\\n\",theta_hat)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Show it"
]
},
{
"cell_type": "code",
"source": [
"Xd = np.array([[1,xmin], [1,xmax]])\n",
"Yd = Xd @ theta_hat\n",
"\n",
"fig, ax = plt.subplots()\n",
"fig.set_size_inches(width,height)\n",
"ax.plot(X[:,1], Y, \".\")\n",
"ax.plot(Xd[:,1], Yd, \"-\")\n",
"ax.tick_params(axis='both', which='both', bottom=False, left=False, labelbottom=False, labelleft=False)\n",
"ax.set_xlabel('x axis')\n",
"ax.set_ylabel('y axis')\n",
"fidle.scrawler.save_fig('02-regression-line')\n",
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---\n",
"<img width=\"80px\" src=\"../fidle/img/logo-paysage.svg\"></img>"
]
}
],
"metadata": {
"kernelspec": {
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.2"
},
"vscode": {
"interpreter": {
"hash": "b3929042cc22c1274d74e3e946c52b845b57cb6d84f2d591ffe0519b38e4896d"
}
}
},
"nbformat": 4,
"nbformat_minor": 4
}