{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "<img width=\"800px\" src=\"../fidle/img/header.svg\"></img>\n", "\n", "# <!-- TITLE --> [LINR1] - Linear regression with direct resolution\n", "<!-- DESC --> Low-level implementation, using numpy, of a direct resolution for a linear regression\n", "<!-- AUTHOR : Jean-Luc Parouty (CNRS/SIMaP) -->\n", "\n", "## Objectives :\n", " - Just one, the illustration of a direct resolution :-)\n", "\n", "## What we're going to do :\n", "\n", "Equation : $$Y = X.\\theta + N$$ \n", "Where N is a noise vector\n", "and $\\theta = (a,b)$ a vector as y = a.x + b" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 1 - Import and init" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import math\n", "import matplotlib\n", "import matplotlib.pyplot as plt\n", "import sys\n", "import fidle\n", "\n", "# Init Fidle environment\n", "run_id, run_dir, datasets_dir = fidle.init('LINR1')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 2 - Retrieve a set of points" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# ---- Paramètres\n", "nb = 100 # Nombre de points\n", "xmin = 0 # Distribution / x\n", "xmax = 10\n", "a = 4 # Distribution / y\n", "b = 2 # y= a.x + b (+ bruit)\n", "noise = 7 # bruit\n", "\n", "theta = np.array([[a],[b]])\n", "\n", "# ---- Vecteur X (1,x) x nb\n", "# la premiere colonne est a 1 afin que X.theta <=> 1.b + x.a\n", "\n", "Xc1 = np.ones((nb,1))\n", "Xc2 = np.random.uniform(xmin,xmax,(nb,1))\n", "X = np.c_[ Xc1, Xc2 ]\n", "\n", "# ---- Noise\n", "# N = np.random.uniform(-noise,noise,(nb,1))\n", "N = noise * np.random.normal(0,1,(nb,1))\n", "\n", "# ---- Vecteur Y\n", "Y = (X @ theta) + N\n", "\n", "# print(\"X:\\n\",X,\"\\nY:\\n \",Y)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Show it" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "width = 12\n", "height = 6\n", "\n", "fig, ax = plt.subplots()\n", "fig.set_size_inches(width,height)\n", "ax.plot(X[:,1], Y, \".\")\n", "ax.tick_params(axis='both', which='both', bottom=False, left=False, labelbottom=False, labelleft=False)\n", "ax.set_xlabel('x axis')\n", "ax.set_ylabel('y axis')\n", "fidle.scrawler.save_fig('01-set_of_points')\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 3 - Direct calculation of the normal equation\n", "\n", "\n", "We'll try to find an optimal value of $\\theta$, minimizing a cost function. \n", "The cost function, classically used in the case of linear regressions, is the **root mean square error** (racine carré de l'erreur quadratique moyenne): \n", "\n", "$$RMSE(X,h_\\theta)=\\sqrt{\\frac1n\\sum_{i=1}^n\\left[h_\\theta(X^{(i)})-Y^{(i)}\\right]^2}$$\n", "\n", "With the simplified variant : $$MSE(X,h_\\theta)=\\frac1n\\sum_{i=1}^n\\left[h_\\theta(X^{(i)})-Y^{(i)}\\right]^2$$\n", "\n", "The optimal value of regression is : $$\\hat{ \\theta } =( X^{T} .X)^{-1}.X^{T}.Y$$\n", "\n", "Démontstration : https://eli.thegreenplace.net/2014/derivation-of-the-normal-equation-for-linear-regression" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "theta_hat = np.linalg.inv(X.T @ X) @ X.T @ Y\n", "\n", "print(\"Theta :\\n\",theta,\"\\n\\ntheta hat :\\n\",theta_hat)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Show it" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "Xd = np.array([[1,xmin], [1,xmax]])\n", "Yd = Xd @ theta_hat\n", "\n", "fig, ax = plt.subplots()\n", "fig.set_size_inches(width,height)\n", "ax.plot(X[:,1], Y, \".\")\n", "ax.plot(Xd[:,1], Yd, \"-\")\n", "ax.tick_params(axis='both', which='both', bottom=False, left=False, labelbottom=False, labelleft=False)\n", "ax.set_xlabel('x axis')\n", "ax.set_ylabel('y axis')\n", "fidle.scrawler.save_fig('02-regression-line')\n", "plt.show()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fidle.end()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "<img width=\"80px\" src=\"../fidle/img/logo-paysage.svg\"></img>" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3.9.2 ('fidle-env')", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.2" }, "vscode": { "interpreter": { "hash": "b3929042cc22c1274d74e3e946c52b845b57cb6d84f2d591ffe0519b38e4896d" } } }, "nbformat": 4, "nbformat_minor": 4 }