00-Numpy.ipynb

{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "<img width=\"800px\" src=\"../fidle/img/header.svg\"></img>\n",
    "\n",
    "# <!-- TITLE --> [NP1] - A short introduction to Numpy\n",
    "<!-- DESC --> Numpy is an essential tool for the Scientific Python.\n",
    "<!-- AUTHOR : Jean-Luc Parouty (CNRS/SIMaP) -->\n",
    "\n",
    "## Objectives :\n",
    " - Understand the main principles of Numpy and its potential\n",
    "\n",
    "Note : This notebook is strongly inspired by the UGA Python Introduction Course  \n",
    "See : **https://gricad-gitlab.univ-grenoble-alpes.fr/python-uga/py-training-2017**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Step 1 - Numpy the beginning\n",
    "\n",
    "Code using `numpy` usually starts with the import statement"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "NumPy provides the type `np.ndarray`. Such array are multidimensionnal sequences of homogeneous elements. They can be created for example with the commands:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# from a list\n",
    "l = [10.0, 12.5, 15.0, 17.5, 20.0]\n",
    "np.array(l)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# fast but the values can be anything\n",
    "np.empty(4)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# slower than np.empty but the values are all 0.\n",
    "np.zeros([2, 6])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# multidimensional array\n",
    "a = np.ones([2, 3, 4])\n",
    "print(a.shape, a.size, a.dtype)\n",
    "a"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# like range but produce 1D numpy array\n",
    "np.arange(4)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# np.arange can produce arrays of floats\n",
    "np.arange(4.)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# another convenient function to generate 1D arrays\n",
    "np.linspace(10, 20, 5)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A NumPy array can be easily converted to a Python list."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "a = np.linspace(10, 20 ,5)\n",
    "list(a)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Or even better\n",
    "a.tolist()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Step 2 - Access elements\n",
    "\n",
    "Elements in a `numpy` array can be accessed using indexing and slicing in any dimension. It also offers the same functionalities available in Fortan or Matlab.\n",
    "\n",
    "### 2.1 - Indexes and slices\n",
    "For example, we can create an array `A` and perform any kind of selection operations on it."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "A = np.random.random([4, 5])\n",
    "A"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Get the element from second line, first column\n",
    "A[1, 0]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Get the first two lines\n",
    "A[:2]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Get the last column\n",
    "A[:, -1]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Get the first two lines and the columns with an even index\n",
    "A[:2, ::2]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### 2.2 -  Using a mask to select elements validating a condition:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "cond = A > 0.5\n",
    "print(cond)\n",
    "print(A[cond])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The mask is in fact a particular case of the advanced indexing capabilities provided by NumPy. For example, it is even possible to use lists for indexing:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Selecting only particular columns\n",
    "print(A)\n",
    "A[:, [0, 1, 4]]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Step 3 -  Perform array manipulations\n",
    "### 3.1 - Apply arithmetic operations to whole arrays (element-wise):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "(A+5)**2"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3.2 - Apply functions element-wise:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "np.exp(A) # With numpy arrays, use the functions from numpy !"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3.3 - Setting parts of arrays"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "A[:, 0] = 0.\n",
    "print(A)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# BONUS: Safe element-wise inverse with masks\n",
    "cond = (A != 0)\n",
    "A[cond] = 1./A[cond]\n",
    "print(A)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Step 4 - Attributes and methods of `np.ndarray` (see the [doc](https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.html#numpy.ndarray))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "for i,v in enumerate([s for s in dir(A) if not s.startswith('__')]):\n",
    "    print(f'{v:16}', end='')\n",
    "    if (i+1) % 6 == 0 :print('')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "\n",
    "# Ex1: Get the mean through different dimensions\n",
    "\n",
    "print(A)\n",
    "print('Mean value',  A.mean())\n",
    "print('Mean line',   A.mean(axis=0))\n",
    "print('Mean column', A.mean(axis=1))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "\n",
    "# Ex2: Convert a 2D array in 1D keeping all elements\n",
    "\n",
    "print(A)\n",
    "print(A.shape)\n",
    "A_flat = A.flatten()\n",
    "print(A_flat, A_flat.shape)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "### 4.1 - Remark: dot product"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "b = np.linspace(0, 10, 11)\n",
    "c = b @ b\n",
    "# before 3.5:\n",
    "# c = b.dot(b)\n",
    "print(b)\n",
    "print(c)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4.2 -  For Matlab users\n",
    "\n",
    "|     ` `       | Matlab | Numpy |\n",
    "| ------------- | ------ | ----- |\n",
    "| element wise  |  `.*`  |  `*`  |\n",
    "|  dot product  |  `*`   |  `@`  |"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`numpy` arrays can also be sorted, even when they are composed of complex data if the type of the columns are explicitly stated with `dtypes`."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "### 4.3 -  NumPy and SciPy sub-packages:\n",
    "\n",
    "We already saw `numpy.random` to generate `numpy` arrays filled with random values. This submodule also provides functions related to distributions (Poisson, gaussian, etc.) and permutations."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To perform linear algebra with dense matrices, we can use the submodule `numpy.linalg`. For instance, in order to compute the determinant of a random matrix, we use the method `det`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "A = np.random.random([5,5])\n",
    "print(A)\n",
    "np.linalg.det(A)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "squared_subA = A[1:3, 1:3]\n",
    "print(squared_subA)\n",
    "np.linalg.inv(squared_subA)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### 4.4 -  Introduction to Pandas: Python Data Analysis Library\n",
    "\n",
    "Pandas is an open source library providing high-performance, easy-to-use data structures and data analysis tools for Python.\n",
    "\n",
    "[Pandas tutorial](https://pandas.pydata.org/pandas-docs/stable/10min.html)\n",
    "[Grenoble Python Working Session](https://github.com/iutzeler/Pres_Pandas/)\n",
    "[Pandas for SQL Users](http://sergilehkyi.com/translating-sql-to-pandas/)\n",
    "[Pandas Introduction Training HPC Python@UGA](https://gricad-gitlab.univ-grenoble-alpes.fr/python-uga/training-hpc/-/blob/master/ipynb/11_pandas.ipynb)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---\n",
    "<img width=\"80px\" src=\"../fidle/img/logo-paysage.svg\"></img>"
   ]
  }
 ],
 "metadata": {
  "celltoolbar": "Diaporama",
  "kernelspec": {
   "display_name": "Python 3.9.2 ('fidle-env')",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.2"
  },
  "vscode": {
   "interpreter": {
    "hash": "b3929042cc22c1274d74e3e946c52b845b57cb6d84f2d591ffe0519b38e4896d"
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}