{ "cells": [ { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# A short introduction to Numpy\n", "Strongly inspired by the UGA Python Introduction Course\n", "https://gricad-gitlab.univ-grenoble-alpes.fr/python-uga/py-training-2017" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## A short introduction on NumPy\n", "\n", "Code using `numpy` usually starts with the import statement" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import numpy as np" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "NumPy provides the type `np.ndarray`. Such array are multidimensionnal sequences of homogeneous elements. They can be created for example with the commands:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([10. , 12.5, 15. , 17.5, 20. ])" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# from a list\n", "l = [10.0, 12.5, 15.0, 17.5, 20.0]\n", "np.array(l)" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([1.27880790e-316, 0.00000000e+000, 6.91986808e-310, 1.57378525e-316])" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# fast but the values can be anything\n", "np.empty(4)" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/plain": [ "array([[0., 0., 0., 0., 0., 0.],\n", " [0., 0., 0., 0., 0., 0.]])" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# slower than np.empty but the values are all 0.\n", "np.zeros([2, 6])" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(2, 3, 4) 24 float64\n" ] }, { "data": { "text/plain": [ "array([[[1., 1., 1., 1.],\n", " [1., 1., 1., 1.],\n", " [1., 1., 1., 1.]],\n", "\n", " [[1., 1., 1., 1.],\n", " [1., 1., 1., 1.],\n", " [1., 1., 1., 1.]]])" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# multidimensional array\n", "a = np.ones([2, 3, 4])\n", "print(a.shape, a.size, a.dtype)\n", "a" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([0, 1, 2, 3])" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# like range but produce 1D numpy array\n", "np.arange(4)" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([0., 1., 2., 3.])" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# np.arange can produce arrays of floats\n", "np.arange(4.)" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([10. , 12.5, 15. , 17.5, 20. ])" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# another convenient function to generate 1D arrays\n", "np.linspace(10, 20, 5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A NumPy array can be easily converted to a Python list." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[10.0, 12.5, 15.0, 17.5, 20.0]" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a = np.linspace(10, 20 ,5)\n", "list(a)" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[10.0, 12.5, 15.0, 17.5, 20.0]" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Or even better\n", "a.tolist()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Manipulating NumPy arrays" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Access elements\n", "Elements in a `numpy` array can be accessed using indexing and slicing in any dimension. It also offers the same functionalities available in Fortan or Matlab.\n", "\n", "### Indexes and slices\n", "For example, we can create an array `A` and perform any kind of selection operations on it." ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[0.89925962, 0.31519992, 0.17170063, 0.06102236, 0.6055506 ],\n", " [0.43365108, 0.67461267, 0.34962124, 0.75648088, 0.53096922],\n", " [0.65643503, 0.4723704 , 0.77202087, 0.50192904, 0.14067726],\n", " [0.80709755, 0.2314217 , 0.65465368, 0.28459125, 0.54727527]])" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "A = np.random.random([4, 5])\n", "A" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.4336510750584107" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Get the element from second line, first column\n", "A[1, 0]" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[0.89925962, 0.31519992, 0.17170063, 0.06102236, 0.6055506 ],\n", " [0.43365108, 0.67461267, 0.34962124, 0.75648088, 0.53096922]])" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Get the first two lines\n", "A[:2]" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([0.6055506 , 0.53096922, 0.14067726, 0.54727527])" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Get the last column\n", "A[:, -1]" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[0.89925962, 0.17170063, 0.6055506 ],\n", " [0.43365108, 0.34962124, 0.53096922]])" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Get the first two lines and the columns with an even index\n", "A[:2, ::2]" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Using a mask to select elements validating a condition:" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[ True False False False True]\n", " [False True False True True]\n", " [ True False True True False]\n", " [ True False True False True]]\n", "[0.89925962 0.6055506 0.67461267 0.75648088 0.53096922 0.65643503\n", " 0.77202087 0.50192904 0.80709755 0.65465368 0.54727527]\n" ] } ], "source": [ "cond = A > 0.5\n", "print(cond)\n", "print(A[cond])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The mask is in fact a particular case of the advanced indexing capabilities provided by NumPy. For example, it is even possible to use lists for indexing:" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[0.89925962 0.31519992 0.17170063 0.06102236 0.6055506 ]\n", " [0.43365108 0.67461267 0.34962124 0.75648088 0.53096922]\n", " [0.65643503 0.4723704 0.77202087 0.50192904 0.14067726]\n", " [0.80709755 0.2314217 0.65465368 0.28459125 0.54727527]]\n" ] }, { "data": { "text/plain": [ "array([[0.89925962, 0.31519992, 0.6055506 ],\n", " [0.43365108, 0.67461267, 0.53096922],\n", " [0.65643503, 0.4723704 , 0.14067726],\n", " [0.80709755, 0.2314217 , 0.54727527]])" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Selecting only particular columns\n", "print(A)\n", "A[:, [0, 1, 4]]" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Perform array manipulations\n", "### Apply arithmetic operations to whole arrays (element-wise):" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[34.80126403, 28.25135024, 26.7464874 , 25.61394735, 31.42219749],\n", " [29.52456401, 32.20122896, 28.61844741, 33.13707212, 30.59162046],\n", " [31.99525724, 29.94683782, 33.31622493, 30.27122313, 26.42656267],\n", " [33.72238198, 27.36777304, 31.97510827, 27.92690466, 30.77226288]])" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "(A+5)**2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Apply functions element-wise:" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[2.45778274, 1.37053329, 1.18732233, 1.06292268, 1.83226077],\n", " [1.54288042, 1.9632724 , 1.41853016, 2.13076459, 1.70057974],\n", " [1.92790714, 1.60379132, 2.16413527, 1.65190478, 1.15105309],\n", " [2.24139301, 1.26039064, 1.92447592, 1.3292186 , 1.72853679]])" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.exp(A) # With numpy arrays, use the functions from numpy !" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Setting parts of arrays" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[0. 0.31519992 0.17170063 0.06102236 0.6055506 ]\n", " [0. 0.67461267 0.34962124 0.75648088 0.53096922]\n", " [0. 0.4723704 0.77202087 0.50192904 0.14067726]\n", " [0. 0.2314217 0.65465368 0.28459125 0.54727527]]\n" ] } ], "source": [ "A[:, 0] = 0.\n", "print(A)" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[ 0. 3.17258959 5.82409047 16.387435 1.65138967]\n", " [ 0. 1.48233207 2.86023812 1.32191048 1.88334836]\n", " [ 0. 2.11698277 1.29530177 1.99231351 7.10846954]\n", " [ 0. 4.32111589 1.5275252 3.51381149 1.82723405]]\n" ] } ], "source": [ "# BONUS: Safe element-wise inverse with masks\n", "cond = (A != 0)\n", "A[cond] = 1./A[cond]\n", "print(A)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Attributes and methods of `np.ndarray` (see the [doc](https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.html#numpy.ndarray))" ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['T', 'all', 'any', 'argmax', 'argmin', 'argpartition', 'argsort', 'astype', 'base', 'byteswap', 'choose', 'clip', 'compress', 'conj', 'conjugate', 'copy', 'ctypes', 'cumprod', 'cumsum', 'data', 'diagonal', 'dot', 'dtype', 'dump', 'dumps', 'fill', 'flags', 'flat', 'flatten', 'getfield', 'imag', 'item', 'itemset', 'itemsize', 'max', 'mean', 'min', 'nbytes', 'ndim', 'newbyteorder', 'nonzero', 'partition', 'prod', 'ptp', 'put', 'ravel', 'real', 'repeat', 'reshape', 'resize', 'round', 'searchsorted', 'setfield', 'setflags', 'shape', 'size', 'sort', 'squeeze', 'std', 'strides', 'sum', 'swapaxes', 'take', 'tobytes', 'tofile', 'tolist', 'tostring', 'trace', 'transpose', 'var', 'view']\n" ] } ], "source": [ "print([s for s in dir(A) if not s.startswith('__')])" ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[ 0. 3.17258959 5.82409047 16.387435 1.65138967]\n", " [ 0. 1.48233207 2.86023812 1.32191048 1.88334836]\n", " [ 0. 2.11698277 1.29530177 1.99231351 7.10846954]\n", " [ 0. 4.32111589 1.5275252 3.51381149 1.82723405]]\n", "Mean value 2.9143043986324475\n", "Mean line [0. 2.77325508 2.87678889 5.80386762 3.1176104 ]\n", "Mean column [5.40710095 1.50956581 2.50261352 2.23793733]\n" ] } ], "source": [ "# Ex1: Get the mean through different dimensions\n", "print(A)\n", "print('Mean value', A.mean())\n", "print('Mean line', A.mean(axis=0))\n", "print('Mean column', A.mean(axis=1))" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[ 0. 3.17258959 5.82409047 16.387435 1.65138967]\n", " [ 0. 1.48233207 2.86023812 1.32191048 1.88334836]\n", " [ 0. 2.11698277 1.29530177 1.99231351 7.10846954]\n", " [ 0. 4.32111589 1.5275252 3.51381149 1.82723405]] (4, 5)\n", "[ 0. 3.17258959 5.82409047 16.387435 1.65138967 0.\n", " 1.48233207 2.86023812 1.32191048 1.88334836 0. 2.11698277\n", " 1.29530177 1.99231351 7.10846954 0. 4.32111589 1.5275252\n", " 3.51381149 1.82723405] (20,)\n" ] } ], "source": [ "# Ex2: Convert a 2D array in 1D keeping all elements\n", "print(A, A.shape)\n", "A_flat = A.flatten()\n", "print(A_flat, A_flat.shape)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Remark: dot product" ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ 0. 1. 2. 3. 4. 5. 6. 7. 8. 9. 10.]\n", "385.0\n" ] } ], "source": [ "b = np.linspace(0, 10, 11)\n", "c = b @ b\n", "# before 3.5:\n", "# c = b.dot(b)\n", "print(b)\n", "print(c)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### For Matlab users\n", "\n", "| ` ` | Matlab | Numpy |\n", "| ------------- | ------ | ----- |\n", "| element wise | `.*` | `*` |\n", "| dot product | `*` | `@` |" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`numpy` arrays can also be sorted, even when they are composed of complex data if the type of the columns are explicitly stated with `dtypes`." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "#### NumPy and SciPy sub-packages:\n", "\n", "We already saw `numpy.random` to generate `numpy` arrays filled with random values. This submodule also provides functions related to distributions (Poisson, gaussian, etc.) and permutations." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To perform linear algebra with dense matrices, we can use the submodule `numpy.linalg`. For instance, in order to compute the determinant of a random matrix, we use the method `det`" ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[0.47138506 0.41353868 0.09441948 0.225147 0.82335198]\n", " [0.04490952 0.14682972 0.31792846 0.22918746 0.73823443]\n", " [0.50485749 0.99705961 0.51896582 0.93318595 0.11375617]\n", " [0.37148317 0.0477689 0.29061475 0.41826056 0.47950005]\n", " [0.70324502 0.82838271 0.92172528 0.79532669 0.56698101]]\n" ] }, { "data": { "text/plain": [ "0.06968780805887545" ] }, "execution_count": 38, "metadata": {}, "output_type": "execute_result" } ], "source": [ "A = np.random.random([5,5])\n", "print(A)\n", "np.linalg.det(A)" ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[0.14682972 0.31792846]\n", " [0.99705961 0.51896582]]\n" ] }, { "data": { "text/plain": [ "array([[-2.15522717, 1.32033369],\n", " [ 4.14071576, -0.6097731 ]])" ] }, "execution_count": 39, "metadata": {}, "output_type": "execute_result" } ], "source": [ "squared_subA = A[1:3, 1:3]\n", "print(squared_subA)\n", "np.linalg.inv(squared_subA)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Introduction to Pandas: Python Data Analysis Library\n", "\n", "Pandas is an open source library providing high-performance, easy-to-use data structures and data analysis tools for Python.\n", "\n", "[Pandas tutorial](https://pandas.pydata.org/pandas-docs/stable/10min.html)\n", "[Grenoble Python Working Session](https://github.com/iutzeler/Pres_Pandas/)\n", "[Pandas for SQL Users](https://hackernoon.com/pandas-cheatsheet-for-sql-people-part-1-2976894acd0)" ] } ], "metadata": { "celltoolbar": "Diaporama", "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.8" } }, "nbformat": 4, "nbformat_minor": 2 }