Commit 81a47726 authored by Loic Huder

Improved pres110 numpy

parent fa9a99c0
Pipeline #33282 passed with stage
in 1 minute and 1 second
......@@ -33,9 +33,7 @@
"- [scipy](http://www.scipy.org/): high-level numerical routines. Optimization, regression, interpolation, etc.\n",
"- [matplotlib](http://matplotlib.org/): 2D-3D visualization, “publication-ready” plots.\n",
"\n",
"With `IPython` and `Spyder`, Python plus these fundamental scientific packages constitutes a very good alternative to Matlab, that is technically very similar (using the libraries Blas and Lapack). Matlab has a JustInTime (JIT) compiler so that Matlab code is generally faster than Python. However, we will see that Numpy is already quite efficient for standard operations and other Python tools (for example `pypy`, `cython`, `numba`, `pythran`, `theano`...) can be used to optimize the code to reach the performance of optimized Matlab code.\n",
"\n",
"The advantage of Python over Matlab is its high polyvalency (and nicer syntax) and there are notably several other scientific Python packages (see our notebook `pres13_doc_applications.ipynb`):"
    "With `IPython` and `Spyder`, Python plus these fundamental scientific packages constitutes a very good alternative to Matlab, which is technically very similar (both rely on the BLAS and LAPACK libraries)."
]
},
{
......@@ -46,6 +44,9 @@
}
},
"source": [
    "Matlab has a just-in-time (JIT) compiler, so Matlab code is generally faster than pure Python code. However, we will see that NumPy is already quite efficient for standard operations, and other Python tools (for example `pypy`, `cython`, `numba`, `pythran`, `theano`...) can be used to optimize the code to reach the performance of optimized Matlab code.\n",
"\n",
    "The advantage of Python over Matlab is its versatility (and nicer syntax), and there are notably several other scientific Python packages (see our notebook `pres13_doc_applications.ipynb`):\n",
"- [sympy](http://www.sympy.org) for symbolic computing,\n",
"- [pandas](http://pandas.pydata.org/), [statsmodels](http://www.statsmodels.org), [seaborn](http://seaborn.pydata.org/) for statistics,\n",
"- [h5py](http://www.h5py.org/), [h5netcdf](https://pypi.python.org/pypi/h5netcdf) for hdf5 and netcdf files,\n",
......@@ -85,7 +86,19 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"NumPy provides the type `np.ndarray`. Such array are multidimensionnal sequences of homogeneous elements. They can be created for example with the commands:"
    "NumPy provides the type `np.ndarray`. Such arrays are multidimensional sequences of homogeneous elements (numbers) used to represent vectors, matrices, tensors..."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Array creation\n",
"NumPy arrays can be created in several ways:"
]
},
{
......@@ -118,7 +131,7 @@
{
"data": {
"text/plain": [
"array([1.27880790e-316, 0.00000000e+000, 6.91986808e-310, 1.57378525e-316])"
"array([1.75274491e-316, 6.94225492e-310, 6.94224527e-310, 6.94225376e-310])"
]
},
"execution_count": 3,
......@@ -135,7 +148,10 @@
"cell_type": "code",
"execution_count": 4,
"metadata": {
"scrolled": true
"scrolled": true,
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
......@@ -151,7 +167,7 @@
}
],
"source": [
"# slower than np.empty but the values are all 0.\n",
"# Filled with zeros (slower than np.empty)\n",
"np.zeros([2, 6])"
]
},
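Note on the array-creation cells above: a minimal sketch contrasting `np.empty` with `np.zeros` and `np.ones`. The call `np.empty(4)` is an assumption (the diff only shows a four-element output of uninitialized values), and the variable names are illustrative only.

```python
import numpy as np

# np.empty only allocates memory: the values are whatever was there before,
# which is why the output of that cell changes from one run to the next.
uninitialized = np.empty(4)

# np.zeros and np.ones allocate and then fill the array,
# hence the small extra cost compared to np.empty.
zeros = np.zeros([2, 6])
ones = np.ones([2, 3, 4])

print(uninitialized.shape, uninitialized.dtype)
print(zeros.shape, zeros.sum())
print(ones.shape, ones.size, ones.dtype)
```

Since the content of an `np.empty` array is arbitrary, it should only be read after every element has been assigned.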
......@@ -185,12 +201,23 @@
}
],
"source": [
"# multidimensional array\n",
"# Multidimensional array filled with ones \n",
"a = np.ones([2, 3, 4])\n",
"print(a.shape, a.size, a.dtype)\n",
"a"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Generate sequences"
]
},
{
"cell_type": "code",
"execution_count": 6,
......@@ -208,7 +235,7 @@
}
],
"source": [
"# like range but produce 1D numpy array\n",
"# Like range but produces a 1D numpy array\n",
"np.arange(4)"
]
},
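Note on the sequence-generation cells in this part of the diff: a minimal sketch comparing `np.arange` and `np.linspace`. The variable names and the `endpoint=False` comparison are illustrative additions, not part of the commit.

```python
import numpy as np

# Like range, but returns a 1D array of integers
ints = np.arange(4)

# Start, stop (excluded) and step; with a float step the number of
# elements depends on floating-point rounding
floats = np.arange(2., 4., 0.1)

# Five equally spaced points, both ends included
grid = np.linspace(10, 20, 5)

# np.linspace states the number of points explicitly, which is often
# safer than a float step; endpoint=False excludes the stop value
same_range = np.linspace(2., 4., 20, endpoint=False)

print(ints)
print(len(floats), floats[0], floats[-1])
print(grid)
print(np.allclose(floats, same_range))
```

When the number of points matters more than the step, `np.linspace` is usually the more predictable choice.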
......@@ -220,7 +247,8 @@
{
"data": {
"text/plain": [
"array([0., 1., 2., 3.])"
"array([2. , 2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 2.7, 2.8, 2.9, 3. , 3.1, 3.2,\n",
" 3.3, 3.4, 3.5, 3.6, 3.7, 3.8, 3.9])"
]
},
"execution_count": 7,
......@@ -229,8 +257,8 @@
}
],
"source": [
"# np.arange can produce arrays of floats\n",
"np.arange(4.)"
"# Start and step can be changed\n",
"np.arange(2., 4., 0.1)"
]
},
{
......@@ -250,13 +278,17 @@
}
],
"source": [
"# another convenient function to generate 1D arrays\n",
"# Equally-spaced elements between start and end (included)\n",
"np.linspace(10, 20, 5)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"A NumPy array can be easily converted to a Python list."
]
......@@ -279,28 +311,21 @@
],
"source": [
    "a = np.linspace(10, 20, 5)\n",
"list(a)"
"a.tolist()"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[10.0, 12.5, 15.0, 17.5, 20.0]"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
],
},
"source": [
"# Or even better\n",
"a.tolist()"
"## Manipulating NumPy arrays\n",
"\n",
"### Access elements\n",
    "Elements in a `numpy` array can be accessed using indexing and slicing in any dimension. NumPy also offers the same indexing functionalities as Fortran or Matlab."
]
},
{
......@@ -311,203 +336,199 @@
}
},
"source": [
"# NumPy efficiency\n",
"Beside some convenient functions for the manipulation of data in arrays of arbritrary dimensions, `numpy` can be much more efficient than pure Python."
"### Indexes and slices\n",
"For example, we can create an array `A` and perform any kind of selection operations on it."
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"execution_count": 10,
"metadata": {
"slideshow": {
"slide_type": "-"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"15.6 µs ± 1.59 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)\n"
"data": {
"text/plain": [
"array([[0.74931905, 0.4399789 , 0.96017188, 0.88886798, 0.28382067],\n",
" [0.4532329 , 0.99181478, 0.07017858, 0.4993961 , 0.1678844 ],\n",
" [0.59791893, 0.50793759, 0.77954852, 0.05390075, 0.984206 ],\n",
" [0.93149267, 0.02959492, 0.60720976, 0.92916837, 0.24923606]])"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"n = 1000\n",
"# we use the ipython magic command %timeit\n",
"%timeit list(range(n))"
"A = np.random.random([4, 5])\n",
"A"
]
},
{
"cell_type": "code",
"execution_count": 12,
"execution_count": 11,
"metadata": {},
"outputs": [],
"source": [
"%%capture timeit_python\n",
"# to capture the result of the command timeit in the variable timeit_python\n",
"# Pure Python\n",
"%timeit list(range(n))"
]
},
"outputs": [
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [],
"source": [
"%%capture timeit_numpy\n",
"# numpy\n",
"%timeit np.arange(n)"
"data": {
"text/plain": [
"0.45323290450951004"
]
},
{
"cell_type": "code",
"execution_count": 14,
"execution_count": 11,
"metadata": {},
"outputs": [],
"output_type": "execute_result"
}
],
"source": [
"def compute_time_in_second(timeit_result):\n",
" string = timeit_result.stdout\n",
" print(string)\n",
" for line in string.split('\\n'):\n",
" words = line.split(' ')\n",
" if len(words) > 1:\n",
" time = float(words[0])\n",
" unit = words[1]\n",
" if unit == 'ms':\n",
" time *= 1e-3\n",
" elif unit == 'us':\n",
" time *= 1e-6\n",
" elif unit == 'ns':\n",
" time *= 1e-9\n",
" return time\n",
"\n",
"def compare_times(string, timeit_python, timeit_numpy):\n",
" time_python = compute_time_in_second(timeit_python)\n",
" time_numpy = compute_time_in_second(timeit_numpy)\n",
"\n",
" print(string + ': ratio times (Python / NumPy): ', \n",
" time_python/time_numpy)"
    "# Get the element in the second row, first column\n",
"A[1, 0]"
]
},
{
"cell_type": "code",
"execution_count": 15,
"execution_count": 12,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"12.7 us +- 1.14 us per loop (mean +- std. dev. of 7 runs, 100000 loops each)\n",
"\n",
"1.31 us +- 112 ns per loop (mean +- std. dev. of 7 runs, 1000000 loops each)\n",
"\n",
"Creation of object: ratio times (Python / NumPy): 9.694656488549617\n"
"data": {
"text/plain": [
"array([[0.74931905, 0.4399789 , 0.96017188, 0.88886798, 0.28382067],\n",
" [0.4532329 , 0.99181478, 0.07017858, 0.4993961 , 0.1678844 ]])"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"compare_times('Creation of object', timeit_python, timeit_numpy)"
    "# Get the first two rows\n",
"A[:2]"
]
},
{
"cell_type": "code",
"execution_count": 16,
"execution_count": 13,
"metadata": {},
"outputs": [],
"source": [
"n = 200000\n",
"python_r_1 = range(n)\n",
"python_r_2 = range(n)\n",
"\n",
"numpy_a_1 = np.arange(n)\n",
"numpy_a_2 = np.arange(n)"
"outputs": [
{
"data": {
"text/plain": [
"array([0.28382067, 0.1678844 , 0.984206 , 0.24923606])"
]
},
{
"cell_type": "code",
"execution_count": 17,
"execution_count": 13,
"metadata": {},
"outputs": [],
"output_type": "execute_result"
}
],
"source": [
"%%capture timeit_python\n",
"%%timeit\n",
"# Regular Python\n",
"[(x + y) for x, y in zip(python_r_1, python_r_2)]"
"# Get the last column\n",
"A[:, -1]"
]
},
{
"cell_type": "code",
"execution_count": 18,
"execution_count": 14,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"16.6 ms +- 220 us per loop (mean +- std. dev. of 7 runs, 100 loops each)\n",
"\n"
"data": {
"text/plain": [
"array([[0.74931905, 0.96017188, 0.28382067],\n",
" [0.4532329 , 0.07017858, 0.1678844 ]])"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"print(timeit_python)"
    "# Get the first two rows and the columns with an even index\n",
"A[:2, ::2]"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [],
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"%%capture timeit_numpy\n",
"%%timeit\n",
"#Numpy\n",
"numpy_a_1 + numpy_a_2"
    "### Using a mask to select elements satisfying a condition:"
]
},
{
"cell_type": "code",
"execution_count": 20,
"execution_count": 15,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"246 us +- 16.7 us per loop (mean +- std. dev. of 7 runs, 1000 loops each)\n",
"\n"
"[[ True False True True False]\n",
" [False True False False False]\n",
" [ True True True False True]\n",
" [ True False True True False]]\n",
"[0.74931905 0.96017188 0.88886798 0.99181478 0.59791893 0.50793759\n",
" 0.77954852 0.984206 0.93149267 0.60720976 0.92916837]\n"
]
}
],
"source": [
"print(timeit_numpy)"
"cond = A > 0.5\n",
"print(cond)\n",
"print(A[cond])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The mask is in fact a particular case of the advanced indexing capabilities provided by NumPy. For example, it is even possible to use lists for indexing:"
]
},
{
"cell_type": "code",
"execution_count": 21,
"execution_count": 16,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"16.6 ms +- 220 us per loop (mean +- std. dev. of 7 runs, 100 loops each)\n",
"\n",
"246 us +- 16.7 us per loop (mean +- std. dev. of 7 runs, 1000 loops each)\n",
"\n",
"Additions: ratio times (Python / NumPy): 67.47967479674797\n"
]
}
],
"source": [
"compare_times('Additions', timeit_python, timeit_numpy)"
"[[0.74931905 0.4399789 0.96017188 0.88886798 0.28382067]\n",
" [0.4532329 0.99181478 0.07017858 0.4993961 0.1678844 ]\n",
" [0.59791893 0.50793759 0.77954852 0.05390075 0.984206 ]\n",
" [0.93149267 0.02959492 0.60720976 0.92916837 0.24923606]]\n"
]
},
{
"cell_type": "markdown",
"data": {
"text/plain": [
"array([[0.74931905, 0.4399789 , 0.28382067],\n",
" [0.4532329 , 0.99181478, 0.1678844 ],\n",
" [0.59791893, 0.50793759, 0.984206 ],\n",
" [0.93149267, 0.02959492, 0.24923606]])"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"This shows that when you need to perform mathematical operations on a lot of homogeneous numbers, it is more efficient to use `numpy` arrays."
"# Selecting only particular columns\n",
"print(A)\n",
"A[:, [0, 1, 4]]"
]
},
{
......@@ -518,205 +539,167 @@
}
},
"source": [
"# Manipulating NumPy arrays"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Access elements\n",
"Elements in a `numpy` array can be accessed using indexing and slicing in any dimension. It also offers the same functionalities available in Fortan or Matlab.\n",
"\n",
"### Indexes and slices\n",
"For example, we can create an array `A` and perform any kind of selection operations on it."
"## Perform array manipulations\n",
"### Apply arithmetic operations to whole arrays (element-wise):"
]
},
{
"cell_type": "code",
"execution_count": 22,
"execution_count": 17,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[0.89925962, 0.31519992, 0.17170063, 0.06102236, 0.6055506 ],\n",
" [0.43365108, 0.67461267, 0.34962124, 0.75648088, 0.53096922],\n",
" [0.65643503, 0.4723704 , 0.77202087, 0.50192904, 0.14067726],\n",
" [0.80709755, 0.2314217 , 0.65465368, 0.28459125, 0.54727527]])"
"array([[33.05466955, 29.59337041, 35.52364888, 34.67876614, 27.91876089],\n",
" [29.73774911, 35.90184432, 25.7067108 , 30.24335742, 26.70702917],\n",
" [31.3366963 , 30.33737648, 33.4031811 , 25.54191284, 35.81072142],\n",
" [35.18260526, 25.29682501, 31.44080124, 35.15503757, 27.55447916]])"
]
},
"execution_count": 22,
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"A = np.random.random([4, 5])\n",
"A"
"(A+5)**2"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.4336510750584107"
]
},
"execution_count": 23,
"cell_type": "markdown",
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Get the element from second line, first column\n",
"A[1, 0]"
"### Apply functions element-wise:"
]
},
{
"cell_type": "code",
"execution_count": 24,
"execution_count": 18,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[0.89925962, 0.31519992, 0.17170063, 0.06102236, 0.6055506 ],\n",
" [0.43365108, 0.67461267, 0.34962124, 0.75648088, 0.53096922]])"
"array([[2.11555894, 1.55267445, 2.61214542, 2.43237461, 1.32819473],\n",
" [1.57339059, 2.6961229 , 1.07269972, 1.6477259 , 1.18279987],\n",
" [1.81833078, 1.66186022, 2.1804876 , 1.05537986, 2.67568654],\n",
" [2.53829518, 1.0300372 , 1.8353033 , 2.53240228, 1.28304487]])"
]
},
"execution_count": 24,
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Get the first two lines\n",
"A[:2]"
    "np.exp(A)  # With numpy arrays, use the functions from numpy!"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([0.6055506 , 0.53096922, 0.14067726, 0.54727527])"
]
},
"execution_count": 25,
"metadata": {},
"output_type": "execute_result"
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
],
},
"source": [
"# Get the last column\n",
"A[:, -1]"
"## NumPy efficiency\n",
"\n",
    "In addition to being extremely convenient for manipulating arrays of arbitrary dimensions, `numpy` is also much more efficient than pure Python.\n",
"\n",
"### Array creation:"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[0.89925962, 0.17170063, 0.6055506 ],\n",
" [0.43365108, 0.34962124, 0.53096922]])"
]
},
"execution_count": 26,
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],