
Commit 81a47726 authored by Loic Huder

Improved pres110 numpy

parent fa9a99c0
Pipeline #33282 passed with stage
in 1 minute and 1 second
......@@ -33,9 +33,7 @@
"- [scipy](http://www.scipy.org/): high-level numerical routines. Optimization, regression, interpolation, etc.\n",
"- [matplotlib](http://matplotlib.org/): 2D-3D visualization, “publication-ready” plots.\n",
"\n",
"With `IPython` and `Spyder`, Python plus these fundamental scientific packages constitutes a very good alternative to Matlab, that is technically very similar (using the libraries Blas and Lapack). Matlab has a JustInTime (JIT) compiler so that Matlab code is generally faster than Python. However, we will see that Numpy is already quite efficient for standard operations and other Python tools (for example `pypy`, `cython`, `numba`, `pythran`, `theano`...) can be used to optimize the code to reach the performance of optimized Matlab code.\n",
"\n",
"The advantage of Python over Matlab is its high polyvalency (and nicer syntax) and there are notably several other scientific Python packages (see our notebook `pres13_doc_applications.ipynb`):"
"With `IPython` and `Spyder`, Python plus these fundamental scientific packages constitutes a very good alternative to Matlab, that is technically very similar (using the libraries Blas and Lapack)."
]
},
{
......@@ -46,6 +44,9 @@
}
},
"source": [
"Matlab has a JustInTime (JIT) compiler so that Matlab code is generally faster than Python. However, we will see that Numpy is already quite efficient for standard operations and other Python tools (for example `pypy`, `cython`, `numba`, `pythran`, `theano`...) can be used to optimize the code to reach the performance of optimized Matlab code.\n",
"\n",
"The advantage of Python over Matlab is its high polyvalency (and nicer syntax) and there are notably several other scientific Python packages (see our notebook `pres13_doc_applications.ipynb`):\n",
"- [sympy](http://www.sympy.org) for symbolic computing,\n",
"- [pandas](http://pandas.pydata.org/), [statsmodels](http://www.statsmodels.org), [seaborn](http://seaborn.pydata.org/) for statistics,\n",
"- [h5py](http://www.h5py.org/), [h5netcdf](https://pypi.python.org/pypi/h5netcdf) for hdf5 and netcdf files,\n",
......@@ -85,7 +86,19 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"NumPy provides the type `np.ndarray`. Such array are multidimensionnal sequences of homogeneous elements. They can be created for example with the commands:"
"NumPy provides the type `np.ndarray`. Such arrays are multidimensionnal sequences of homogeneous elements (numbers) to represent vectors, matrices, tensors..."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Array creation\n",
"NumPy arrays can be created in several ways:"
]
},
{
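The creation functions touched by this hunk can be summarized in a short sketch. This is only an illustration: the shapes are taken from the cells visible in the diff where possible, while `np.empty(4)` and the `np.array` call are assumptions added for completeness.

```python
import numpy as np

# Uninitialized memory: fast, but the values are arbitrary
# (the size 4 is an assumption based on the output shown in the diff).
e = np.empty(4)

# Filled with zeros (slower than np.empty), shape (2, 6).
z = np.zeros([2, 6])

# Multidimensional array filled with ones, shape (2, 3, 4).
a = np.ones([2, 3, 4])

# Built from an existing Python sequence (not shown in the diff).
from_list = np.array([[1, 2, 3], [4, 5, 6]])

print(e.shape, z.shape, a.shape, from_list.shape)
print(a.size, a.dtype)
```

Since `np.empty` does not initialize memory, its output differs from run to run, which is why the stored cell output changes in this commit.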
......@@ -118,7 +131,7 @@
{
"data": {
"text/plain": [
"array([1.27880790e-316, 0.00000000e+000, 6.91986808e-310, 1.57378525e-316])"
"array([1.75274491e-316, 6.94225492e-310, 6.94224527e-310, 6.94225376e-310])"
]
},
"execution_count": 3,
......@@ -135,7 +148,10 @@
"cell_type": "code",
"execution_count": 4,
"metadata": {
"scrolled": true
"scrolled": true,
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
......@@ -151,7 +167,7 @@
}
],
"source": [
"# slower than np.empty but the values are all 0.\n",
"# Filled with zeros (slower than np.empty)\n",
"np.zeros([2, 6])"
]
},
......@@ -185,12 +201,23 @@
}
],
"source": [
"# multidimensional array\n",
"# Multidimensional array filled with ones \n",
"a = np.ones([2, 3, 4])\n",
"print(a.shape, a.size, a.dtype)\n",
"a"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Generate sequences"
]
},
{
"cell_type": "code",
"execution_count": 6,
......@@ -208,7 +235,7 @@
}
],
"source": [
"# like range but produce 1D numpy array\n",
"# Like range but produces a 1D numpy array\n",
"np.arange(4)"
]
},
......@@ -220,7 +247,8 @@
{
"data": {
"text/plain": [
"array([0., 1., 2., 3.])"
"array([2. , 2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 2.7, 2.8, 2.9, 3. , 3.1, 3.2,\n",
" 3.3, 3.4, 3.5, 3.6, 3.7, 3.8, 3.9])"
]
},
"execution_count": 7,
......@@ -229,8 +257,8 @@
}
],
"source": [
"# np.arange can produce arrays of floats\n",
"np.arange(4.)"
"# Start and step can be changed\n",
"np.arange(2., 4., 0.1)"
]
},
{
......@@ -250,13 +278,17 @@
}
],
"source": [
"# another convenient function to generate 1D arrays\n",
"# Equally-spaced elements between start and end (included)\n",
"np.linspace(10, 20, 5)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"A NumPy array can be easily converted to a Python list."
]
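As a rough recap of the sequence generators and the list conversion discussed in these hunks, the following sketch puts them side by side. The variable names are hypothetical; the calls themselves are the ones shown in the notebook cells.

```python
import numpy as np

# Like range, but produces a 1D numpy array of integers.
r = np.arange(4)

# Start, stop (excluded) and step can be given, including floats.
f = np.arange(2., 4., 0.1)

# 5 evenly spaced points, both ends included.
pts = np.linspace(10, 20, 5)

# list(pts) keeps NumPy scalar elements, whereas tolist()
# converts every element to a native Python float.
as_list = pts.tolist()

print(r, f[0], pts, as_list)
```

With a floating-point step, the number of elements returned by `np.arange` can be sensitive to rounding, so `np.linspace` is usually the safer choice when the end point and the number of points matter.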
......@@ -279,27 +311,6 @@
],
"source": [
"a = np.linspace(10, 20 ,5)\n",
"list(a)"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[10.0, 12.5, 15.0, 17.5, 20.0]"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Or even better\n",
"a.tolist()"
]
},
......@@ -311,203 +322,10 @@
}
},
"source": [
"# NumPy efficiency\n",
"Beside some convenient functions for the manipulation of data in arrays of arbritrary dimensions, `numpy` can be much more efficient than pure Python."
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"15.6 µs ± 1.59 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)\n"
]
}
],
"source": [
"n = 1000\n",
"# we use the ipython magic command %timeit\n",
"%timeit list(range(n))"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [],
"source": [
"%%capture timeit_python\n",
"# to capture the result of the command timeit in the variable timeit_python\n",
"# Pure Python\n",
"%timeit list(range(n))"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [],
"source": [
"%%capture timeit_numpy\n",
"# numpy\n",
"%timeit np.arange(n)"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [],
"source": [
"def compute_time_in_second(timeit_result):\n",
" string = timeit_result.stdout\n",
" print(string)\n",
" for line in string.split('\\n'):\n",
" words = line.split(' ')\n",
" if len(words) > 1:\n",
" time = float(words[0])\n",
" unit = words[1]\n",
" if unit == 'ms':\n",
" time *= 1e-3\n",
" elif unit == 'us':\n",
" time *= 1e-6\n",
" elif unit == 'ns':\n",
" time *= 1e-9\n",
" return time\n",
"\n",
"def compare_times(string, timeit_python, timeit_numpy):\n",
" time_python = compute_time_in_second(timeit_python)\n",
" time_numpy = compute_time_in_second(timeit_numpy)\n",
"\n",
" print(string + ': ratio times (Python / NumPy): ', \n",
" time_python/time_numpy)"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"12.7 us +- 1.14 us per loop (mean +- std. dev. of 7 runs, 100000 loops each)\n",
"\n",
"1.31 us +- 112 ns per loop (mean +- std. dev. of 7 runs, 1000000 loops each)\n",
"\n",
"Creation of object: ratio times (Python / NumPy): 9.694656488549617\n"
]
}
],
"source": [
"compare_times('Creation of object', timeit_python, timeit_numpy)"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [],
"source": [
"n = 200000\n",
"python_r_1 = range(n)\n",
"python_r_2 = range(n)\n",
"## Manipulating NumPy arrays\n",
"\n",
"numpy_a_1 = np.arange(n)\n",
"numpy_a_2 = np.arange(n)"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [],
"source": [
"%%capture timeit_python\n",
"%%timeit\n",
"# Regular Python\n",
"[(x + y) for x, y in zip(python_r_1, python_r_2)]"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"16.6 ms +- 220 us per loop (mean +- std. dev. of 7 runs, 100 loops each)\n",
"\n"
]
}
],
"source": [
"print(timeit_python)"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [],
"source": [
"%%capture timeit_numpy\n",
"%%timeit\n",
"#Numpy\n",
"numpy_a_1 + numpy_a_2"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"246 us +- 16.7 us per loop (mean +- std. dev. of 7 runs, 1000 loops each)\n",
"\n"
]
}
],
"source": [
"print(timeit_numpy)"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"16.6 ms +- 220 us per loop (mean +- std. dev. of 7 runs, 100 loops each)\n",
"\n",
"246 us +- 16.7 us per loop (mean +- std. dev. of 7 runs, 1000 loops each)\n",
"\n",
"Additions: ratio times (Python / NumPy): 67.47967479674797\n"
]
}
],
"source": [
"compare_times('Additions', timeit_python, timeit_numpy)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This shows that when you need to perform mathematical operations on a lot of homogeneous numbers, it is more efficient to use `numpy` arrays."
"### Access elements\n",
"Elements in a `numpy` array can be accessed using indexing and slicing in any dimension. It also offers the same functionalities available in Fortan or Matlab."
]
},
{
......@@ -518,35 +336,29 @@
}
},
"source": [
"# Manipulating NumPy arrays"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Access elements\n",
"Elements in a `numpy` array can be accessed using indexing and slicing in any dimension. It also offers the same functionalities available in Fortan or Matlab.\n",
"\n",
"### Indexes and slices\n",
"For example, we can create an array `A` and perform any kind of selection operations on it."
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"execution_count": 10,
"metadata": {
"slideshow": {
"slide_type": "-"
}
},
"outputs": [
{
"data": {
"text/plain": [
"array([[0.89925962, 0.31519992, 0.17170063, 0.06102236, 0.6055506 ],\n",
" [0.43365108, 0.67461267, 0.34962124, 0.75648088, 0.53096922],\n",
" [0.65643503, 0.4723704 , 0.77202087, 0.50192904, 0.14067726],\n",
" [0.80709755, 0.2314217 , 0.65465368, 0.28459125, 0.54727527]])"
"array([[0.74931905, 0.4399789 , 0.96017188, 0.88886798, 0.28382067],\n",
" [0.4532329 , 0.99181478, 0.07017858, 0.4993961 , 0.1678844 ],\n",
" [0.59791893, 0.50793759, 0.77954852, 0.05390075, 0.984206 ],\n",
" [0.93149267, 0.02959492, 0.60720976, 0.92916837, 0.24923606]])"
]
},
"execution_count": 22,
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
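The source of the selection cells is not visible in this diff, only their outputs. The sketch below is therefore an assumption about the kind of indexing that produces outputs of those shapes; it uses a deterministic integer array instead of the random array of the notebook so that the results are reproducible.

```python
import numpy as np

# Deterministic 4x5 array (the notebook appears to use random values).
A = np.arange(20).reshape(4, 5)

elem = A[1, 0]            # single element: row 1, column 0
first_rows = A[:2]        # first two rows
last_col = A[:, -1]       # last column
sub = A[:2, ::2]          # first two rows, every other column

# Boolean mask: select the elements that satisfy a condition.
mask = A > 10
selected = A[mask]        # 1D array of the matching elements

# Integer-list indexing: pick columns 0, 1 and 4.
cols = A[:, [0, 1, 4]]

print(elem, last_col, sub, selected, cols, sep="\n")
```

Slices such as `A[:2]` return views on the original data, while boolean masks and integer lists return copies.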
......@@ -558,16 +370,16 @@
},
{
"cell_type": "code",
"execution_count": 23,
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.4336510750584107"
"0.45323290450951004"
]
},
"execution_count": 23,
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
......@@ -579,17 +391,17 @@
},
{
"cell_type": "code",
"execution_count": 24,
"execution_count": 12,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[0.89925962, 0.31519992, 0.17170063, 0.06102236, 0.6055506 ],\n",
" [0.43365108, 0.67461267, 0.34962124, 0.75648088, 0.53096922]])"
"array([[0.74931905, 0.4399789 , 0.96017188, 0.88886798, 0.28382067],\n",
" [0.4532329 , 0.99181478, 0.07017858, 0.4993961 , 0.1678844 ]])"
]
},
"execution_count": 24,
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
......@@ -601,16 +413,16 @@
},
{
"cell_type": "code",
"execution_count": 25,
"execution_count": 13,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([0.6055506 , 0.53096922, 0.14067726, 0.54727527])"
"array([0.28382067, 0.1678844 , 0.984206 , 0.24923606])"
]
},
"execution_count": 25,
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
......@@ -622,17 +434,17 @@
},
{
"cell_type": "code",
"execution_count": 26,
"execution_count": 14,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[0.89925962, 0.17170063, 0.6055506 ],\n",
" [0.43365108, 0.34962124, 0.53096922]])"
"array([[0.74931905, 0.96017188, 0.28382067],\n",
" [0.4532329 , 0.07017858, 0.1678844 ]])"
]
},
"execution_count": 26,
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
......@@ -655,19 +467,19 @@
},
{
"cell_type": "code",
"execution_count": 27,
"execution_count": 15,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[ True False False False True]\n",
" [False True False True True]\n",
" [ True False True True False]\n",
" [ True False True False True]]\n",
"[0.89925962 0.6055506 0.67461267 0.75648088 0.53096922 0.65643503\n",
" 0.77202087 0.50192904 0.80709755 0.65465368 0.54727527]\n"
"[[ True False True True False]\n",
" [False True False False False]\n",
" [ True True True False True]\n",
" [ True False True True False]]\n",
"[0.74931905 0.96017188 0.88886798 0.99181478 0.59791893 0.50793759\n",
" 0.77954852 0.984206 0.93149267 0.60720976 0.92916837]\n"
]
}
],
......@@ -686,29 +498,29 @@
},
{
"cell_type": "code",
"execution_count": 28,
"execution_count": 16,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[0.89925962 0.31519992 0.17170063 0.06102236 0.6055506 ]\n",
" [0.43365108 0.67461267 0.34962124 0.75648088 0.53096922]\n",
" [0.65643503 0.4723704 0.77202087 0.50192904 0.14067726]\n",
" [0.80709755 0.2314217 0.65465368 0.28459125 0.54727527]]\n"
"[[0.74931905 0.4399789 0.96017188 0.88886798 0.28382067]\n",
" [0.4532329 0.99181478 0.07017858 0.4993961 0.1678844 ]\n",
" [0.59791893 0.50793759 0.77954852 0.05390075 0.984206 ]\n",
" [0.93149267 0.02959492 0.60720976 0.92916837 0.24923606]]\n"
]
},
{
"data": {
"text/plain": [
"array([[0.89925962, 0.31519992, 0.6055506 ],\n",
" [0.43365108, 0.67461267, 0.53096922],\n",
" [0.65643503, 0.4723704 , 0.14067726],\n",
" [0.80709755, 0.2314217 , 0.54727527]])"
"array([[0.74931905, 0.4399789 , 0.28382067],\n",
" [0.4532329 , 0.99181478, 0.1678844 ],\n",
" [0.59791893, 0.50793759, 0.984206 ],\n",
" [0.93149267, 0.02959492, 0.24923606]])"
]
},
"execution_count": 28,
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
......@@ -733,19 +545,19 @@
},
{
"cell_type": "code",
"execution_count": 29,
"execution_count": 17,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[34.80126403, 28.25135024, 26.7464874 , 25.61394735, 31.42219749],\n",
" [29.52456401, 32.20122896, 28.61844741, 33.13707212, 30.59162046],\n",
" [31.99525724, 29.94683782, 33.31622493, 30.27122313, 26.42656267],\n",
" [33.72238198, 27.36777304, 31.97510827, 27.92690466, 30.77226288]])"
"array([[33.05466955, 29.59337041, 35.52364888, 34.67876614, 27.91876089],\n",
" [29.73774911, 35.90184432, 25.7067108 , 30.24335742, 26.70702917],\n",
" [31.3366963 , 30.33737648, 33.4031811 , 25.54191284, 35.81072142],\n",
" [35.18260526, 25.29682501, 31.44080124, 35.15503757, 27.55447916]])"
]
},
"execution_count": 29,
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
......@@ -763,19 +575,19 @@
},
{
"cell_type": "code",
"execution_count": 30,
"execution_count": 18,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[2.45778274, 1.37053329, 1.18732233, 1.06292268, 1.83226077],\n",
" [1.54288042, 1.9632724 , 1.41853016, 2.13076459, 1.70057974],\n",
" [1.92790714, 1.60379132, 2.16413527, 1.65190478, 1.15105309],\n",
" [2.24139301, 1.26039064, 1.92447592, 1.3292186 , 1.72853679]])"
"array([[2.11555894, 1.55267445, 2.61214542, 2.43237461, 1.32819473],\n",
" [1.57339059, 2.6961229 , 1.07269972, 1.6477259 , 1.18279987],\n",
" [1.81833078, 1.66186022, 2.1804876 , 1.05537986, 2.67568654],\n",
" [2.53829518, 1.0300372 , 1.8353033 , 2.53240228, 1.28304487]])"
]
},
"execution_count": 30,
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
......@@ -786,24 +598,211 @@
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## NumPy efficiency\n",
"\n",
"In addition of being extremely convenient to manipulate arrays of arbitrary dimensions, `numpy` is also much more efficient than pure Python.\n",
"\n",
"### Array creation:"
]
},
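Outside of IPython, where the `%timeit` magic is not available, a comparable measurement can be sketched with the standard `timeit` module. This is a minimal illustration rather than the notebook's own benchmark; the value of `n` and the number of repetitions are arbitrary choices.

```python
import timeit

import numpy as np

n = 1000
repeats = 10_000  # arbitrary number of repetitions

# Pure Python: build a list of n integers.
t_python = timeit.timeit(lambda: list(range(n)), number=repeats)

# NumPy: build an array of n integers.
t_numpy = timeit.timeit(lambda: np.arange(n), number=repeats)

print(f"Python: {t_python / repeats * 1e6:.2f} us per call")
print(f"NumPy:  {t_numpy / repeats * 1e6:.2f} us per call")
print(f"Ratio (Python / NumPy): {t_python / t_numpy:.1f}")
```

The measured ratio depends on the machine, the NumPy version and the array size, so it should be read as an order of magnitude only.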
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},