Commit 21ce2e36 authored by Florent Chatelain

fix deprecated

parent fa3ce8e0
......@@ -11,10 +11,14 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Introduction to Scipy and Numpy\n",
"Data analysis needs effective computational ressources to read/write and process data. Usually, the data set to be processed is a set of arrays. [Scipy](https://www.scipy.org/) (*Scientific Python*) package is a dedicated tool to operate on array efficiently. Quoting the *FAQ*, Scipy is \"*set of open source (BSD licensed) scientific and numerical tools for Python. It currently supports special functions, integration, ordinary differential equation (ODE) solvers, gradient optimization, parallel programming tools, an expression-to-C++ compiler for fast execution, and others. A good rule of thumb is that if it’s covered in a general textbook on numerical computing (for example, the well-known Numerical Recipes series), it’s probably implemented in scipy*\". This is the core of any data analysis package in Python.\n",
"# Introduction to Numpy and Scipy\n",
"\n",
"The main structure provided by Numpy is the *Fixed-Type Arrays*: **ndarray**. It is an efficient way of storing data and processing them."
"Data analysis needs effective computational ressources to read/write and process data. Usually, the data set to be processed is a set of arrays. \n",
"\n",
"The main structure provided by [Numpy](https://numpy.org/) is the *Fixed-Type Arrays*: **ndarray**. It is an efficient way of storing data and processing them.\n",
"\n",
"\n",
"[Scipy](https://www.scipy.org/) (*Scientific Python*) package is a dedicated tool that elaborates on Numpy to operate on *ndarray* efficiently. Quoting the *FAQ*, Scipy is \"*set of open source (BSD licensed) scientific and numerical tools for Python. It currently supports special functions, integration, ordinary differential equation (ODE) solvers, gradient optimization, parallel programming tools, an expression-to-C++ compiler for fast execution, and others. A good rule of thumb is that if it’s covered in a general textbook on numerical computing (for example, the well-known Numerical Recipes series), it’s probably implemented in scipy*\". This is the core of any data analysis package in Python."
]
},
{
......@@ -174,7 +178,7 @@
"source": [
"## Basics of Arrays \n",
"\n",
"There are plenty of functions to create and to initialize specific array (np.zeros, np.ones, np.empty ...). For each case, it is possible to define the type (int8, uint8, float64 ...) by providing the corresponding parameter. More information regarding the different array types can be found here: https://docs.scipy.org/doc/numpy-1.13.0/user/basics.types.html and https://docs.scipy.org/doc/numpy-1.13.0/reference/arrays.dtypes.html.\n",
"There are plenty of functions to create and to initialize specific array (np.zeros, np.ones, np.empty ...). For each case, it is possible to define the type (int8, uint8, float64 ...) by providing the corresponding parameter. More information regarding the different array types can be found here: https://numpy.org/doc/stable/user/basics.types.html and https://numpy.org/doc/stable/reference/arrays.dtypes.html.\n",
"\n",
"### Getting attributes\n"
]
......@@ -338,7 +342,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Most all conventional functions exist: arithmetic, trigonometric, log/exp ... A detailed list is available here: https://docs.scipy.org/doc/numpy-1.13.0/reference/ufuncs.html"
"Most all conventional functions exist: arithmetic, trigonometric, log/exp ... A detailed list is available here: https://numpy.org/doc/stable/reference/ufuncs.html"
]
},
{
......@@ -346,7 +350,7 @@
"metadata": {},
"source": [
"### Reduction\n",
"Scipy provides a set of functions to extrac values from the array itself and for some specific dimension of the array"
"Numpy provides a set of functions to extrac values from the array itself and for some specific dimension of the array"
]
},
{
......
%% Cell type:markdown id: tags:
This notebook can be run on mybinder: [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/git/https%3A%2F%2Fgricad-gitlab.univ-grenoble-alpes.fr%2Fchatelaf%2Fparcours-numerique-ia/master?filepath=notebooks%2F0_python_in_a_nutshell/N0b_introduction_scipy.ipynb)
%% Cell type:markdown id: tags:
# Introduction to Scipy and Numpy
Data analysis needs effective computational ressources to read/write and process data. Usually, the data set to be processed is a set of arrays. [Scipy](https://www.scipy.org/) (*Scientific Python*) package is a dedicated tool to operate on array efficiently. Quoting the *FAQ*, Scipy is "*set of open source (BSD licensed) scientific and numerical tools for Python. It currently supports special functions, integration, ordinary differential equation (ODE) solvers, gradient optimization, parallel programming tools, an expression-to-C++ compiler for fast execution, and others. A good rule of thumb is that if it’s covered in a general textbook on numerical computing (for example, the well-known Numerical Recipes series), it’s probably implemented in scipy*". This is the core of any data analysis package in Python.
# Introduction to Numpy and Scipy
The main structure provided by Numpy is the *Fixed-Type Arrays*: **ndarray**. It is an efficient way of storing data and processing them.
Data analysis needs effective computational resources to read/write and process data. Usually, the data set to be processed is a set of arrays.
The main structure provided by [Numpy](https://numpy.org/) is the *fixed-type array*, **ndarray**. It is an efficient way of storing and processing data.
The [Scipy](https://www.scipy.org/) (*Scientific Python*) package builds on Numpy to operate on *ndarray* objects efficiently. Quoting the *FAQ*, Scipy is "*set of open source (BSD licensed) scientific and numerical tools for Python. It currently supports special functions, integration, ordinary differential equation (ODE) solvers, gradient optimization, parallel programming tools, an expression-to-C++ compiler for fast execution, and others. A good rule of thumb is that if it’s covered in a general textbook on numerical computing (for example, the well-known Numerical Recipes series), it’s probably implemented in scipy*". Numpy and Scipy are at the core of most data analysis packages in Python.
%% Cell type:code id: tags:
``` python
# Find the scipy and numpy module and define an alias in the local namespace
import scipy as sp
import numpy as np
```
%% Cell type:code id: tags:
``` python
A = np.array(range(10)) # Create an array from a sequence (here a range)
print("A = {}".format(A)) # note that there are 10 elements: 0,1,...,9
B = np.arange(10) # Create the same array directly with Numpy
print("B = {}".format(B))
np.array?
```
%%%% Output: stream
A = [0 1 2 3 4 5 6 7 8 9]
B = [0 1 2 3 4 5 6 7 8 9]
%%%% Output: display_data
%% Cell type:markdown id: tags:
## Basics of Arrays
There are plenty of functions to create and to initialize specific array (np.zeros, np.ones, np.empty ...). For each case, it is possible to define the type (int8, uint8, float64 ...) by providing the corresponding parameter. More information regarding the different array types can be found here: https://docs.scipy.org/doc/numpy-1.13.0/user/basics.types.html and https://docs.scipy.org/doc/numpy-1.13.0/reference/arrays.dtypes.html.
There are plenty of functions to create and initialize specific arrays (np.zeros, np.ones, np.empty ...). In each case, the element type (int8, uint8, float64 ...) can be set through the `dtype` parameter. More information regarding the different array types can be found here: https://numpy.org/doc/stable/user/basics.types.html and https://numpy.org/doc/stable/reference/arrays.dtypes.html.
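As a minimal sketch of the creation functions listed above (the array names are illustrative and do not come from the notebook), the `dtype` keyword selects the element type:

```python
import numpy as np

# Illustrative names; any shape works
Z = np.zeros((2, 3), dtype=np.uint8)   # 2x3 array of zeros stored as unsigned 8-bit integers
O = np.ones(4)                         # without dtype, the default is float64
E = np.empty(5, dtype=np.int8)         # allocated but NOT initialized: the values are arbitrary

print(Z.dtype, O.dtype, E.dtype)       # uint8 float64 int8
print(Z.itemsize, O.itemsize)          # bytes per element: 1 8
```

Note that `np.empty` only reserves memory, so its content should always be overwritten before being read.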
### Getting attributes
%% Cell type:code id: tags:
``` python
# Attributes
print("Number of elements in A: {}".format(A.size))
print("Number of dimensions of A: {}".format(A.ndim))
print("Shape of A: {}".format(A.shape))
print("Type of the elements of A: {}".format(A.dtype))
```
%%%% Output: stream
Number of elements in A: 10
Number of dimensions of A: 1
Shape of A: (10,)
Type of the elements of A: int64
%% Cell type:markdown id: tags:
It is possible to explicitly modify some attributes, in particular the *shape*:
%% Cell type:code id: tags:
``` python
B.shape = (2,5) # Change the shape to 2 lines, 5 columns -> the total number of elements must stay the same
print("B = \n {}".format(B))
C = B.reshape(10) # reshape returns an array with the requested shape (a view on the same data when possible)
print(B.shape)
print(C.shape)
```
%%%% Output: stream
B =
[[0 1 2 3 4]
[5 6 7 8 9]]
(2, 5)
(10,)
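One point worth checking, sketched here with illustrative names: for a contiguous array such as the one above, `reshape` returns a view that shares memory with the original, so a write through one is visible in the other. Passing `-1` lets Numpy infer one dimension from the total size.

```python
import numpy as np

B = np.arange(10)
C = B.reshape(2, 5)            # view on the same data (B is contiguous)
C[0, 0] = 100                  # writing through the view...
print(B[0])                    # ...changes B as well: 100
print(np.shares_memory(B, C))  # True

D = B.reshape(-1, 2)           # -1: infer the number of lines from the total size
print(D.shape)                 # (5, 2)

E = B.reshape(2, 5).copy()     # explicit copy when independent data is needed
E[0, 0] = -1
print(B[0])                    # still 100
```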
%% Cell type:markdown id: tags:
### Accessing elements
%% Cell type:code id: tags:
``` python
print("A = {}".format(A))
print(A[0]) # First element
print(A[1]) # Second element
print(A[-1]) # Last element
print(A[-2]) # Penultimate element
```
%%%% Output: stream
A = [0 1 2 3 4 5 6 7 8 9]
0
1
9
8
%% Cell type:code id: tags:
``` python
# Some slicing
print(A[0:3]) # Return an array with the elements of A from the first (index 0) to the third (index 2); the stop index is excluded
print(A[::2]) # All elements with a step of 2
print(A[-3:-1]) # Negative indices count from the end; the last element (index -1) is excluded
```
%%%% Output: stream
[0 1 2]
[0 2 4 6 8]
[7 8]
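A few more slicing patterns, as a sketch on the same kind of array (the 2D array `M` is an illustrative addition). Slices follow `start:stop:step` with `stop` excluded, and a negative step walks backwards.

```python
import numpy as np

A = np.arange(10)
print(A[-3:])     # last three elements: [7 8 9]
print(A[::-1])    # reversed array: [9 8 7 6 5 4 3 2 1 0]
print(A[8:2:-2])  # from index 8 down to (excluding) index 2: [8 6 4]

M = np.arange(12).reshape(3, 4)
print(M[1, :])    # second line: [4 5 6 7]
print(M[:, -1])   # last column: [ 3  7 11]
```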
%% Cell type:markdown id: tags:
## Computation on Array
### Universal functions
A general comment for interpreted languages: **avoid loops whenever you can**! They are slow and inefficient.
This comment applies to Python. Numpy provides a large set of operations that are optimized to work directly on arrays (as in Matlab, R ...). In particular, *universal functions* (ufuncs) are a set of functions for fast element-wise operations (+, -, power ...). Let us see a short example:
%% Cell type:code id: tags:
``` python
def my_add(M, N): # Suppose that M and N have the same shape
    P = np.empty_like(M)
    nl, nc = M.shape # number of lines, number of columns
    for i in range(nl):
        for j in range(nc):
            P[i, j] = M[i, j] + N[i, j]
    return P

M, N = np.arange(100000).reshape(1000,100), np.arange(100000).reshape(1000,100)
# %timeit runs each statement many times and reports the mean time per loop
print('using loops')
%timeit my_add(M,N) # using loops
print('using ufuncs')
%timeit M + N # using ufuncs, equivalent to np.add(M, N)
```
%%%% Output: stream
using loops
82 ms ± 2.24 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
using ufuncs
149 µs ± 1.69 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%% Cell type:markdown id: tags:
Most all conventional functions exist: arithmetic, trigonometric, log/exp ... A detailed list is available here: https://docs.scipy.org/doc/numpy-1.13.0/reference/ufuncs.html
Almost all conventional functions exist: arithmetic, trigonometric, log/exp ... A detailed list is available here: https://numpy.org/doc/stable/reference/ufuncs.html
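As a small sketch of a few of these functions (the input values are chosen here only for illustration), each call applies element by element without an explicit loop:

```python
import numpy as np

x = np.array([0.0, np.pi / 2, np.pi])
s = np.sin(x)              # trigonometric: approximately [0, 1, 0]
e = np.exp(np.log(x + 1))  # exp and log are inverses: recovers x + 1
m = np.maximum(x, 2.0)     # element-wise maximum against a scalar

a = np.array([1, 2, 3])
p = a ** 2                 # operator form of np.power(a, 2)

print(s, e, m, p)
```

Arithmetic operators such as `+` or `**` are shorthand for the corresponding ufuncs (`np.add`, `np.power`).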
%% Cell type:markdown id: tags:
### Reduction
Scipy provides a set of functions to extrac values from the array itself and for some specific dimension of the array
Numpy provides a set of functions to extract values from the whole array or along a specific dimension of the array
%% Cell type:code id: tags:
``` python
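A = np.random.rand(5,4) # random numbers come from Numpy (the scipy.random alias is deprecated)
print("A = \n{}".format(A))
print(A.sum()) # Sum over all elements
print(A.sum(axis=0)) # Sum along axis 0 (over the lines): one value per column
print(A.sum(axis=1)) # Sum along axis 1 (over the columns): one value per line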
```
%%%% Output: stream
A =
[[0.04729897 0.24017566 0.5864012 0.36822915]
[0.64367639 0.79125865 0.01782902 0.19915723]
[0.07646624 0.74159009 0.27335153 0.24020963]
[0.35719842 0.61549434 0.96633268 0.68797117]
[0.75598905 0.77374067 0.5180737 0.24297386]]
9.143417656235526
[1.88062907 3.16225941 2.36198813 1.73854104]
[1.24210498 1.6519213 1.33161748 2.62699661 2.29077728]
%% Cell type:markdown id: tags:
Using the same convention, it is possible to get the cumulative sum (cumsum), the product of the elements (prod, cumprod), the maximum/minimum value (max, min) and their positions (argmax, argmin), and the first and second statistical moments (mean, var/std). It is also possible to check whether a condition is fulfilled for all or any elements of the array.
%% Cell type:code id: tags:
``` python
np.any(A>0)
```
%%%% Output: execute_result
True
%% Cell type:code id: tags:
``` python
np.all(A>0.5)
```
%%%% Output: execute_result
False
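The other reductions named above follow the same `axis` convention. A minimal sketch on a small fixed array (chosen here so that the results are easy to check by hand; it is not the random `A` of the notebook):

```python
import numpy as np

X = np.array([[1, 5, 2],
              [4, 3, 6]])

print(X.cumsum(axis=1))  # running sum along each line: [[1 6 8], [4 7 13]]
print(X.prod(axis=0))    # product down each column: [4 15 12]
print(X.max(axis=1))     # maximum of each line: [5 6]
print(X.argmax(axis=1))  # position of that maximum in each line: [1 2]
print(X.argmin())        # without axis, the index refers to the flattened array: 0
print(X.mean(axis=0))    # mean of each column: [2.5 4. 4.]
print(X.std(axis=0))     # standard deviation of each column: [1.5 1. 2.]
```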
%% Cell type:markdown id: tags:
### Some exercises
- Find the maximum and minimum value of A
- Find the maximum of each column
- Find the mean value of each row
- Find the position of the minimum value of each row
%% Cell type:code id: tags:
``` python
print(A.max())
print(A.min())
print(A.max(axis=0))
print(A.mean(axis=1))
print(A.argmin(axis=1))
```
%%%% Output: stream
0.9663326779302741
0.01782902358944749
[0.75598905 0.79125865 0.96633268 0.68797117]
[0.31052625 0.41298032 0.33290437 0.65674915 0.57269432]
[0 2 0 0 3]
%% Cell type:markdown id: tags:
### Broadcasting
Broadcasting allows efficient operations between arrays of different shapes, provided that their shapes are compatible. An extreme example is adding a scalar to a matrix:
%% Cell type:code id: tags:
``` python
A+3
```
%%%% Output: execute_result
array([[3.04729897, 3.24017566, 3.5864012 , 3.36822915],
[3.64367639, 3.79125865, 3.01782902, 3.19915723],
[3.07646624, 3.74159009, 3.27335153, 3.24020963],
[3.35719842, 3.61549434, 3.96633268, 3.68797117],
[3.75598905, 3.77374067, 3.5180737 , 3.24297386]])
%% Cell type:markdown id: tags:
Easy? Now, if we need to center the data, it is just as easy:
%% Cell type:code id: tags:
``` python
print('Size of A: {}'.format(A.shape))
print('Size of the average of A along the lines: {}'.format(A.mean(axis=0).shape))
# Suppose that each line is a sample, and each column a measurement (i.e., a variable)
Ac = A - A.mean(axis=0)
print('Size of centered A: {}'.format(Ac.shape))
print('Ac=\n{}'.format(Ac))
```
%%%% Output: stream
Size of A: (5, 4)
Size of the average of A along the lines: (4,)
Size of centered A: (5, 4)
Ac=
[[-0.32882685 -0.39227622 0.11400358 0.02052094]
[ 0.26755058 0.15880677 -0.4545686 -0.14855098]
[-0.29965958 0.10913821 -0.1990461 -0.10749858]
[-0.01892739 -0.01695754 0.49393505 0.34026296]
[ 0.37986324 0.14128879 0.04567607 -0.10473435]]
%% Cell type:markdown id: tags:
If we need to standardize the data (subtract the mean and divide by the standard deviation), it can be achieved easily:
%% Cell type:code id: tags:
``` python
As = (A-A.mean(axis=0))/A.std(axis=0)
print(As)
```
%%%% Output: stream
[[-1.14253071 -1.90838879 0.35861286 0.11443247]
[ 0.92962225 0.77258074 -1.42990378 -0.82837592]
[-1.0411871 0.53094764 -0.62612501 -0.59945235]
[-0.06576447 -0.08249693 1.55373599 1.8974338 ]
[ 1.31986004 0.68735733 0.14367994 -0.58403801]]
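A quick sanity check of the standardization, sketched on a fresh random array (the seeded generator is an illustrative choice, not part of the notebook): after standardizing, every column should have a mean close to 0 and a standard deviation close to 1.

```python
import numpy as np

rng = np.random.default_rng(0)      # seeded generator so the run is reproducible
A = rng.random((5, 4))              # 5 samples (lines), 4 variables (columns)

As = (A - A.mean(axis=0)) / A.std(axis=0)

print(np.allclose(As.mean(axis=0), 0.0))  # column means are zero up to rounding
print(np.allclose(As.std(axis=0), 1.0))   # column standard deviations are one
```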
%% Cell type:markdown id: tags:
More details about broadcasting can be found here: https://numpy.org/doc/stable/user/basics.broadcasting.html
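The compatibility rule can be sketched as follows (array names are illustrative): shapes are compared from the last dimension backwards, and two dimensions are compatible when they are equal or when one of them is 1.

```python
import numpy as np

col = np.arange(3).reshape(3, 1)    # shape (3, 1)
row = np.arange(4).reshape(1, 4)    # shape (1, 4)
grid = col + row                    # both are stretched to shape (3, 4)
print(grid.shape)

A = np.ones((5, 4))
by_column = A - A.mean(axis=0)                # (5, 4) with (4,): compatible
by_line = A - A.mean(axis=1, keepdims=True)   # (5, 4) with (5, 1): compatible
print(by_column.shape, by_line.shape)

failed = False
try:
    A - A.mean(axis=1)              # (5, 4) with (5,): last dimensions 4 and 5 differ
except ValueError as err:
    failed = True
    print("Broadcasting error:", err)
```

The `keepdims=True` argument keeps the reduced axis with length 1, which is what makes a per-line operation broadcast correctly.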
%% Cell type:markdown id: tags:
## Plotting in Python
The [Matplotlib](https://matplotlib.org/) package offers several functions to plot data. Below is an example using 2D data; more complicated plots can be constructed when needed.
%% Cell type:code id: tags:
``` python
%matplotlib inline
import matplotlib.pyplot as plt
x = np.arange(0,10,0.01)
y = x**2
plt.plot(x,y)
plt.grid()
plt.show()
```
%%%% Output: display_data
[Hidden Image Output]
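A slightly richer plot can be sketched with the object-oriented interface. The second curve, the labels, and the output file name are assumptions made for illustration, and the `Agg` backend is selected only so that the sketch also runs outside a notebook:

```python
import matplotlib
matplotlib.use("Agg")               # off-screen backend; not needed with %matplotlib inline
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(0, 10, 0.01)

fig, ax = plt.subplots()
ax.plot(x, x**2, label="x^2")             # same curve as above
ax.plot(x, 10 * x, "--", label="10 x")    # second curve, dashed
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_title("Two curves on the same axes")
ax.legend()
ax.grid(True)
fig.savefig("curves.png")           # hypothetical file name
```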
%% Cell type:code id: tags:
``` python
```
......