"Data analysis needs effective computational ressources to read/write and process data. Usually, the data set to be processed is a set of arrays. [Scipy](https://www.scipy.org/) (*Scientific Python*) package is a dedicated tool to operate on array efficiently. Quoting the *FAQ*, Scipy is \"*set of open source (BSD licensed) scientific and numerical tools for Python. It currently supports special functions, integration, ordinary differential equation (ODE) solvers, gradient optimization, parallel programming tools, an expression-to-C++ compiler for fast execution, and others. A good rule of thumb is that if it’s covered in a general textbook on numerical computing (for example, the well-known Numerical Recipes series), it’s probably implemented in scipy*\". This is the core of any data analysis package in Python.\n",

"# Introduction to Numpy and Scipy\n",

"\n",

"The main structure provided by Numpy is the *Fixed-Type Arrays*: **ndarray**. It is an efficient way of storing data and processing them."

"Data analysis needs effective computational resources to read/write and process data. Usually, the data set to be processed is a set of arrays. \n",

"\n",

"The main structure provided by [Numpy](https://numpy.org/) is the *Fixed-Type Arrays*: **ndarray**. It is an efficient way of storing data and processing them.\n",

"\n",

"\n",

"[Scipy](https://www.scipy.org/) (*Scientific Python*) package is a dedicated tool that elaborates on Numpy to operate on *ndarray* efficiently. Quoting the *FAQ*, Scipy is \"*set of open source (BSD licensed) scientific and numerical tools for Python. It currently supports special functions, integration, ordinary differential equation (ODE) solvers, gradient optimization, parallel programming tools, an expression-to-C++ compiler for fast execution, and others. A good rule of thumb is that if it’s covered in a general textbook on numerical computing (for example, the well-known Numerical Recipes series), it’s probably implemented in scipy*\". This is the core of any data analysis package in Python."

]

},

{

...

...

@@ -174,7 +178,7 @@

"source": [

"## Basics of Arrays \n",

"\n",

"There are plenty of functions to create and to initialize specific array (np.zeros, np.ones, np.empty ...). For each case, it is possible to define the type (int8, uint8, float64 ...) by providing the corresponding parameter. More information regarding the different array types can be found here: https://docs.scipy.org/doc/numpy-1.13.0/user/basics.types.html and https://docs.scipy.org/doc/numpy-1.13.0/reference/arrays.dtypes.html.\n",

"There are plenty of functions to create and to initialize specific arrays (np.zeros, np.ones, np.empty ...). For each case, it is possible to define the type (int8, uint8, float64 ...) by providing the corresponding parameter. More information regarding the different array types can be found here: https://numpy.org/doc/stable/user/basics.types.html and https://numpy.org/doc/stable/reference/arrays.dtypes.html.\n",

"\n",

"### Getting attributes\n"

]

...

...

@@ -338,7 +342,7 @@

"cell_type": "markdown",

"metadata": {},

"source": [

"Most all conventional functions exist: arithmetic, trigonometric, log/exp ... A detailed list is available here: https://docs.scipy.org/doc/numpy-1.13.0/reference/ufuncs.html"

"Almost all conventional functions exist: arithmetic, trigonometric, log/exp ... A detailed list is available here: https://numpy.org/doc/stable/reference/ufuncs.html"

]

},

{

...

...

@@ -346,7 +350,7 @@

"metadata": {},

"source": [

"### Reduction\n",

"Scipy provides a set of functions to extrac values from the array itself and for some specific dimension of the array"

"Numpy provides a set of functions to extract values from the array itself and for some specific dimension of the array"

]

},

{

...

...

%% Cell type:markdown id: tags:

This notebook can be run on mybinder: [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/git/https%3A%2F%2Fgricad-gitlab.univ-grenoble-alpes.fr%2Fchatelaf%2Fparcours-numerique-ia/master?filepath=notebooks%2F0_python_in_a_nutshell/N0b_introduction_scipy.ipynb)

%% Cell type:markdown id: tags:

# Introduction to Numpy and Scipy

Data analysis needs effective computational resources to read/write and process data. Usually, the data set to be processed is a set of arrays.

The main structure provided by [Numpy](https://numpy.org/) is the *fixed-type array*: **ndarray**. It is an efficient way of storing and processing data.

The [Scipy](https://www.scipy.org/) (*Scientific Python*) package is a dedicated tool that builds on Numpy to operate on *ndarray* efficiently. Quoting the *FAQ*, Scipy is a "*set of open source (BSD licensed) scientific and numerical tools for Python. It currently supports special functions, integration, ordinary differential equation (ODE) solvers, gradient optimization, parallel programming tools, an expression-to-C++ compiler for fast execution, and others. A good rule of thumb is that if it’s covered in a general textbook on numerical computing (for example, the well-known Numerical Recipes series), it’s probably implemented in scipy*". This is the core of any data analysis package in Python.
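To make the *fixed-type* idea concrete, here is a minimal sketch (the names `ints` and `mixed` are illustrative and not part of the original notebook). It suggests that every element of an `ndarray` shares a single `dtype`, so the memory footprint is the number of elements times the size of one element, and that a value of another type is converted when it is stored.

```python
import numpy as np

ints = np.array([1, 2, 3], dtype=np.int32)     # every element is a 32-bit integer
print(ints.dtype, ints.itemsize, ints.nbytes)  # int32 4 12

ints[0] = 7.9   # the float is converted to the array's type (truncated to 7)
print(ints)     # [7 2 3]

mixed = np.array([1, 2.5, 3])  # mixed int/float input is upcast to a common type
print(mixed.dtype)             # float64
```

The silent truncation in the second step is worth keeping in mind: choose the `dtype` when the array is created if fractional values are expected.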

%% Cell type:code id: tags:

``` python

# Import the scipy and numpy modules and define aliases in the local namespace
import scipy as sp
import numpy as np

```

%% Cell type:code id: tags:

``` python

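A = np.array(range(10))  # Create an array from a range (any sequence works)
print("A = {}".format(A))  # Note that there are 10 elements: 0, 1, ..., 9
B = np.arange(10)  # Create the same array directly with Numpy
print("B = {}".format(B))
# IPython help syntax: display the documentation of np.array
np.array?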

```

%%%% Output: stream

A = [0 1 2 3 4 5 6 7 8 9]

B = [0 1 2 3 4 5 6 7 8 9]

%%%% Output: display_data

%% Cell type:markdown id: tags:

## Basics of Arrays

There are plenty of functions to create and to initialize specific arrays (np.zeros, np.ones, np.empty ...). For each case, it is possible to define the type (int8, uint8, float64 ...) by providing the corresponding parameter. More information regarding the different array types can be found here: https://numpy.org/doc/stable/user/basics.types.html and https://numpy.org/doc/stable/reference/arrays.dtypes.html.
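As a short illustration of these creation functions, the sketch below builds three small arrays and sets their type through the `dtype` parameter. The variable names `Z`, `O` and `E` and the chosen shapes are only illustrative.

```python
import numpy as np

Z = np.zeros((2, 3), dtype=np.float64)  # 2x3 array filled with 0.0
O = np.ones(4, dtype=np.uint8)          # 4 unsigned 8-bit integers equal to 1
E = np.empty((2, 2), dtype=np.int8)     # uninitialized: the contents are arbitrary

print(Z.shape, Z.dtype)  # (2, 3) float64
print(O, O.dtype)        # [1 1 1 1] uint8
print(E.shape, E.dtype)  # (2, 2) int8
```

`np.empty` only reserves memory, so it should be used when every element is going to be overwritten afterwards.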

### Getting attributes

%% Cell type:code id: tags:

``` python

# Attributes

print("Number of elements in A: {}".format(A.size))

print("Number of dimensions of A: {}".format(A.ndim))

print("Shape of A: {}".format(A.shape))

print("Type of the elements of A: {}".format(A.dtype))

```

%%%% Output: stream

Number of elements in A: 10

Number of dimensions of A: 1

Shape of A: (10,)

Type of the elements of A: int64

%% Cell type:markdown id: tags:

It is possible to explicitly modify some attributes, in particular the *shape*:

%% Cell type:code id: tags:

``` python

B.shape = (2, 5)  # Change the shape to 2 lines, 5 columns -> the total number of elements must stay the same
print("B = \n {}".format(B))
C = B.reshape(10)  # reshape returns an array with the requested shape (a view of the same data when possible); B keeps its shape
print(B.shape)
print(C.shape)

```

%%%% Output: stream

B =

[[0 1 2 3 4]

[5 6 7 8 9]]

(2, 5)

(10,)

%% Cell type:markdown id: tags:

### Accessing elements

%% Cell type:code id: tags:

``` python

print("A = {}".format(A))

print(A[0])# First element

print(A[1])# Second element

print(A[-1])# Last element

print(A[-2])  # Penultimate (second-to-last) element

```

%%%% Output: stream

A = [0 1 2 3 4 5 6 7 8 9]

0

1

9

8

%% Cell type:code id: tags:

``` python

# Some slicing

print(A[0:3])# Return an array of elements of A from the first (index 0) to the third (index 2)

print(A[::2])# All elements with a step of 2

print(A[-3:-1])  # Negative indices work in slices too: from the third-to-last element up to, but excluding, the last one

```

%%%% Output: stream

[0 1 2]

[0 2 4 6 8]

[7 8]

%% Cell type:markdown id: tags:

## Computation on Arrays

### Universal functions

A general remark for interpreted languages: **avoid loops whenever you can**! They are slow and inefficient.

The remark applies to Python. Numpy provides a large set of operations that are optimized to work directly on arrays (as in Matlab, R ...). In particular, *universal functions* (ufuncs) are a set of functions for fast element-wise operations (+, -, power ...). Let us see a short example:

%% Cell type:code id: tags:

``` python

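M, N = np.random.rand(300, 300), np.random.rand(300, 300)  # assumed inputs (the original definitions are not shown)

def my_add(M, N):  # Suppose that M and N have the same 2D shape
    R = np.empty_like(M)  # loop body reconstructed; the original is not shown
    for i in range(M.shape[0]):
        for j in range(M.shape[1]):
            R[i, j] = M[i, j] + N[i, j]
    return R

# Evaluate execution time by repeating several runs (%timeit picks the number of loops automatically)
print('using loops')
%timeit my_add(M, N)  # using loops
print('using ufuncs')
%timeit M + N  # using ufuncs, equivalent to np.add(M, N)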

```

%%%% Output: stream

using loops

82 ms ± 2.24 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

using ufuncs

149 µs ± 1.69 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

%% Cell type:markdown id: tags:

Almost all conventional functions exist: arithmetic, trigonometric, log/exp ... A detailed list is available here: https://numpy.org/doc/stable/reference/ufuncs.html
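A few of these functions are sketched below on a small array; `x` is an illustrative variable, not one from the notebook. Each call applies the function to every element at once, without an explicit loop.

```python
import numpy as np

x = np.array([0.0, np.pi / 2, np.pi])

print(np.sin(x))                      # element-wise sine
print(np.exp(np.array([0.0, 1.0])))   # element-wise exponential
print(np.add(x, 1.0))                 # same result as x + 1.0
print(np.log(np.exp(x)))              # log undoes exp, up to rounding errors
```

Arithmetic operators such as `+` or `**` are shorthand for the corresponding ufuncs (`np.add`, `np.power`), so both spellings run the same optimized code.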

%% Cell type:markdown id: tags:

### Reduction

Numpy provides a set of functions that reduce an array to summary values, either over the whole array or along a specific dimension (axis) of the array:

%% Cell type:code id: tags:

``` python

A = np.random.rand(5, 4)  # 5x4 array of uniform random values in [0, 1)
print("A = \n{}".format(A))
print(A.sum())  # Sum over all elements
print(A.sum(axis=0))  # Sum over the lines (axis 0): returns one value per column
```

%% Cell type:markdown id: tags:

Using the same convention, it is possible to get the cumulative sum (cumsum), the product of elements (prod, cumprod), the maximum/minimum value (max, min) and their position (argmax, argmin), and the first and second statistical moments (mean, var/std). It is also possible to check if a condition is fulfilled for all or any elements of the array.
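The reductions listed above can be sketched on a small integer array, where the results are easy to check by hand. The array `X` is illustrative and is not the random array used in the notebook.

```python
import numpy as np

X = np.array([[1, 5, 2],
              [4, 3, 6]])

print(X.cumsum(axis=1))      # cumulative sum along each line: [1 6 8] and [4 7 13]
print(X.prod(axis=0))        # product down each column: [ 4 15 12]
print(X.max(), X.argmax())   # 6 5 (argmax is the position in the flattened array)
print(X.argmin(axis=1))      # position of the minimum of each line: [0 1]
print(X.mean(), X.std())     # 3.5 and about 1.708
```

Without the `axis` argument, the reduction runs over the whole array; with it, the reduction collapses only that dimension.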

%% Cell type:code id: tags:

``` python

np.any(A>0)

```

%%%% Output: execute_result

True

%% Cell type:code id: tags:

``` python

np.all(A>0.5)

```

%%%% Output: execute_result

False

%% Cell type:markdown id: tags:

### Some exercises

- Find the maximum and minimum value of A

- Find the maximum of each line

- Find the mean value of each row

- Find the position of the minimum value of each row

Broadcasting allows efficient operations between arrays of different shapes, provided that their dimensions are compatible. An extreme example is adding a scalar to a matrix.
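The scalar case can be sketched in two lines; the small matrix `M` below is only an illustration.

```python
import numpy as np

M = np.arange(6).reshape(2, 3)  # [[0 1 2] [3 4 5]]
print(M + 10)                   # the scalar 10 is added to every element
```

The scalar behaves as if it were stretched to the shape of `M`, without any copy being written explicitly.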

Easy? Centering the data is just as easy:

%% Cell type:code id: tags:

``` python

print('Size of A: {}'.format(A.shape))

print('Size of the average of A along the lines: {}'.format(A.mean(axis=0).shape))

# Suppose that each line is a sample, and each column a measurement (i.e., a variable)

Ac=A-A.mean(axis=0)

print('Size of centered A: {}'.format(Ac.shape))

print('Ac=\n{}'.format(Ac))

```

%%%% Output: stream

Size of A: (5, 4)

Size of the average of A along the lines: (4,)

Size of centered A: (5, 4)

Ac=

[[-0.32882685 -0.39227622 0.11400358 0.02052094]

[ 0.26755058 0.15880677 -0.4545686 -0.14855098]

[-0.29965958 0.10913821 -0.1990461 -0.10749858]

[-0.01892739 -0.01695754 0.49393505 0.34026296]

[ 0.37986324 0.14128879 0.04567607 -0.10473435]]

%% Cell type:markdown id: tags:

If we need to standardize the data (subtract the mean and divide by the standard deviation), it can be achieved easily:

%% Cell type:code id: tags:

``` python

As=(A-A.mean(axis=0))/A.std(axis=0)

print(As)

```

%%%% Output: stream

[[-1.14253071 -1.90838879 0.35861286 0.11443247]

[ 0.92962225 0.77258074 -1.42990378 -0.82837592]

[-1.0411871 0.53094764 -0.62612501 -0.59945235]

[-0.06576447 -0.08249693 1.55373599 1.8974338 ]

[ 1.31986004 0.68735733 0.14367994 -0.58403801]]

%% Cell type:markdown id: tags:

More details about broadcasting can be found here: https://numpy.org/doc/stable/user/basics.broadcasting.html
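As a rough sketch of the compatibility rule, shapes are compared from the last dimension backwards, and two dimensions are compatible when they are equal or when one of them is 1. The arrays `col`, `row` and `table` below are illustrative names.

```python
import numpy as np

col = np.arange(3).reshape(3, 1)  # shape (3, 1)
row = np.arange(4)                # shape (4,), treated as (1, 4)

table = col + row                 # both operands are stretched to shape (3, 4)
print(table.shape)                # (3, 4)
print(table)

try:
    np.ones((3, 4)) + np.ones(3)  # last dimensions are 4 and 3: not compatible
except ValueError as err:
    print("Incompatible shapes:", err)
```

This is why the centering example works: the mean has shape (4,), which matches the last dimension of the (5, 4) array.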

%% Cell type:markdown id: tags:

## Plotting in Python

The package [Matplotlib](https://matplotlib.org/) offers several functions to plot data. Below is an example using 2D data; more complicated plots can be constructed when needed.