Commit 1f912370 authored by Eric Maldonado

Add NumPy kickstart

Former-commit-id: 2df0e28b
parent cfd3e953
%% Cell type:markdown id: tags:
# A short introduction to NumPy
Strongly inspired by the UGA Python Introduction Course
https://gricad-gitlab.univ-grenoble-alpes.fr/python-uga/py-training-2017
%% Cell type:markdown id: tags:
## A short introduction to NumPy
Code using `numpy` usually starts with the import statement:
%% Cell type:code id: tags:
``` python
import numpy as np
```
%% Cell type:markdown id: tags:
NumPy provides the type `np.ndarray`. Such arrays are multidimensional sequences of homogeneous elements (all elements share the same data type). They can be created, for example, with the following commands:
%% Cell type:code id: tags:
``` python
# from a list
l = [10.0, 12.5, 15.0, 17.5, 20.0]
np.array(l)
```
%% Output
array([10. , 12.5, 15. , 17.5, 20. ])
%% Cell type:code id: tags:
``` python
# fast, but the values are uninitialized (whatever happened to be in memory)
np.empty(4)
```
%% Output
array([1.27880790e-316, 0.00000000e+000, 6.91986808e-310, 1.57378525e-316])
%% Cell type:code id: tags:
``` python
# slower than np.empty but the values are all 0.
np.zeros([2, 6])
```
%% Output
array([[0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.]])
%% Cell type:code id: tags:
``` python
# multidimensional array
a = np.ones([2, 3, 4])
print(a.shape, a.size, a.dtype)
a
```
%% Output
(2, 3, 4) 24 float64
array([[[1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.]],

       [[1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.]]])
%% Cell type:code id: tags:
``` python
# like range, but produces a 1D NumPy array
np.arange(4)
```
%% Output
array([0, 1, 2, 3])
%% Cell type:code id: tags:
``` python
# np.arange can produce arrays of floats
np.arange(4.)
```
%% Output
array([0., 1., 2., 3.])
%% Cell type:code id: tags:
``` python
# another convenient function to generate 1D arrays
np.linspace(10, 20, 5)
```
%% Output
array([10. , 12.5, 15. , 17.5, 20. ])
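%% Cell type:markdown id: tags:
Since the elements of an array are homogeneous, NumPy has to settle on a single element type. The short sketch below (the variable names are only illustrative) shows what this looks like when the input mixes integers and floats, and when the type is requested explicitly with the `dtype` argument.
%% Cell type:code id: tags:
```python
import numpy as np

# Mixing integers and floats: NumPy picks one common type for all elements.
mixed = np.array([1, 2.5, 3])
print(mixed, mixed.dtype)

# The element type can also be requested explicitly.
as_float32 = np.array([1, 2, 3], dtype=np.float32)
print(as_float32, as_float32.dtype)

# Assigning a float into an integer array converts it to the array's type,
# so the fractional part is lost.
ints = np.arange(4)
ints[0] = 7.9
print(ints)
```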
%% Cell type:markdown id: tags:
A NumPy array can be easily converted to a Python list.
%% Cell type:code id: tags:
``` python
a = np.linspace(10, 20, 5)
list(a)
```
%% Output
[10.0, 12.5, 15.0, 17.5, 20.0]
%% Cell type:code id: tags:
``` python
# Or, even better: tolist() also converts the elements to native Python floats
a.tolist()
```
%% Output
[10.0, 12.5, 15.0, 17.5, 20.0]
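%% Cell type:markdown id: tags:
The two conversions are not strictly equivalent. As a small illustration (the array `grid` is introduced here only for the example), `list` keeps NumPy scalars and only unpacks the first dimension, whereas `tolist` converts everything to plain Python objects, recursively.
%% Cell type:code id: tags:
```python
import numpy as np

a = np.linspace(10, 20, 5)
print(type(list(a)[0]))     # a NumPy scalar type
print(type(a.tolist()[0]))  # the built-in Python float

# With a 2D array, tolist() builds nested lists...
grid = np.arange(6).reshape(2, 3)
nested = grid.tolist()
print(nested)

# ...while list() only returns a list of 1D arrays (one per row).
as_rows = list(grid)
print(type(as_rows[0]))
```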
%% Cell type:markdown id: tags:
# Manipulating NumPy arrays
%% Cell type:markdown id: tags:
## Access elements
Elements of a `numpy` array can be accessed using indexing and slicing along any dimension. The indexing features are similar to those available in Fortran or Matlab, but indices start at zero.
### Indexes and slices
For example, we can create an array `A` and apply various selection operations to it.
%% Cell type:code id: tags:
``` python
A = np.random.random([4, 5])
A
```
%% Output
array([[0.89925962, 0.31519992, 0.17170063, 0.06102236, 0.6055506 ],
[0.43365108, 0.67461267, 0.34962124, 0.75648088, 0.53096922],
[0.65643503, 0.4723704 , 0.77202087, 0.50192904, 0.14067726],
[0.80709755, 0.2314217 , 0.65465368, 0.28459125, 0.54727527]])
%% Cell type:code id: tags:
``` python
# Get the element in the second row, first column
A[1, 0]
```
%% Output
0.4336510750584107
%% Cell type:code id: tags:
``` python
# Get the first two rows
A[:2]
```
%% Output
array([[0.89925962, 0.31519992, 0.17170063, 0.06102236, 0.6055506 ],
[0.43365108, 0.67461267, 0.34962124, 0.75648088, 0.53096922]])
%% Cell type:code id: tags:
``` python
# Get the last column
A[:, -1]
```
%% Output
array([0.6055506 , 0.53096922, 0.14067726, 0.54727527])
%% Cell type:code id: tags:
``` python
# Get the first two rows and the columns with an even index
A[:2, ::2]
```
%% Output
array([[0.89925962, 0.17170063, 0.6055506 ],
[0.43365108, 0.34962124, 0.53096922]])
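%% Cell type:markdown id: tags:
One point worth keeping in mind with slices: a basic slice is a view on the original data, not a copy. The sketch below uses a small integer array `B` (introduced only for this example) to show that writing through a slice modifies the original array, unless `copy` is called explicitly.
%% Cell type:code id: tags:
```python
import numpy as np

B = np.arange(20).reshape(4, 5)

# A basic slice shares its memory with B.
first_rows = B[:2]
first_rows[0, 0] = 100
print(B[0, 0])  # B has been modified through the view

# An explicit copy is independent from B.
independent = B[:2].copy()
independent[0, 0] = -1
print(B[0, 0])  # B is unchanged this time

print(np.shares_memory(B, first_rows), np.shares_memory(B, independent))
```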
%% Cell type:markdown id: tags:
### Using a mask to select elements satisfying a condition:
%% Cell type:code id: tags:
``` python
cond = A > 0.5
print(cond)
print(A[cond])
```
%% Output
[[ True False False False True]
[False True False True True]
[ True False True True False]
[ True False True False True]]
[0.89925962 0.6055506 0.67461267 0.75648088 0.53096922 0.65643503
0.77202087 0.50192904 0.80709755 0.65465368 0.54727527]
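%% Cell type:markdown id: tags:
Several conditions can be combined into a single mask with the element-wise operators `&` (and), `|` (or) and `~` (not). The sketch below uses a small integer array `M`, chosen only so that the results are easy to check by hand. Note the parentheses around each condition, which are needed because of operator precedence.
%% Cell type:code id: tags:
```python
import numpy as np

M = np.arange(12).reshape(3, 4)

# Elements that are both greater than 2 and even
both = (M > 2) & (M % 2 == 0)
print(M[both])

# Elements that are smaller than 2 or greater than 9
either = (M < 2) | (M > 9)
print(M[either])

# Negation of a condition
print(M[~(M > 2)])

# np.where keeps the shape: take M where the condition holds, 0 elsewhere
clipped = np.where(M > 6, M, 0)
print(clipped)
```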
%% Cell type:markdown id: tags:
A mask is in fact a special case of the advanced indexing capabilities provided by NumPy. For example, lists can also be used as indices:
%% Cell type:code id: tags:
``` python
# Selecting only particular columns
print(A)
A[:, [0, 1, 4]]
```
%% Output
[[0.89925962 0.31519992 0.17170063 0.06102236 0.6055506 ]
[0.43365108 0.67461267 0.34962124 0.75648088 0.53096922]
[0.65643503 0.4723704 0.77202087 0.50192904 0.14067726]
[0.80709755 0.2314217 0.65465368 0.28459125 0.54727527]]
array([[0.89925962, 0.31519992, 0.6055506 ],
[0.43365108, 0.67461267, 0.53096922],
[0.65643503, 0.4723704 , 0.14067726],
[0.80709755, 0.2314217 , 0.54727527]])
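%% Cell type:markdown id: tags:
Two details of list indexing can be surprising. When a list is given for each dimension, the lists are paired element by element rather than combined into a block, and the result is a copy rather than a view. A minimal sketch, on an illustrative array `M`:
%% Cell type:code id: tags:
```python
import numpy as np

M = np.arange(12).reshape(3, 4)

# Lists in both dimensions are paired: this picks M[0, 1] and M[2, 3].
pairs = M[[0, 2], [1, 3]]
print(pairs)

# np.ix_ builds the indices needed to select the block rows {0, 2} x columns {1, 3}.
block = M[np.ix_([0, 2], [1, 3])]
print(block)

# Indexing with a list returns a copy: modifying it leaves M untouched.
cols = M[:, [0, 1]]
cols[0, 0] = -1
print(M[0, 0])
```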
%% Cell type:markdown id: tags:
## Perform array manipulations
### Apply arithmetic operations to whole arrays (element-wise):
%% Cell type:code id: tags:
``` python
(A+5)**2
```
%% Output
array([[34.80126403, 28.25135024, 26.7464874 , 25.61394735, 31.42219749],
[29.52456401, 32.20122896, 28.61844741, 33.13707212, 30.59162046],
[31.99525724, 29.94683782, 33.31622493, 30.27122313, 26.42656267],
[33.72238198, 27.36777304, 31.97510827, 27.92690466, 30.77226288]])
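%% Cell type:markdown id: tags:
In `(A+5)**2`, the scalar `5` is combined with every element of `A`. The same mechanism, called broadcasting, also applies between arrays of compatible shapes. The sketch below uses small illustrative arrays (`M`, `row` and `col` are names chosen for the example).
%% Cell type:code id: tags:
```python
import numpy as np

M = np.ones([3, 4])
row = np.arange(4)                # shape (4,)
col = np.arange(3).reshape(3, 1)  # shape (3, 1)

# The 1D array is added to every row of M.
plus_row = M + row
print(plus_row)

# The column array is added to every column of M.
plus_col = M + col
print(plus_col)

# A column times a row gives a full (3, 4) table of products.
outer = col * row
print(outer)
```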
%% Cell type:markdown id: tags:
### Apply functions element-wise:
%% Cell type:code id: tags:
``` python
np.exp(A)  # With NumPy arrays, use the functions from numpy!
```
%% Output
array([[2.45778274, 1.37053329, 1.18732233, 1.06292268, 1.83226077],
[1.54288042, 1.9632724 , 1.41853016, 2.13076459, 1.70057974],
[1.92790714, 1.60379132, 2.16413527, 1.65190478, 1.15105309],
[2.24139301, 1.26039064, 1.92447592, 1.3292186 , 1.72853679]])
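%% Cell type:markdown id: tags:
To illustrate the remark in the comment above: the functions of the standard `math` module work on single numbers, so they cannot be applied directly to an array with several elements. A short sketch on an illustrative array `x`:
%% Cell type:code id: tags:
```python
import math

import numpy as np

x = np.array([0.0, 1.0, 2.0])

# np.exp is applied to all elements at once.
vectorized = np.exp(x)

# math.exp only accepts one number at a time, so an explicit loop is needed.
looped = np.array([math.exp(v) for v in x])
print(np.allclose(vectorized, looped))

# Passing the whole array to math.exp raises an error.
try:
    math.exp(x)
    math_exp_failed = False
except TypeError as err:
    math_exp_failed = True
    print('math.exp failed:', err)
```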
%% Cell type:markdown id: tags:
### Setting parts of arrays
%% Cell type:code id: tags:
``` python
A[:, 0] = 0.
print(A)
```
%% Output
[[0. 0.31519992 0.17170063 0.06102236 0.6055506 ]
[0. 0.67461267 0.34962124 0.75648088 0.53096922]
[0. 0.4723704 0.77202087 0.50192904 0.14067726]
[0. 0.2314217 0.65465368 0.28459125 0.54727527]]
%% Cell type:code id: tags:
``` python
# BONUS: Safe element-wise inverse with masks
cond = (A != 0)
A[cond] = 1./A[cond]
print(A)
```
%% Output
[[ 0. 3.17258959 5.82409047 16.387435 1.65138967]
[ 0. 1.48233207 2.86023812 1.32191048 1.88334836]
[ 0. 2.11698277 1.29530177 1.99231351 7.10846954]
[ 0. 4.32111589 1.5275252 3.51381149 1.82723405]]
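%% Cell type:markdown id: tags:
As a possible alternative to the mask assignment above, `np.divide` accepts a `where` argument that restricts the computation to selected elements, and an `out` argument that provides the values used elsewhere. This is only a sketch on an illustrative array `x`; unlike the in-place version, it leaves the input untouched.
%% Cell type:code id: tags:
```python
import numpy as np

x = np.array([0.0, 2.0, 4.0, 0.0])

# Compute 1/x only where x is non-zero; the other entries keep the value
# found in `out` (here 0).
inv = np.divide(1.0, x, out=np.zeros_like(x), where=(x != 0))
print(inv)
print(x)  # the input array has not been modified
```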
%% Cell type:markdown id: tags:
### Attributes and methods of `np.ndarray` (see the [doc](https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.html#numpy.ndarray))
%% Cell type:code id: tags:
``` python
print([s for s in dir(A) if not s.startswith('__')])
```
%% Output
['T', 'all', 'any', 'argmax', 'argmin', 'argpartition', 'argsort', 'astype', 'base', 'byteswap', 'choose', 'clip', 'compress', 'conj', 'conjugate', 'copy', 'ctypes', 'cumprod', 'cumsum', 'data', 'diagonal', 'dot', 'dtype', 'dump', 'dumps', 'fill', 'flags', 'flat', 'flatten', 'getfield', 'imag', 'item', 'itemset', 'itemsize', 'max', 'mean', 'min', 'nbytes', 'ndim', 'newbyteorder', 'nonzero', 'partition', 'prod', 'ptp', 'put', 'ravel', 'real', 'repeat', 'reshape', 'resize', 'round', 'searchsorted', 'setfield', 'setflags', 'shape', 'size', 'sort', 'squeeze', 'std', 'strides', 'sum', 'swapaxes', 'take', 'tobytes', 'tofile', 'tolist', 'tostring', 'trace', 'transpose', 'var', 'view']
%% Cell type:code id: tags:
``` python
# Ex1: Compute the mean along different axes
print(A)
print('Mean value', A.mean())
print('Mean line', A.mean(axis=0))    # averages over the rows: one value per column
print('Mean column', A.mean(axis=1))  # averages over the columns: one value per row
```
%% Output
[[ 0. 3.17258959 5.82409047 16.387435 1.65138967]
[ 0. 1.48233207 2.86023812 1.32191048 1.88334836]
[ 0. 2.11698277 1.29530177 1.99231351 7.10846954]
[ 0. 4.32111589 1.5275252 3.51381149 1.82723405]]
Mean value 2.9143043986324475
Mean line [0. 2.77325508 2.87678889 5.80386762 3.1176104 ]
Mean column [5.40710095 1.50956581 2.50261352 2.23793733]
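%% Cell type:markdown id: tags:
The `axis` argument names the dimension that is collapsed by the operation. The small example below, on an array `M` chosen so that the values can be checked by hand, may help to remember which is which.
%% Cell type:code id: tags:
```python
import numpy as np

M = np.array([[0.0, 1.0, 2.0],
              [3.0, 4.0, 5.0]])  # shape (2, 3)

# axis=0 collapses the rows: one mean per column, shape (3,)
mean_over_rows = M.mean(axis=0)
print(mean_over_rows)

# axis=1 collapses the columns: one mean per row, shape (2,)
mean_over_cols = M.mean(axis=1)
print(mean_over_cols)

# keepdims=True keeps the collapsed dimension with length 1, shape (2, 1),
# which is convenient to subtract the row means from M.
centered = M - M.mean(axis=1, keepdims=True)
print(centered)
```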
%% Cell type:code id: tags:
``` python
# Ex2: Flatten a 2D array into a 1D array, keeping all elements
print(A, A.shape)
A_flat = A.flatten()
print(A_flat, A_flat.shape)
```
%% Output
[[ 0. 3.17258959 5.82409047 16.387435 1.65138967]
[ 0. 1.48233207 2.86023812 1.32191048 1.88334836]
[ 0. 2.11698277 1.29530177 1.99231351 7.10846954]
[ 0. 4.32111589 1.5275252 3.51381149 1.82723405]] (4, 5)
[ 0. 3.17258959 5.82409047 16.387435 1.65138967 0.
1.48233207 2.86023812 1.32191048 1.88334836 0. 2.11698277
1.29530177 1.99231351 7.10846954 0. 4.32111589 1.5275252
3.51381149 1.82723405] (20,)
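%% Cell type:markdown id: tags:
`flatten` always returns a copy. When a copy is not needed, `ravel` and `reshape(-1)` are possible alternatives: they return a view whenever the memory layout allows it. A minimal sketch on an illustrative array `M`:
%% Cell type:code id: tags:
```python
import numpy as np

M = np.zeros([2, 3])

# flatten() copies the data: M is not affected by changes to the result.
flat_copy = M.flatten()
flat_copy[0] = 1.0
print(M[0, 0])

# ravel() returns a view for a contiguous array such as M:
# writing to it modifies M.
flat_view = M.ravel()
flat_view[1] = 2.0
print(M[0, 1])

# reshape(-1) lets NumPy compute the length of the single dimension.
reshaped = M.reshape(-1)
print(reshaped.shape)
```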
%% Cell type:markdown id: tags:
### Remark: dot product
%% Cell type:code id: tags:
``` python
b = np.linspace(0, 10, 11)
c = b @ b
# before Python 3.5 (which introduced the @ operator):
# c = b.dot(b)
print(b)
print(c)
```
%% Output
[ 0. 1. 2. 3. 4. 5. 6. 7. 8. 9. 10.]
385.0
%% Cell type:markdown id: tags:
#### For Matlab users
| Operation | Matlab | NumPy |
| ------------- | ------ | ----- |
| element-wise product | `.*` | `*` |
| matrix (dot) product | `*` | `@` |
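%% Cell type:markdown id: tags:
The difference between the two operators is easier to see with 2D arrays. The matrices `P` and `Q` below are illustrative values, small enough to check the results by hand.
%% Cell type:code id: tags:
```python
import numpy as np

P = np.array([[1, 2],
              [3, 4]])
Q = np.array([[0, 1],
              [1, 0]])

# Element-wise product: each element of P times the matching element of Q
elementwise = P * Q
print(elementwise)

# Matrix product: rows of P times columns of Q
matrix_product = P @ Q
print(matrix_product)
```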
%% Cell type:markdown id: tags:
`numpy` arrays can also be sorted, even when they hold compound records (structured arrays), provided that the type of each field is explicitly declared through the `dtype`.
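%% Cell type:markdown id: tags:
As an illustration of sorting with explicit field types, here is a minimal sketch using a structured array. The records and the field names (`name`, `age`, `height`) are made up for the example; the `order` argument of `np.sort` selects the fields used for the comparison.
%% Cell type:code id: tags:
```python
import numpy as np

# Hypothetical records: each field has its own, explicitly declared type.
people = np.array(
    [('Alice', 31, 1.65), ('Bob', 25, 1.80), ('Carol', 25, 1.70)],
    dtype=[('name', 'U10'), ('age', 'i4'), ('height', 'f8')],
)

# Sort by age first, then by height to break ties.
by_age = np.sort(people, order=['age', 'height'])
print(by_age)

# A field can be extracted by name, like a column.
print(by_age['name'])
```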
%% Cell type:markdown id: tags:
#### NumPy and SciPy sub-packages:
We have already used `numpy.random` to generate `numpy` arrays filled with random values. This submodule also provides functions related to distributions (Poisson, Gaussian, etc.) and permutations.
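%% Cell type:markdown id: tags:
A short sketch of these features is given below. It uses the `Generator` interface obtained with `np.random.default_rng` (available in NumPy 1.17 and later) rather than the `np.random.random` function used earlier in this notebook; the seed and the distribution parameters are arbitrary values chosen for the example.
%% Cell type:code id: tags:
```python
import numpy as np

# A seeded generator gives reproducible results.
rng = np.random.default_rng(seed=42)

# Uniform values in [0, 1), as with np.random.random
uniform = rng.random((4, 5))

# Samples from a Gaussian and from a Poisson distribution
gaussian = rng.normal(loc=0.0, scale=1.0, size=1000)
counts = rng.poisson(lam=3.0, size=1000)
print(gaussian.mean(), counts.mean())

# A random permutation of the integers 0..9
shuffled = rng.permutation(10)
print(shuffled)
```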
%% Cell type:markdown id: tags:
To perform linear algebra with dense matrices, we can use the submodule `numpy.linalg`. For instance, to compute the determinant of a random matrix, we use the function `det`:
%% Cell type:code id: tags:
``` python
A = np.random.random([5,5])
print(A)
np.linalg.det(A)
```
%% Output
[[0.47138506 0.41353868 0.09441948 0.225147 0.82335198]
[0.04490952 0.14682972 0.31792846 0.22918746 0.73823443]
[0.50485749 0.99705961 0.51896582 0.93318595 0.11375617]
[0.37148317 0.0477689 0.29061475 0.41826056 0.47950005]
[0.70324502 0.82838271 0.92172528 0.79532669 0.56698101]]
0.06968780805887545
%% Cell type:code id: tags:
``` python
# Extract a square 2x2 sub-matrix of A and invert it
squared_subA = A[1:3, 1:3]
print(squared_subA)
np.linalg.inv(squared_subA)
```
%% Output
[[0.14682972 0.31792846]
[0.99705961 0.51896582]]
array([[-2.15522717, 1.32033369],
[ 4.14071576, -0.6097731 ]])
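%% Cell type:markdown id: tags:
An inverse can be checked by multiplying it with the original matrix. When the goal is to solve a linear system, `np.linalg.solve` can be used directly, without forming the inverse. The matrix `M` and the right-hand side `rhs` below are illustrative values, not taken from the cells above.
%% Cell type:code id: tags:
```python
import numpy as np

M = np.array([[2.0, 1.0],
              [1.0, 3.0]])
rhs = np.array([3.0, 5.0])

# The product of a matrix and its inverse should be (close to) the identity.
M_inv = np.linalg.inv(M)
print(np.allclose(M_inv @ M, np.eye(2)))

# Solve M @ x = rhs without computing the inverse explicitly.
x = np.linalg.solve(M, rhs)
print(x)
print(np.allclose(M @ x, rhs))
```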
%% Cell type:markdown id: tags:
## Introduction to Pandas: Python Data Analysis Library
Pandas is an open source library providing high-performance, easy-to-use data structures and data analysis tools for Python.
[Pandas tutorial](https://pandas.pydata.org/pandas-docs/stable/10min.html)
[Grenoble Python Working Session](https://github.com/iutzeler/Pres_Pandas/)
[Pandas for SQL Users](https://hackernoon.com/pandas-cheatsheet-for-sql-people-part-1-2976894acd0)