Commit 2df0e28b authored by Eric Maldonado
Add NumPy kickstart

parent 01e54e8d
%% Cell type:markdown id: tags:
# A short introduction to NumPy
Strongly inspired by the UGA Python Introduction Course
https://gricad-gitlab.univ-grenoble-alpes.fr/python-uga/py-training-2017
%% Cell type:markdown id: tags:
## A short introduction to NumPy
Code using `numpy` usually starts with the import statement
%% Cell type:code id: tags:
``` python
import numpy as np
```
%% Cell type:markdown id: tags:
NumPy provides the type `np.ndarray`. Such arrays are multidimensional sequences of homogeneous elements, meaning that every element has the same data type. They can be created, for example, with the following commands:
%% Cell type:code id: tags:
``` python
# from a list
l = [10.0, 12.5, 15.0, 17.5, 20.0]
np.array(l)
```
%% Output
array([10. , 12.5, 15. , 17.5, 20. ])
%% Cell type:code id: tags:
``` python
# fast but the values can be anything
np.empty(4)
```
%% Output
array([1.27880790e-316, 0.00000000e+000, 6.91986808e-310, 1.57378525e-316])
%% Cell type:code id: tags:
``` python
# slower than np.empty but the values are all 0.
np.zeros([2, 6])
```
%% Output
array([[0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0.]])
%% Cell type:code id: tags:
``` python
# multidimensional array
a = np.ones([2, 3, 4])
print(a.shape, a.size, a.dtype)
a
```
%% Output
(2, 3, 4) 24 float64
array([[[1., 1., 1., 1.],
[1., 1., 1., 1.],
[1., 1., 1., 1.]],
[[1., 1., 1., 1.],
[1., 1., 1., 1.],
[1., 1., 1., 1.]]])
%% Cell type:code id: tags:
``` python
# like range, but produces a 1D NumPy array
np.arange(4)
```
%% Output
array([0, 1, 2, 3])
%% Cell type:code id: tags:
``` python
# np.arange can produce arrays of floats
np.arange(4.)
```
%% Output
array([0., 1., 2., 3.])
%% Cell type:code id: tags:
``` python
# another convenient function to generate 1D arrays
np.linspace(10, 20, 5)
```
%% Output
array([10. , 12.5, 15. , 17.5, 20. ])
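%% Cell type:markdown id: tags:
To illustrate what "homogeneous elements" means in practice, here is a minimal sketch. The variable names are illustrative and do not come from the original notebook. When the input mixes integers and floats, NumPy stores every element with one common `dtype`.
%% Cell type:code id: tags:
```python
import numpy as np

# Mixing ints and floats: NumPy picks one common dtype for all elements
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)

# The dtype can also be requested explicitly at creation time
ints = np.array([1, 2, 3], dtype=np.int64)
print(ints.dtype)

# astype returns a new array converted to another dtype
as_float = ints.astype(np.float64)
print(as_float)
```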
%% Cell type:markdown id: tags:
A NumPy array can be easily converted to a Python list.
%% Cell type:code id: tags:
``` python
a = np.linspace(10, 20, 5)
list(a)
```
%% Output
[10.0, 12.5, 15.0, 17.5, 20.0]
%% Cell type:code id: tags:
``` python
# Or even better
a.tolist()
```
%% Output
[10.0, 12.5, 15.0, 17.5, 20.0]
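%% Cell type:markdown id: tags:
One likely reason to prefer `tolist()` is sketched below, using illustrative variable names: `list(a)` keeps NumPy scalar objects as elements, whereas `a.tolist()` converts them to built-in Python numbers and also handles nested dimensions.
%% Cell type:code id: tags:
```python
import numpy as np

a = np.linspace(10, 20, 5)

# list(a) iterates over the array: the elements are still NumPy scalars
from_list = list(a)
print(type(from_list[0]))

# a.tolist() converts the elements to built-in Python floats
from_tolist = a.tolist()
print(type(from_tolist[0]))

# For a 2D array, tolist() returns nested Python lists
b = np.arange(6).reshape(2, 3)
nested = b.tolist()
print(nested)
```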
%% Cell type:markdown id: tags:
# Manipulating NumPy arrays
%% Cell type:markdown id: tags:
## Access elements
Elements in a `numpy` array can be accessed using indexing and slicing in any dimension. NumPy offers the same indexing functionality as Fortran or Matlab.
### Indexes and slices
For example, we can create an array `A` and perform any kind of selection operation on it.
%% Cell type:code id: tags:
``` python
A = np.random.random([4, 5])
A
```
%% Output
array([[0.89925962, 0.31519992, 0.17170063, 0.06102236, 0.6055506 ],
[0.43365108, 0.67461267, 0.34962124, 0.75648088, 0.53096922],
[0.65643503, 0.4723704 , 0.77202087, 0.50192904, 0.14067726],
[0.80709755, 0.2314217 , 0.65465368, 0.28459125, 0.54727527]])
%% Cell type:code id: tags:
``` python
# Get the element in the second row, first column
A[1, 0]
```
%% Output
0.4336510750584107
%% Cell type:code id: tags:
``` python
# Get the first two rows
A[:2]
```
%% Output
array([[0.89925962, 0.31519992, 0.17170063, 0.06102236, 0.6055506 ],
[0.43365108, 0.67461267, 0.34962124, 0.75648088, 0.53096922]])
%% Cell type:code id: tags:
``` python
# Get the last column
A[:, -1]
```
%% Output
array([0.6055506 , 0.53096922, 0.14067726, 0.54727527])
%% Cell type:code id: tags:
``` python
# Get the first two rows and the columns with an even index
A[:2, ::2]
```
%% Output
array([[0.89925962, 0.17170063, 0.6055506 ],
[0.43365108, 0.34962124, 0.53096922]])
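%% Cell type:markdown id: tags:
A point worth keeping in mind when slicing: a basic slice is a view on the original data, not a copy. The sketch below uses a small illustrative array `B` (not part of the original notebook) to show the difference.
%% Cell type:code id: tags:
```python
import numpy as np

B = np.arange(12.0).reshape(3, 4)

# A basic slice is a view: it shares memory with B
first_rows = B[:2]
first_rows[0, 0] = -1.0
print(B[0, 0])  # B was modified through the view

# An explicit copy is independent of B
independent = B[:2].copy()
independent[0, 1] = 99.0
print(B[0, 1])  # B is unchanged
```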
%% Cell type:markdown id: tags:
### Using a mask to select elements satisfying a condition:
%% Cell type:code id: tags:
``` python
cond = A > 0.5
print(cond)
print(A[cond])
```
%% Output
[[ True False False False True]
[False True False True True]
[ True False True True False]
[ True False True False True]]
[0.89925962 0.6055506 0.67461267 0.75648088 0.53096922 0.65643503
0.77202087 0.50192904 0.80709755 0.65465368 0.54727527]
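%% Cell type:markdown id: tags:
Masks can also be combined. The following sketch, on a small illustrative array `x`, uses the element-wise operators `&`, `|` and `~` in place of Python's `and`, `or` and `not`, which do not work element-wise on arrays.
%% Cell type:code id: tags:
```python
import numpy as np

x = np.array([0.1, 0.4, 0.6, 0.9])

# Combine conditions with & (and), | (or), ~ (not).
# The parentheses are required because & binds tighter than comparisons.
inside = (x > 0.3) & (x < 0.8)
print(x[inside])

# Negate a mask with ~
outside = ~inside
print(x[outside])

# Count how many elements satisfy the condition
n_inside = np.count_nonzero(inside)
print(n_inside)
```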
%% Cell type:markdown id: tags:
The mask is in fact a particular case of the advanced indexing capabilities provided by NumPy. For example, it is even possible to use lists for indexing:
%% Cell type:code id: tags:
``` python
# Selecting only particular columns
print(A)
A[:, [0, 1, 4]]
```
%% Output
[[0.89925962 0.31519992 0.17170063 0.06102236 0.6055506 ]
[0.43365108 0.67461267 0.34962124 0.75648088 0.53096922]
[0.65643503 0.4723704 0.77202087 0.50192904 0.14067726]
[0.80709755 0.2314217 0.65465368 0.28459125 0.54727527]]
array([[0.89925962, 0.31519992, 0.6055506 ],
[0.43365108, 0.67461267, 0.53096922],
[0.65643503, 0.4723704 , 0.14067726],
[0.80709755, 0.2314217 , 0.54727527]])
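%% Cell type:markdown id: tags:
When lists are used on several axes at once, the indices are paired element by element, which can be surprising. The sketch below, on an illustrative array `B`, contrasts this with `np.ix_`, which selects the full block of the given rows and columns.
%% Cell type:code id: tags:
```python
import numpy as np

B = np.arange(20).reshape(4, 5)

# Two index lists are paired element by element: picks B[0, 1] and B[2, 4]
pairs = B[[0, 2], [1, 4]]
print(pairs)

# np.ix_ builds the cross product instead: rows 0 and 2, columns 1 and 4
block = B[np.ix_([0, 2], [1, 4])]
print(block)
```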
%% Cell type:markdown id: tags:
## Perform array manipulations
### Apply arithmetic operations to whole arrays (element-wise):
%% Cell type:code id: tags:
``` python
(A+5)**2
```
%% Output
array([[34.80126403, 28.25135024, 26.7464874 , 25.61394735, 31.42219749],
[29.52456401, 32.20122896, 28.61844741, 33.13707212, 30.59162046],
[31.99525724, 29.94683782, 33.31622493, 30.27122313, 26.42656267],
[33.72238198, 27.36777304, 31.97510827, 27.92690466, 30.77226288]])
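%% Cell type:markdown id: tags:
As a small complement, the sketch below uses an illustrative array `B` to show three common cases: a scalar combined with an array, two arrays of the same shape, and a 1D array applied to each row of a 2D array (broadcasting).
%% Cell type:code id: tags:
```python
import numpy as np

B = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])

# Scalar and array: the scalar is applied to every element
print((B + 5) ** 2)

# Two arrays of the same shape: matching elements are paired up
squared = B * B
print(squared)

# Broadcasting: a 1D array of length 3 is applied to each row of B
offsets = np.array([10.0, 20.0, 30.0])
shifted = B + offsets
print(shifted)
```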
%% Cell type:markdown id: tags:
### Apply functions element-wise:
%% Cell type:code id: tags:
``` python
np.exp(A)  # With NumPy arrays, use the functions from NumPy!
```
%% Output
array([[2.45778274, 1.37053329, 1.18732233, 1.06292268, 1.83226077],
[1.54288042, 1.9632724 , 1.41853016, 2.13076459, 1.70057974],
[1.92790714, 1.60379132, 2.16413527, 1.65190478, 1.15105309],
[2.24139301, 1.26039064, 1.92447592, 1.3292186 , 1.72853679]])
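%% Cell type:markdown id: tags:
To see why the NumPy functions are recommended, here is a minimal sketch comparing `np.exp` with `math.exp` from the standard library on an illustrative array `x`. The standard-library version expects a single number and raises a `TypeError` when given an array with several elements.
%% Cell type:code id: tags:
```python
import math
import numpy as np

x = np.array([0.0, 1.0, 2.0])

# np.exp works element-wise on the whole array
y = np.exp(x)
print(y)

# math.exp only accepts a single number, so it fails on this array
try:
    math.exp(x)
    math_failed = False
except TypeError as err:
    math_failed = True
    print("math.exp failed:", err)
```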
%% Cell type:markdown id: tags:
### Setting parts of arrays
%% Cell type:code id: tags:
``` python
A[:, 0] = 0.
print(A)
```
%% Output
[[0. 0.31519992 0.17170063 0.06102236 0.6055506 ]
[0. 0.67461267 0.34962124 0.75648088 0.53096922]
[0. 0.4723704 0.77202087 0.50192904 0.14067726]
[0. 0.2314217 0.65465368 0.28459125 0.54727527]]
%% Cell type:code id: tags:
``` python
# BONUS: Safe element-wise inverse with masks
cond = (A != 0)
A[cond] = 1./A[cond]
print(A)
```
%% Output
[[ 0. 3.17258959 5.82409047 16.387435 1.65138967]
[ 0. 1.48233207 2.86023812 1.32191048 1.88334836]
[ 0. 2.11698277 1.29530177 1.99231351 7.10846954]
[ 0. 4.32111589 1.5275252 3.51381149 1.82723405]]
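%% Cell type:markdown id: tags:
An alternative to the mask assignment above is sketched here with `np.divide` and its `where` and `out` arguments. Unlike the in-place version, it leaves the input array untouched. The array `B` is illustrative.
%% Cell type:code id: tags:
```python
import numpy as np

B = np.array([[0.0, 2.0],
              [4.0, 0.0]])

# Divide only where B is nonzero; the other entries keep the value
# already stored in `out` (here zeros)
inverse = np.divide(1.0, B, out=np.zeros_like(B), where=(B != 0))
print(inverse)
```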
%% Cell type:markdown id: tags:
### Attributes and methods of `np.ndarray` (see the [doc](https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.html#numpy.ndarray))
%% Cell type:code id: tags:
``` python
print([s for s in dir(A) if not s.startswith('__')])
```
%% Output
['T', 'all', 'any', 'argmax', 'argmin', 'argpartition', 'argsort', 'astype', 'base', 'byteswap', 'choose', 'clip', 'compress', 'conj', 'conjugate', 'copy', 'ctypes', 'cumprod', 'cumsum', 'data', 'diagonal', 'dot', 'dtype', 'dump', 'dumps', 'fill', 'flags', 'flat', 'flatten', 'getfield', 'imag', 'item', 'itemset', 'itemsize', 'max', 'mean', 'min', 'nbytes', 'ndim', 'newbyteorder', 'nonzero', 'partition', 'prod', 'ptp', 'put', 'ravel', 'real', 'repeat', 'reshape', 'resize', 'round', 'searchsorted', 'setfield', 'setflags', 'shape', 'size', 'sort', 'squeeze', 'std', 'strides', 'sum', 'swapaxes', 'take', 'tobytes', 'tofile', 'tolist', 'tostring', 'trace', 'transpose', 'var', 'view']
%% Cell type:code id: tags:
``` python
# Ex1: Get the mean along different axes
print(A)
print('Mean value', A.mean())
# axis=0 averages over the rows: one mean per column
print('Mean line', A.mean(axis=0))
# axis=1 averages over the columns: one mean per row
print('Mean column', A.mean(axis=1))
```
%% Output
[[ 0. 3.17258959 5.82409047 16.387435 1.65138967]
[ 0. 1.48233207 2.86023812 1.32191048 1.88334836]
[ 0. 2.11698277 1.29530177 1.99231351 7.10846954]
[ 0. 4.32111589 1.5275252 3.51381149 1.82723405]]
Mean value 2.9143043986324475
Mean line [0. 2.77325508 2.87678889 5.80386762 3.1176104 ]
Mean column [5.40710095 1.50956581 2.50261352 2.23793733]
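%% Cell type:markdown id: tags:
The meaning of `axis` is easier to check on a small array with known values. The sketch below uses an illustrative array `B`. The `keepdims=True` option keeps the reduced axis with length 1, which is convenient for subtracting the result from the original array.
%% Cell type:code id: tags:
```python
import numpy as np

B = np.arange(6.0).reshape(2, 3)  # [[0, 1, 2], [3, 4, 5]]

# axis=0 collapses the rows: one mean per column
col_means = B.mean(axis=0)
print(col_means)

# axis=1 collapses the columns: one mean per row
row_means = B.mean(axis=1)
print(row_means)

# keepdims=True gives shape (2, 1), so each row can be centered directly
centered = B - B.mean(axis=1, keepdims=True)
print(centered)
```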
%% Cell type:code id: tags:
``` python
# Ex2: Flatten a 2D array to 1D, keeping all elements
print(A, A.shape)
A_flat = A.flatten()
print(A_flat, A_flat.shape)
```
%% Output
[[ 0. 3.17258959 5.82409047 16.387435 1.65138967]
[ 0. 1.48233207 2.86023812 1.32191048 1.88334836]
[ 0. 2.11698277 1.29530177 1.99231351 7.10846954]
[ 0. 4.32111589 1.5275252 3.51381149 1.82723405]] (4, 5)
[ 0. 3.17258959 5.82409047 16.387435 1.65138967 0.
1.48233207 2.86023812 1.32191048 1.88334836 0. 2.11698277
1.29530177 1.99231351 7.10846954 0. 4.32111589 1.5275252
3.51381149 1.82723405] (20,)
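%% Cell type:markdown id: tags:
`flatten` is not the only way to obtain a 1D array. The sketch below, on an illustrative array `B`, compares it with `ravel` and `reshape(-1)`: `flatten` always returns a copy, while `ravel` returns a view when the memory layout allows it, as is the case here.
%% Cell type:code id: tags:
```python
import numpy as np

B = np.arange(6).reshape(2, 3)

# flatten always returns a copy: changing it leaves B untouched
flat_copy = B.flatten()
flat_copy[0] = 100
print(B[0, 0])

# ravel returns a view for a contiguous array: changing it changes B
flat_view = B.ravel()
flat_view[1] = 200
print(B[0, 1])

# reshape(-1) lets NumPy infer the length of the single dimension
reshaped = B.reshape(-1)
print(reshaped.shape)
```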
%% Cell type:markdown id: tags:
### Remark: dot product
%% Cell type:code id: tags:
``` python
b = np.linspace(0, 10, 11)
c = b @ b
# before Python 3.5:
# c = b.dot(b)
print(b)
print(c)
```
%% Output
[ 0. 1. 2. 3. 4. 5. 6. 7. 8. 9. 10.]
385.0
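%% Cell type:markdown id: tags:
The sketch below checks that the different spellings of the dot product agree, and shows that `@` also handles a matrix-vector product. The names `M` and `v` are illustrative.
%% Cell type:code id: tags:
```python
import numpy as np

b = np.linspace(0, 10, 11)

# Four equivalent ways to compute the dot product of b with itself
c_operator = b @ b
c_method = b.dot(b)
c_function = np.dot(b, b)
c_manual = np.sum(b * b)
print(c_operator, c_method, c_function, c_manual)

# The same operator computes a matrix-vector product
M = np.array([[1.0, 2.0],
              [3.0, 4.0]])
v = np.array([1.0, 1.0])
Mv = M @ v
print(Mv)
```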
%% Cell type:markdown id: tags:
#### For Matlab users
| ` ` | Matlab | Numpy |
| ------------- | ------ | ----- |
| element wise | `.*` | `*` |
| dot product | `*` | `@` |
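%% Cell type:markdown id: tags:
The difference between the two NumPy operators in the table can be checked on a small matrix, as in this sketch (the matrix `M` is illustrative).
%% Cell type:code id: tags:
```python
import numpy as np

M = np.array([[1, 2],
              [3, 4]])

# * multiplies matching elements (Matlab's .*)
elementwise = M * M
print(elementwise)

# @ is the matrix product (Matlab's *)
matrix_product = M @ M
print(matrix_product)
```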
%% Cell type:markdown id: tags:
`numpy` arrays can also be sorted, even when they hold compound (record-like) data, provided that the type of each column is stated explicitly with a `dtype`.
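%% Cell type:markdown id: tags:
As an illustration of sorting, here is a minimal sketch. It assumes that the compound data is stored as a structured array. The field names and values are made up for the example.
%% Cell type:code id: tags:
```python
import numpy as np

# Plain arrays: np.sort returns a sorted copy, argsort the sorting indices
values = np.array([3, 1, 2])
sorted_values = np.sort(values)
order = np.argsort(values)
print(sorted_values, order)

# Structured array: each field (column) has an explicit name and type
people = np.array(
    [("Alice", 31, 1.65), ("Bob", 25, 1.80), ("Chloe", 28, 1.72)],
    dtype=[("name", "U10"), ("age", "i4"), ("height", "f8")],
)

# Sort the records by the 'age' field
by_age = np.sort(people, order="age")
print(by_age["name"])
```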
%% Cell type:markdown id: tags:
#### NumPy and SciPy sub-packages:
We already saw `numpy.random`, which generates `numpy` arrays filled with random values. This submodule also provides functions related to distributions (Poisson, Gaussian, etc.) and permutations.
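%% Cell type:markdown id: tags:
A short sketch of the distribution and permutation functions follows. It assumes the `np.random.default_rng` generator interface of recent NumPy versions, which is not the interface used earlier in this notebook. The seed and parameters are arbitrary.
%% Cell type:code id: tags:
```python
import numpy as np

# A seeded generator makes the random draws reproducible
rng = np.random.default_rng(seed=42)

# Samples from a Gaussian (normal) distribution
gaussian = rng.normal(loc=0.0, scale=1.0, size=1000)
print(gaussian.mean(), gaussian.std())

# Samples from a Poisson distribution (non-negative integers)
counts = rng.poisson(lam=3.0, size=1000)
print(counts[:10])

# A random permutation of the integers 0..4
shuffled = rng.permutation(5)
print(shuffled)
```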
%% Cell type:markdown id: tags:
To perform linear algebra with dense matrices, we can use the submodule `numpy.linalg`. For instance, to compute the determinant of a random matrix, we use the function `det`:
%% Cell type:code id: tags:
``` python
A = np.random.random([5,5])
print(A)
np.linalg.det(A)
```
%% Output
[[0.47138506 0.41353868 0.09441948 0.225147 0.82335198]
[0.04490952 0.14682972 0.31792846 0.22918746 0.73823443]
[0.50485749 0.99705961 0.51896582 0.93318595 0.11375617]
[0.37148317 0.0477689 0.29061475 0.41826056 0.47950005]
[0.70324502 0.82838271 0.92172528 0.79532669 0.56698101]]
0.06968780805887545
%% Cell type:code id: tags:
``` python
squared_subA = A[1:3, 1:3]
print(squared_subA)
np.linalg.inv(squared_subA)
```
%% Output
[[0.14682972 0.31792846]
[0.99705961 0.51896582]]
array([[-2.15522717, 1.32033369],
[ 4.14071576, -0.6097731 ]])
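%% Cell type:markdown id: tags:
A computed inverse can be checked by multiplying it with the original matrix. The sketch below reuses the values printed above and also shows `np.linalg.solve`, which solves a linear system without forming the inverse explicitly. The right-hand side `rhs` is an arbitrary illustrative vector.
%% Cell type:code id: tags:
```python
import numpy as np

M = np.array([[0.14682972, 0.31792846],
              [0.99705961, 0.51896582]])

# M @ inv(M) should be the identity, up to rounding errors
M_inv = np.linalg.inv(M)
is_identity = np.allclose(M @ M_inv, np.eye(2))
print(is_identity)

# Solve M x = rhs directly instead of computing the inverse
rhs = np.array([1.0, 2.0])
x = np.linalg.solve(M, rhs)
print(np.allclose(M @ x, rhs))
```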
%% Cell type:markdown id: tags:
## Introduction to Pandas: Python Data Analysis Library
Pandas is an open source library providing high-performance, easy-to-use data structures and data analysis tools for Python.
[Pandas tutorial](https://pandas.pydata.org/pandas-docs/stable/10min.html)
[Grenoble Python Working Session](https://github.com/iutzeler/Pres_Pandas/)
[Pandas for SQL Users](https://hackernoon.com/pandas-cheatsheet-for-sql-people-part-1-2976894acd0)