Commit 69f50f86 authored by Eric Maldonado

pandas links correction

parent 71bae5ca
%% Cell type:markdown id: tags:
<img width="800px" src="../fidle/img/00-Fidle-header-01.svg"></img>
# <!-- TITLE --> [NP1] - A short introduction to Numpy
<!-- DESC --> Numpy is an essential tool for scientific Python.
<!-- AUTHOR : Jean-Luc Parouty (CNRS/SIMaP) -->
## Objectives :
- Understand the main principles of Numpy and its potential
Note : This notebook is strongly inspired by the UGA Python Introduction Course
See : **https://gricad-gitlab.univ-grenoble-alpes.fr/python-uga/py-training-2017**
%% Cell type:markdown id: tags:
## Step 1 - Numpy the beginning
Code using `numpy` usually starts with the import statement
%% Cell type:code id: tags:
``` python
import numpy as np
```
%% Cell type:markdown id: tags:
NumPy provides the type `np.ndarray`. Such arrays are multidimensional sequences of homogeneous elements. They can be created, for example, with the following commands:
%% Cell type:code id: tags:
``` python
# from a list
l = [10.0, 12.5, 15.0, 17.5, 20.0]
np.array(l)
```
%% Output
array([10. , 12.5, 15. , 17.5, 20. ])
%% Cell type:code id: tags:
``` python
# fast but the values can be anything
np.empty(4)
```
%% Output
array([ 6.93990061e-310, 6.95333088e-310, -1.90019324e+120,
6.93987701e-310])
%% Cell type:code id: tags:
``` python
# slower than np.empty but the values are all 0.
np.zeros([2, 6])
```
%% Output
array([[0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0.]])
%% Cell type:code id: tags:
``` python
# multidimensional array
a = np.ones([2, 3, 4])
print(a.shape, a.size, a.dtype)
a
```
%% Output
(2, 3, 4) 24 float64
array([[[1., 1., 1., 1.],
[1., 1., 1., 1.],
[1., 1., 1., 1.]],
[[1., 1., 1., 1.],
[1., 1., 1., 1.],
[1., 1., 1., 1.]]])
%% Cell type:code id: tags:
``` python
# like range, but produces a 1D numpy array
np.arange(4)
```
%% Output
array([0, 1, 2, 3])
%% Cell type:code id: tags:
``` python
# np.arange can produce arrays of floats
np.arange(4.)
```
%% Output
array([0., 1., 2., 3.])
%% Cell type:code id: tags:
``` python
# another convenient function to generate 1D arrays
np.linspace(10, 20, 5)
```
%% Output
array([10. , 12.5, 15. , 17.5, 20. ])
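%% Cell type:markdown id: tags:
The arrays created above are homogeneous: every element shares a single `dtype`. The following is a minimal sketch of what that implies when values of different kinds are mixed; the variable names (`mixed`, `ints`, `truncated`) are illustrative only.
%% Cell type:code id: tags:
```python
import numpy as np

# Mixing ints and floats: every element is stored as float64
mixed = np.array([1, 2, 3.5])
print(mixed, mixed.dtype)

# The dtype can also be requested explicitly at creation time
ints = np.array([1, 2, 3], dtype=np.int32)
print(ints, ints.dtype)

# astype returns a converted copy; float-to-int conversion truncates toward zero
truncated = mixed.astype(np.int64)
print(truncated, truncated.dtype)
```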
%% Cell type:markdown id: tags:
A NumPy array can be easily converted to a Python list.
%% Cell type:code id: tags:
``` python
a = np.linspace(10, 20, 5)
list(a)
```
%% Output
[10.0, 12.5, 15.0, 17.5, 20.0]
%% Cell type:code id: tags:
``` python
# Or even better
a.tolist()
```
%% Output
[10.0, 12.5, 15.0, 17.5, 20.0]
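%% Cell type:markdown id: tags:
The two conversions above print the same values but do not produce the same objects. A short sketch of the difference, which is probably why `tolist()` is called "even better"; the 2D array `m` is added here purely for illustration.
%% Cell type:code id: tags:
```python
import numpy as np

a = np.linspace(10, 20, 5)

# list(a) iterates over the array: the elements stay NumPy scalars
as_list = list(a)
print(type(as_list[0]))

# a.tolist() converts each element to a plain Python float
as_pylist = a.tolist()
print(type(as_pylist[0]))

# For a 2D array, list() only unpacks the first axis
m = np.arange(6).reshape(2, 3)
print(list(m))      # a list of two 1D arrays
print(m.tolist())   # nested Python lists
```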
%% Cell type:markdown id: tags:
## Step 2 - Access elements
Elements in a `numpy` array can be accessed using indexing and slicing in any dimension. NumPy offers the same indexing functionalities as those available in Fortran or Matlab.
### 2.1 - Indexes and slices
For example, we can create an array `A` and perform any kind of selection operations on it.
%% Cell type:code id: tags:
``` python
A = np.random.random([4, 5])
A
```
%% Output
array([[0.14726334, 0.90799321, 0.67130094, 0.23978162, 0.96444415],
[0.26039418, 0.06135763, 0.35856793, 0.73366941, 0.50698925],
[0.39557097, 0.55950866, 0.70056205, 0.65344863, 0.90891062],
[0.19049184, 0.56355734, 0.71701494, 0.66035889, 0.06400119]])
%% Cell type:code id: tags:
``` python
# Get the element from the second row, first column
A[1, 0]
```
%% Output
0.26039417830707656
%% Cell type:code id: tags:
``` python
# Get the first two rows
A[:2]
```
%% Output
array([[0.14726334, 0.90799321, 0.67130094, 0.23978162, 0.96444415],
[0.26039418, 0.06135763, 0.35856793, 0.73366941, 0.50698925]])
%% Cell type:code id: tags:
``` python
# Get the last column
A[:, -1]
```
%% Output
array([0.96444415, 0.50698925, 0.90891062, 0.06400119])
%% Cell type:code id: tags:
``` python
# Get the first two rows and the columns with an even index
A[:2, ::2]
```
%% Output
array([[0.14726334, 0.67130094, 0.96444415],
[0.26039418, 0.35856793, 0.50698925]])
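%% Cell type:markdown id: tags:
One point worth keeping in mind with the selections above: an index or slice does not copy the data, it returns a view on the original array. The sketch below uses a small deterministic array `B` (an illustrative stand-in for the random `A`) so that the effect is easy to follow.
%% Cell type:code id: tags:
```python
import numpy as np

B = np.arange(20).reshape(4, 5)   # deterministic 4x5 array for illustration

row_view = B[1]          # basic indexing gives a view on B's data
row_view[0] = -1         # so this also changes B[1, 0]
print(B[1, 0])

row_copy = B[2].copy()   # an explicit copy is independent of B
row_copy[0] = -1
print(B[2, 0])           # unchanged

# A strided slice is still a view on the same memory
print(np.shares_memory(B, B[:2, ::2]))
```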
%% Cell type:markdown id: tags:
### 2.2 - Using a mask to select elements validating a condition:
%% Cell type:code id: tags:
``` python
cond = A > 0.5
print(cond)
print(A[cond])
```
%% Output
[[False True True False True]
[False False False True True]
[False True True True True]
[False True True True False]]
[0.90799321 0.67130094 0.96444415 0.73366941 0.50698925 0.55950866
0.70056205 0.65344863 0.90891062 0.56355734 0.71701494 0.66035889]
%% Cell type:markdown id: tags:
The mask is in fact a particular case of the advanced indexing capabilities provided by NumPy. For example, it is even possible to use lists for indexing:
%% Cell type:code id: tags:
``` python
# Selecting only particular columns
print(A)
A[:, [0, 1, 4]]
```
%% Output
[[0.14726334 0.90799321 0.67130094 0.23978162 0.96444415]
[0.26039418 0.06135763 0.35856793 0.73366941 0.50698925]
[0.39557097 0.55950866 0.70056205 0.65344863 0.90891062]
[0.19049184 0.56355734 0.71701494 0.66035889 0.06400119]]
array([[0.14726334, 0.90799321, 0.96444415],
[0.26039418, 0.06135763, 0.50698925],
[0.39557097, 0.55950866, 0.90891062],
[0.19049184, 0.56355734, 0.06400119]])
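%% Cell type:markdown id: tags:
Masks can be combined, and list-based selections behave differently from slices with respect to memory. A minimal sketch, assuming a seeded generator (`rng`, seed chosen arbitrarily) so that the run is reproducible:
%% Cell type:code id: tags:
```python
import numpy as np

rng = np.random.default_rng(0)    # seeded generator for reproducibility
R = rng.random((4, 5))

# Combine conditions with & (and), | (or), ~ (not); the parentheses are required
between = (R > 0.25) & (R < 0.75)
print(R[between])

# Count the selected elements: True counts as 1
print(between.sum())

# Indexing with a list returns a copy, unlike a basic slice
cols = R[:, [0, 1, 4]]
cols[:] = 0.0
print(np.shares_memory(R, cols))
```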
%% Cell type:markdown id: tags:
## Step 3 - Perform array manipulations
### 3.1 - Apply arithmetic operations to whole arrays (element-wise):
%% Cell type:code id: tags:
``` python
(A+5)**2
```
%% Output
array([[26.49431985, 34.90438372, 32.16365436, 27.45531142, 35.57459401],
[27.67174691, 25.61734103, 28.71425022, 32.87496493, 30.32693058],
[29.11218606, 30.9081365 , 32.49640767, 31.96148136, 34.91522467],
[26.94120557, 30.95317031, 32.68425986, 32.03966276, 25.6441081 ]])
%% Cell type:markdown id: tags:
### 3.2 - Apply functions element-wise:
%% Cell type:code id: tags:
``` python
np.exp(A) # With numpy arrays, use the functions from numpy !
```
%% Output
array([[1.15865904, 2.47934201, 1.95678132, 1.27097157, 2.62332907],
[1.29744141, 1.0632791 , 1.43127825, 2.08270892, 1.66028496],
[1.48523197, 1.74981253, 2.01488485, 1.92215822, 2.48161763],
[1.2098445 , 1.75691133, 2.04830976, 1.93548684, 1.06609367]])
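%% Cell type:markdown id: tags:
The advice to use the functions from numpy can be illustrated by comparing `np.exp` with `math.exp` from the standard library. This is only a sketch; the array `x` is an arbitrary example.
%% Cell type:code id: tags:
```python
import math
import numpy as np

x = np.linspace(0.0, 1.0, 5)

# np.exp is applied to every element of the array at once
y = np.exp(x)
print(y)

# math.exp only accepts a single number and refuses a multi-element array
try:
    math.exp(x)
except TypeError as err:
    print("math.exp failed:", err)

# Equivalent pure-Python loop, much slower on large arrays
y_loop = np.array([math.exp(v) for v in x])
print(np.allclose(y, y_loop))
```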
%% Cell type:markdown id: tags:
### 3.3 - Setting parts of arrays
%% Cell type:code id: tags:
``` python
A[:, 0] = 0.
print(A)
```
%% Output
[[0. 0.90799321 0.67130094 0.23978162 0.96444415]
[0. 0.06135763 0.35856793 0.73366941 0.50698925]
[0. 0.55950866 0.70056205 0.65344863 0.90891062]
[0. 0.56355734 0.71701494 0.66035889 0.06400119]]
%% Cell type:code id: tags:
``` python
# BONUS: Safe element-wise inverse with masks
cond = (A != 0)
A[cond] = 1./A[cond]
print(A)
```
%% Output
[[ 0. 1.10132983 1.48964487 4.17046144 1.03686668]
[ 0. 16.29789234 2.78887186 1.36301171 1.97242842]
[ 0. 1.78728245 1.42742531 1.53034219 1.1002182 ]
[ 0. 1.77444232 1.39467107 1.51432807 15.62470834]]
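%% Cell type:markdown id: tags:
The masked inverse above modifies `A` in place. An alternative sketch that leaves the input untouched relies on the `where=` and `out=` arguments of `np.divide`; the matrix `M` is a made-up example.
%% Cell type:code id: tags:
```python
import numpy as np

M = np.array([[0.0, 2.0, 4.0],
              [0.5, 0.0, 8.0]])

# where= selects the elements to compute;
# the other positions keep the value already present in out=
inv = np.divide(1.0, M, out=np.zeros_like(M), where=(M != 0))
print(inv)
print(M)   # the input is unchanged
```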
%% Cell type:markdown id: tags:
## Step 4 - Attributes and methods of `np.ndarray` (see the [doc](https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.html#numpy.ndarray))
%% Cell type:code id: tags:
``` python
# Print the public attributes and methods, six per line
for i, v in enumerate([s for s in dir(A) if not s.startswith('__')]):
    print(f'{v:16}', end='')
    if (i+1) % 6 == 0:
        print('')
```
%% Output
T all any argmax argmin argpartition
argsort astype base byteswap choose clip
compress conj conjugate copy ctypes cumprod
cumsum data diagonal dot dtype dump
dumps fill flags flat flatten getfield
imag item itemset itemsize max mean
min nbytes ndim newbyteorder nonzero partition
prod ptp put ravel real repeat
reshape resize round searchsorted setfield setflags
shape size sort squeeze std strides
sum swapaxes take tobytes tofile tolist
tostring trace transpose var view
%% Cell type:code id: tags:
``` python
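# Ex1: Get the mean through different dimensions
# axis=0 averages over the rows (one value per column),
# axis=1 averages over the columns (one value per row)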
print(A)
print('Mean value', A.mean())
print('Mean line', A.mean(axis=0))
print('Mean column', A.mean(axis=1))
```
%% Output
[[ 0. 1.10132983 1.48964487 4.17046144 1.03686668]
[ 0. 16.29789234 2.78887186 1.36301171 1.97242842]
[ 0. 1.78728245 1.42742531 1.53034219 1.1002182 ]
[ 0. 1.77444232 1.39467107 1.51432807 15.62470834]]
Mean value 2.818696254398785
Mean line [0. 5.24023674 1.77515328 2.14453585 4.93355541]
Mean column [1.55966056 4.48444087 1.16905363 4.06162996]
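%% Cell type:markdown id: tags:
The meaning of `axis` is easier to check on a small array whose means can be computed by hand. The sketch below uses an illustrative 2x3 array `M`; the `keepdims` argument is shown as one possible way to subtract per-row means.
%% Cell type:code id: tags:
```python
import numpy as np

M = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])   # shape (2, 3)

# axis=0 collapses the rows: one mean per column, shape (3,)
print(M.mean(axis=0))

# axis=1 collapses the columns: one mean per row, shape (2,)
print(M.mean(axis=1))

# keepdims=True keeps the reduced axis with length 1,
# so the result can be subtracted from M row by row
centered = M - M.mean(axis=1, keepdims=True)
print(centered)
```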
%% Cell type:code id: tags:
``` python
# Ex2: Convert a 2D array to 1D, keeping all elements
print(A)
print(A.shape)
A_flat = A.flatten()
print(A_flat, A_flat.shape)
```
%% Output
[[ 0. 1.10132983 1.48964487 4.17046144 1.03686668]
[ 0. 16.29789234 2.78887186 1.36301171 1.97242842]
[ 0. 1.78728245 1.42742531 1.53034219 1.1002182 ]
[ 0. 1.77444232 1.39467107 1.51432807 15.62470834]]
(4, 5)
[ 0. 1.10132983 1.48964487 4.17046144 1.03686668 0.
16.29789234 2.78887186 1.36301171 1.97242842 0. 1.78728245
1.42742531 1.53034219 1.1002182 0. 1.77444232 1.39467107
1.51432807 15.62470834] (20,)
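%% Cell type:markdown id: tags:
`flatten` is not the only way to obtain a 1D array. A brief sketch comparing it with `ravel` and `reshape(-1)` on an illustrative array `M`; the difference lies in whether the data is copied.
%% Cell type:code id: tags:
```python
import numpy as np

M = np.arange(6).reshape(2, 3)

flat_copy = M.flatten()   # always a new array
flat_view = M.ravel()     # a view when the memory layout allows it
same = M.reshape(-1)      # -1 lets NumPy infer the length
print(same.shape)

print(np.shares_memory(M, flat_copy))
print(np.shares_memory(M, flat_view))

# Writing through the view changes the original array
flat_view[0] = 99
print(M[0, 0])
```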
%% Cell type:markdown id: tags:
### 4.1 - Remark: dot product
%% Cell type:code id: tags:
``` python
b = np.linspace(0, 10, 11)
c = b @ b
# before Python 3.5 (which introduced the @ operator):
# c = b.dot(b)
print(b)
print(c)
```
%% Output
[ 0. 1. 2. 3. 4. 5. 6. 7. 8. 9. 10.]
385.0
%% Cell type:markdown id: tags:
### 4.2 - For Matlab users
| ` ` | Matlab | Numpy |
| ------------- | ------ | ----- |
| element wise | `.*` | `*` |
| dot product | `*` | `@` |
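%% Cell type:markdown id: tags:
The distinction in the table can be checked on two small matrices. The values of `P`, `Q` and `v` below are arbitrary and chosen so that the results are easy to verify by hand.
%% Cell type:code id: tags:
```python
import numpy as np

P = np.array([[1, 2],
              [3, 4]])
Q = np.array([[0, 1],
              [1, 0]])

print(P * Q)    # element-wise product
print(P @ Q)    # matrix product

v = np.array([1, 1])
print(P @ v)    # matrix-vector product
print(v @ v)    # dot product of two 1D arrays (a scalar)
```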
%% Cell type:markdown id: tags:
`numpy` arrays can also be sorted, even when their elements are compound records, provided the type of each field is explicitly stated with a `dtype`.
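%% Cell type:markdown id: tags:
As a sketch of sorting with explicitly typed fields, the example below builds a small structured array. The records, the field names (`name`, `age`, `height`) and the variable names are all made up for illustration.
%% Cell type:code id: tags:
```python
import numpy as np

# Hypothetical records: the field names and values are invented
people = np.array(
    [("Ana", 31, 1.62), ("Bob", 25, 1.80), ("Cy", 25, 1.75)],
    dtype=[("name", "U10"), ("age", "i4"), ("height", "f8")],
)

# Sort by age first, then by height to break ties
by_age = np.sort(people, order=["age", "height"])
print(by_age["name"])

# Plain numeric arrays need no dtype declaration
values = np.array([3, 1, 2])
print(np.sort(values))     # sorted copy
print(values.argsort())    # indices that would sort the array
```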
%% Cell type:markdown id: tags:
### 4.3 - NumPy and SciPy sub-packages:
We already saw `numpy.random`, which generates `numpy` arrays filled with random values. This submodule also provides functions related to distributions (Poisson, Gaussian, etc.) and permutations.
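%% Cell type:markdown id: tags:
A short sketch of the distribution and permutation functions follows. It uses the `np.random.default_rng` generator interface rather than the `np.random.random` call seen earlier; the seed and the distribution parameters are arbitrary choices.
%% Cell type:code id: tags:
```python
import numpy as np

rng = np.random.default_rng(42)   # the seed value is arbitrary

gauss = rng.normal(loc=0.0, scale=1.0, size=1000)   # Gaussian samples
counts = rng.poisson(lam=3.0, size=1000)            # Poisson samples
perm = rng.permutation(10)                          # shuffled 0..9

print(gauss.mean(), gauss.std())
print(counts.mean())
print(perm)
```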
%% Cell type:markdown id: tags:
To perform linear algebra with dense matrices, we can use the submodule `numpy.linalg`. For instance, to compute the determinant of a random matrix, we use the function `det`:
%% Cell type:code id: tags:
``` python
A = np.random.random([5,5])
print(A)
np.linalg.det(A)
```
%% Output
[[0.33277412 0.18065847 0.10352574 0.48095553 0.97748505]
[0.20756676 0.33166777 0.00808192 0.18868636 0.1722338 ]
[0.94092977 0.21755657 0.52045179 0.45008315 0.1751413 ]
[0.27404121 0.53531168 0.41209088 0.22503687 0.50026306]
[0.23077516 0.99886616 0.74286904 0.40849416 0.57970741]]
-0.026288777656342802
%% Cell type:code id: tags:
``` python
squared_subA = A[1:3, 1:3]
print(squared_subA)
np.linalg.inv(squared_subA)
```
%% Output
[[0.33166777 0.00808192]
[0.21755657 0.52045179]]
array([[ 3.0460928 , -0.04730175],
[-1.27331197, 1.94118039]])
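%% Cell type:markdown id: tags:
An inverse can be sanity-checked by multiplying it with the original matrix. The sketch below does so on a fixed, made-up matrix `M`, and also shows `np.linalg.solve`, which solves a linear system without forming the inverse explicitly.
%% Cell type:code id: tags:
```python
import numpy as np

M = np.array([[4.0, 1.0],
              [2.0, 3.0]])
M_inv = np.linalg.inv(M)

# M @ M_inv should be the identity, up to rounding errors
print(np.allclose(M @ M_inv, np.eye(2)))

# Solve M x = rhs directly
rhs = np.array([1.0, 2.0])
x = np.linalg.solve(M, rhs)
print(x)
print(np.allclose(M @ x, rhs))
```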
%% Cell type:markdown id: tags:
### 4.4 - Introduction to Pandas: Python Data Analysis Library
Pandas is an open source library providing high-performance, easy-to-use data structures and data analysis tools for Python.
- [Pandas tutorial](https://pandas.pydata.org/pandas-docs/stable/10min.html)
- [Grenoble Python Working Session](https://github.com/iutzeler/Pres_Pandas/)
- [Pandas for SQL Users](https://hackernoon.com/pandas-cheatsheet-for-sql-people-part-1-2976894acd0)
- [Pandas for SQL Users](http://sergilehkyi.com/translating-sql-to-pandas/)
- [Pandas Introduction Training HPC Python@UGA](https://gricad-gitlab.univ-grenoble-alpes.fr/python-uga/training-hpc/-/blob/master/ipynb/11_pandas.ipynb)
%% Cell type:markdown id: tags:
---
<img width="80px" src="../fidle/img/00-Fidle-logo-01.svg"></img>