pandas links correction

69f50f86 · Eric Maldonado · 71bae5ca · 69f50f86
Commit 69f50f86 authored 5 years ago by Eric Maldonado
--- a/Misc/Numpy.ipynb
+++ b/Misc/Numpy.ipynb
@@ -835,7 +835,8 @@
    "\n",
    "[Pandas tutorial](https://pandas.pydata.org/pandas-docs/stable/10min.html)\n",
    "[Grenoble Python Working Session](https://github.com/iutzeler/Pres_Pandas/)\n",
-    "[Pandas for SQL Users](https://hackernoon.com/pandas-cheatsheet-for-sql-people-part-1-2976894acd0)"
+    "[Pandas for SQL Users](http://sergilehkyi.com/translating-sql-to-pandas/)\n",
+    "[Pandas Introduction Training HPC Python@UGA](https://gricad-gitlab.univ-grenoble-alpes.fr/python-uga/training-hpc/-/blob/master/ipynb/11_pandas.ipynb)"
   ]
  },
  {
@@ -864,7 +865,7 @@
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
-   "version": "3.7.6"
+   "version": "3.6.8"
  }
 },
 "nbformat": 4,

 %% Cell type:markdown id: tags:

 <img width="800px" src="../fidle/img/00-Fidle-header-01.svg"></img>

 # <!-- TITLE --> [NP1] - A short introduction to Numpy
 <!-- DESC --> Numpy is an essential tool for scientific Python.
 <!-- AUTHOR : Jean-Luc Parouty (CNRS/SIMaP) -->

 ## Objectives :
 - Understand the main principles of Numpy and its potential

 Note : This notebook is strongly inspired by the UGA Python Introduction Course
 See : **https://gricad-gitlab.univ-grenoble-alpes.fr/python-uga/py-training-2017**

 %% Cell type:markdown id: tags:

 ## Step 1 - Numpy the beginning

 Code using `numpy` usually starts with the import statement

 %% Cell type:code id: tags:

 ``` python
 import numpy as np
 ```

 %% Cell type:markdown id: tags:

 NumPy provides the type `np.ndarray`. Such arrays are multidimensional sequences of homogeneous elements. They can be created, for example, with the following commands:

 %% Cell type:code id: tags:

 ``` python
 # from a list
 l = [10.0, 12.5, 15.0, 17.5, 20.0]
 np.array(l)
 ```

 %% Output

    array([10. , 12.5, 15. , 17.5, 20. ])

 %% Cell type:code id: tags:

 ``` python
 # fast but the values can be anything
 np.empty(4)
 ```

 %% Output

    array([ 6.93990061e-310,  6.95333088e-310, -1.90019324e+120,
            6.93987701e-310])

 %% Cell type:code id: tags:

 ``` python
 # slower than np.empty but the values are all 0.
 np.zeros([2, 6])
 ```

 %% Output

    array([[0., 0., 0., 0., 0., 0.],
           [0., 0., 0., 0., 0., 0.]])

 %% Cell type:code id: tags:

 ``` python
 # multidimensional array
 a = np.ones([2, 3, 4])
 print(a.shape, a.size, a.dtype)
 a
 ```

 %% Output

    (2, 3, 4) 24 float64

    array([[[1., 1., 1., 1.],
            [1., 1., 1., 1.],
            [1., 1., 1., 1.]],
    
           [[1., 1., 1., 1.],
            [1., 1., 1., 1.],
            [1., 1., 1., 1.]]])

 %% Cell type:code id: tags:

 ``` python
 # like range, but produces a 1D numpy array
 np.arange(4)
 ```

 %% Output

    array([0, 1, 2, 3])

 %% Cell type:code id: tags:

 ``` python
 # np.arange can produce arrays of floats
 np.arange(4.)
 ```

 %% Output

    array([0., 1., 2., 3.])

 %% Cell type:code id: tags:

 ``` python
 # another convenient function to generate 1D arrays
 np.linspace(10, 20, 5)
 ```

 %% Output

    array([10. , 12.5, 15. , 17.5, 20. ])

 %% Cell type:markdown id: tags:

 A NumPy array can be easily converted to a Python list.

 %% Cell type:code id: tags:

 ``` python
 a = np.linspace(10, 20, 5)
 list(a)
 ```

 %% Output

    [10.0, 12.5, 15.0, 17.5, 20.0]

 %% Cell type:code id: tags:

 ``` python
 # Or even better
 a.tolist()
 ```

 %% Output

    [10.0, 12.5, 15.0, 17.5, 20.0]
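
 %% Cell type:markdown id: tags:

 A possible reason to prefer `tolist()` over `list()`, sketched below under the assumption that plain Python objects are wanted: `list(a)` keeps NumPy scalars as elements and only unpacks the first dimension, whereas `a.tolist()` converts every level to built-in types. The array `m` is a hypothetical example, not part of the original notebook.

```python
import numpy as np

a = np.linspace(10, 20, 5)

as_list = list(a)        # elements stay NumPy scalars (np.float64)
as_tolist = a.tolist()   # elements become built-in Python floats

print(type(as_list[0]))
print(type(as_tolist[0]))

# With several dimensions, list() only unpacks the outer one
m = np.ones([2, 3])      # hypothetical 2D example
print(type(list(m)[0]))      # each item is still an ndarray
print(type(m.tolist()[0]))   # each item is a nested Python list
```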

 %% Cell type:markdown id: tags:

 ## Step 2 - Access elements

 Elements in a `numpy` array can be accessed using indexing and slicing in any dimension. NumPy also offers the same functionalities available in Fortran or Matlab.

 ### 2.1 - Indexes and slices
 For example, we can create an array `A` and perform any kind of selection operations on it.

 %% Cell type:code id: tags:

 ``` python
 A = np.random.random([4, 5])
 A
 ```

 %% Output

    array([[0.14726334, 0.90799321, 0.67130094, 0.23978162, 0.96444415],
           [0.26039418, 0.06135763, 0.35856793, 0.73366941, 0.50698925],
           [0.39557097, 0.55950866, 0.70056205, 0.65344863, 0.90891062],
           [0.19049184, 0.56355734, 0.71701494, 0.66035889, 0.06400119]])

 %% Cell type:code id: tags:

 ``` python
 # Get the element from the second row, first column
 A[1, 0]
 ```

 %% Output

    0.26039417830707656

 %% Cell type:code id: tags:

 ``` python
 # Get the first two rows
 A[:2]
 ```

 %% Output

    array([[0.14726334, 0.90799321, 0.67130094, 0.23978162, 0.96444415],
           [0.26039418, 0.06135763, 0.35856793, 0.73366941, 0.50698925]])

 %% Cell type:code id: tags:

 ``` python
 # Get the last column
 A[:, -1]
 ```

 %% Output

    array([0.96444415, 0.50698925, 0.90891062, 0.06400119])

 %% Cell type:code id: tags:

 ``` python
 # Get the first two rows and the columns with an even index
 A[:2, ::2]
 ```

 %% Output

    array([[0.14726334, 0.67130094, 0.96444415],
           [0.26039418, 0.35856793, 0.50698925]])

 %% Cell type:markdown id: tags:

 ### 2.2 - Using a mask to select elements satisfying a condition:

 %% Cell type:code id: tags:

 ``` python
 cond = A > 0.5
 print(cond)
 print(A[cond])
 ```

 %% Output

    [[False  True  True False  True]
     [False False False  True  True]
     [False  True  True  True  True]
     [False  True  True  True False]]
    [0.90799321 0.67130094 0.96444415 0.73366941 0.50698925 0.55950866
     0.70056205 0.65344863 0.90891062 0.56355734 0.71701494 0.66035889]
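
 %% Cell type:markdown id: tags:

 Masks can also be combined. The sketch below uses a small hand-written array `B` (an assumption, chosen so the result is easy to check by eye) and the element-wise operators `&`, `|` and `~`.

```python
import numpy as np

# Hypothetical fixed values, so the selection is reproducible
B = np.array([[0.1, 0.6, 0.9],
              [0.4, 0.7, 0.2]])

# Parentheses are required: & and | bind tighter than the comparisons
in_range = (B > 0.3) & (B < 0.8)
outside = ~in_range          # logical negation of the mask

selected = B[in_range]       # 1D array, elements taken in row-major order
print(selected)
print(int(in_range.sum()), "elements selected")
print(B[outside])
```

 Python's `and` / `or` keywords do not work element-wise on arrays, which is why the bitwise operators are used here.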

 %% Cell type:markdown id: tags:

 The mask is in fact a particular case of the advanced indexing capabilities provided by NumPy. For example, it is even possible to use lists for indexing:

 %% Cell type:code id: tags:

 ``` python
 # Selecting only particular columns
 print(A)
 A[:, [0, 1, 4]]
 ```

 %% Output

    [[0.14726334 0.90799321 0.67130094 0.23978162 0.96444415]
     [0.26039418 0.06135763 0.35856793 0.73366941 0.50698925]
     [0.39557097 0.55950866 0.70056205 0.65344863 0.90891062]
     [0.19049184 0.56355734 0.71701494 0.66035889 0.06400119]]

    array([[0.14726334, 0.90799321, 0.96444415],
           [0.26039418, 0.06135763, 0.50698925],
           [0.39557097, 0.55950866, 0.90891062],
           [0.19049184, 0.56355734, 0.06400119]])

 %% Cell type:markdown id: tags:

 ## Step 3 -  Perform array manipulations
 ### 3.1 - Apply arithmetic operations to whole arrays (element-wise):

 %% Cell type:code id: tags:

 ``` python
 (A+5)**2
 ```

 %% Output

    array([[26.49431985, 34.90438372, 32.16365436, 27.45531142, 35.57459401],
           [27.67174691, 25.61734103, 28.71425022, 32.87496493, 30.32693058],
           [29.11218606, 30.9081365 , 32.49640767, 31.96148136, 34.91522467],
           [26.94120557, 30.95317031, 32.68425986, 32.03966276, 25.6441081 ]])

 %% Cell type:markdown id: tags:

 ### 3.2 - Apply functions element-wise:

 %% Cell type:code id: tags:

 ``` python
 np.exp(A) # With numpy arrays, use the functions from numpy !
 ```

 %% Output

    array([[1.15865904, 2.47934201, 1.95678132, 1.27097157, 2.62332907],
           [1.29744141, 1.0632791 , 1.43127825, 2.08270892, 1.66028496],
           [1.48523197, 1.74981253, 2.01488485, 1.92215822, 2.48161763],
           [1.2098445 , 1.75691133, 2.04830976, 1.93548684, 1.06609367]])
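
 %% Cell type:markdown id: tags:

 To illustrate why the NumPy functions are recommended, here is a minimal sketch comparing `np.exp` with the standard library's `math.exp` on a small hypothetical array `x`. `math.exp` expects a single scalar, so it has to be applied in an explicit Python loop.

```python
import math
import numpy as np

x = np.array([0.0, 1.0, 2.0])   # hypothetical input values

# np.exp is applied to every element in one call
vectorized = np.exp(x)

# math.exp cannot convert a multi-element array to a scalar
try:
    math.exp(x)
    math_works = True
except TypeError:
    math_works = False

# The equivalent pure-Python version needs an explicit loop
looped = np.array([math.exp(v) for v in x])

print(vectorized)
print("math.exp accepts the array:", math_works)
```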

 %% Cell type:markdown id: tags:

 ### 3.3 - Setting parts of arrays

 %% Cell type:code id: tags:

 ``` python
 A[:, 0] = 0.
 print(A)
 ```

 %% Output

    [[0.         0.90799321 0.67130094 0.23978162 0.96444415]
     [0.         0.06135763 0.35856793 0.73366941 0.50698925]
     [0.         0.55950866 0.70056205 0.65344863 0.90891062]
     [0.         0.56355734 0.71701494 0.66035889 0.06400119]]

 %% Cell type:code id: tags:

 ``` python
 # BONUS: Safe element-wise inverse with masks
 cond = (A != 0)
 A[cond] = 1./A[cond]
 print(A)
 ```

 %% Output

    [[ 0.          1.10132983  1.48964487  4.17046144  1.03686668]
     [ 0.         16.29789234  2.78887186  1.36301171  1.97242842]
     [ 0.          1.78728245  1.42742531  1.53034219  1.1002182 ]
     [ 0.          1.77444232  1.39467107  1.51432807 15.62470834]]
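
 %% Cell type:markdown id: tags:

 An alternative to the mask assignment above, sketched here on a hypothetical array `B`: `np.divide` accepts `where` and `out` arguments, so the inverse can be computed into a new array without modifying the input. Positions where the condition is false keep the value already present in `out` (zero here).

```python
import numpy as np

# Hypothetical array containing zeros
B = np.array([[0.0, 2.0, 4.0],
              [0.0, 0.5, 8.0]])

# Divide only where B is non-zero; other entries keep the 0 from `out`
inv = np.divide(1.0, B, out=np.zeros_like(B), where=(B != 0))

print(inv)
print(B)   # the input array is left unchanged
```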

 %% Cell type:markdown id: tags:

 ## Step 4 - Attributes and methods of `np.ndarray` (see the [doc](https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.html#numpy.ndarray))

 %% Cell type:code id: tags:

 ``` python
 for i, v in enumerate([s for s in dir(A) if not s.startswith('__')]):
     print(f'{v:16}', end='')
     if (i + 1) % 6 == 0:
         print('')
 ```

 %% Output

    T               all             any             argmax          argmin          argpartition
    argsort         astype          base            byteswap        choose          clip
    compress        conj            conjugate       copy            ctypes          cumprod
    cumsum          data            diagonal        dot             dtype           dump
    dumps           fill            flags           flat            flatten         getfield
    imag            item            itemset         itemsize        max             mean
    min             nbytes          ndim            newbyteorder    nonzero         partition
    prod            ptp             put             ravel           real            repeat
    reshape         resize          round           searchsorted    setfield        setflags
    shape           size            sort            squeeze         std             strides
    sum             swapaxes        take            tobytes         tofile          tolist
    tostring        trace           transpose       var             view

 %% Cell type:code id: tags:

 ``` python

 # Ex1: Get the mean through different dimensions

 print(A)
 print('Mean value',  A.mean())
 print('Mean line',   A.mean(axis=0))
 print('Mean column', A.mean(axis=1))
 ```

 %% Output

    [[ 0.          1.10132983  1.48964487  4.17046144  1.03686668]
     [ 0.         16.29789234  2.78887186  1.36301171  1.97242842]
     [ 0.          1.78728245  1.42742531  1.53034219  1.1002182 ]
     [ 0.          1.77444232  1.39467107  1.51432807 15.62470834]]
    Mean value 2.818696254398785
    Mean line [0.         5.24023674 1.77515328 2.14453585 4.93355541]
    Mean column [1.55966056 4.48444087 1.16905363 4.06162996]
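
 %% Cell type:markdown id: tags:

 The meaning of `axis` is easier to check on a tiny array. In this sketch (the array `M` is a hypothetical example), `axis=0` collapses the rows and returns one mean per column, while `axis=1` collapses the columns and returns one mean per row.

```python
import numpy as np

M = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])   # shape (2, 3)

per_column = M.mean(axis=0)   # collapses axis 0 (rows): shape (3,)
per_row = M.mean(axis=1)      # collapses axis 1 (columns): shape (2,)

# keepdims=True keeps the collapsed axis with length 1
kept = M.mean(axis=1, keepdims=True)   # shape (2, 1)

print(per_column)
print(per_row)
print(kept.shape)
```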

 %% Cell type:code id: tags:

 ``` python

 # Ex2: Convert a 2D array to 1D, keeping all elements

 print(A)
 print(A.shape)
 A_flat = A.flatten()
 print(A_flat, A_flat.shape)
 ```

 %% Output

    [[ 0.          1.10132983  1.48964487  4.17046144  1.03686668]
     [ 0.         16.29789234  2.78887186  1.36301171  1.97242842]
     [ 0.          1.78728245  1.42742531  1.53034219  1.1002182 ]
     [ 0.          1.77444232  1.39467107  1.51432807 15.62470834]]
    (4, 5)
    [ 0.          1.10132983  1.48964487  4.17046144  1.03686668  0.
     16.29789234  2.78887186  1.36301171  1.97242842  0.          1.78728245
      1.42742531  1.53034219  1.1002182   0.          1.77444232  1.39467107
      1.51432807 15.62470834] (20,)
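
 %% Cell type:markdown id: tags:

 `flatten()` is not the only way to obtain a 1D array. As a sketch on a hypothetical array `M`: `flatten()` always returns a copy, whereas `ravel()` returns a view when the memory layout allows it, so writing into the result may modify the original.

```python
import numpy as np

M = np.arange(6).reshape(2, 3)   # hypothetical contiguous 2D array

flat_copy = M.flatten()   # always an independent copy
flat_view = M.ravel()     # a view here, because M is contiguous

flat_copy[0] = 100        # does not touch M
flat_view[1] = 200        # modifies M[0, 1]

print(M)
print(np.shares_memory(M, flat_copy), np.shares_memory(M, flat_view))
```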

 %% Cell type:markdown id: tags:

 ### 4.1 - Remark: dot product

 %% Cell type:code id: tags:

 ``` python
 b = np.linspace(0, 10, 11)
 c = b @ b
 # before Python 3.5:
 # c = b.dot(b)
 print(b)
 print(c)
 ```

 %% Output

    [ 0.  1.  2.  3.  4.  5.  6.  7.  8.  9. 10.]
    385.0

 %% Cell type:markdown id: tags:

 ### 4.2 -  For Matlab users

 |     ` `       | Matlab | Numpy |
 | ------------- | ------ | ----- |
 | element wise  |  `.*`  |  `*`  |
 |  dot product  |  `*`   |  `@`  |
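
 The difference between the two NumPy operators in the table can be checked on a small matrix. This is a minimal sketch; the matrix `M` is a hypothetical example.

```python
import numpy as np

M = np.array([[1, 2],
              [3, 4]])

elementwise = M * M   # multiplies matching entries (Matlab's .*)
matmul = M @ M        # matrix product (Matlab's *)

print(elementwise)
print(matmul)
```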

 %% Cell type:markdown id: tags:

 `numpy` arrays can also be sorted, even when they are composed of complex data (records with several fields), provided the type of each column is explicitly stated with a `dtype`.
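
 As a minimal sketch of such a sort, the example below builds a structured array with named, typed fields and sorts it with the `order` argument of `np.sort`. The field names and records are invented for illustration.

```python
import numpy as np

# Hypothetical records: each field gets an explicit type
people_dtype = [('name', 'U10'), ('age', 'i4'), ('height', 'f8')]
people = np.array([('Alice', 31, 1.68),
                   ('Bob', 25, 1.80),
                   ('Chloe', 25, 1.62)], dtype=people_dtype)

# Sort by age first, then by height to break ties
by_age = np.sort(people, order=['age', 'height'])

print(by_age['name'])
print(by_age['age'])
```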

 %% Cell type:markdown id: tags:

 ### 4.3 -  NumPy and SciPy sub-packages:

 We already saw `numpy.random` to generate `numpy` arrays filled with random values. This submodule also provides functions related to distributions (Poisson, Gaussian, etc.) and permutations.
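
 A short sketch of these features, assuming a NumPy version that provides `np.random.default_rng` (1.17 or later; older versions expose similar functions directly on `np.random`). The seed and the distribution parameters are arbitrary choices made for this example.

```python
import numpy as np

rng = np.random.default_rng(seed=42)   # arbitrary seed, for reproducibility

uniform = rng.random((2, 3))                           # uniform in [0, 1)
gaussian = rng.normal(loc=0.0, scale=1.0, size=1000)   # Gaussian samples
poisson = rng.poisson(lam=3.0, size=1000)              # Poisson samples
shuffled = rng.permutation(5)                          # random order of 0..4

print(uniform.shape)
print(gaussian.mean(), poisson.mean())
print(shuffled)
```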

 %% Cell type:markdown id: tags:

 To perform linear algebra with dense matrices, we can use the submodule `numpy.linalg`. For instance, to compute the determinant of a random matrix, we use the function `det`:

 %% Cell type:code id: tags:

 ``` python
 A = np.random.random([5,5])
 print(A)
 np.linalg.det(A)
 ```

 %% Output

    [[0.33277412 0.18065847 0.10352574 0.48095553 0.97748505]
     [0.20756676 0.33166777 0.00808192 0.18868636 0.1722338 ]
     [0.94092977 0.21755657 0.52045179 0.45008315 0.1751413 ]
     [0.27404121 0.53531168 0.41209088 0.22503687 0.50026306]
     [0.23077516 0.99886616 0.74286904 0.40849416 0.57970741]]

    -0.026288777656342802

 %% Cell type:code id: tags:

 ``` python
 squared_subA = A[1:3, 1:3]
 print(squared_subA)
 np.linalg.inv(squared_subA)
 ```

 %% Output

    [[0.33166777 0.00808192]
     [0.21755657 0.52045179]]

    array([[ 3.0460928 , -0.04730175],
           [-1.27331197,  1.94118039]])
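
 %% Cell type:markdown id: tags:

 An inverse can be checked by multiplying it with the original matrix. The sketch below uses a fixed hypothetical matrix `M` instead of random values, and also shows `np.linalg.solve`, which solves a linear system directly without forming the inverse.

```python
import numpy as np

# Hypothetical invertible matrix (determinant 4*3 - 1*2 = 10)
M = np.array([[4.0, 1.0],
              [2.0, 3.0]])

M_inv = np.linalg.inv(M)

# M @ M_inv should be the identity, up to rounding errors
identity_ok = np.allclose(M @ M_inv, np.eye(2))

# Solve M @ x = rhs without computing the inverse explicitly
rhs = np.array([1.0, 2.0])
x = np.linalg.solve(M, rhs)

print(identity_ok)
print(x)
```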

 %% Cell type:markdown id: tags:

 ### 4.4 -  Introduction to Pandas: Python Data Analysis Library

 Pandas is an open source library providing high-performance, easy-to-use data structures and data analysis tools for Python.

 [Pandas tutorial](https://pandas.pydata.org/pandas-docs/stable/10min.html)
 [Grenoble Python Working Session](https://github.com/iutzeler/Pres_Pandas/)
-[Pandas for SQL Users](https://hackernoon.com/pandas-cheatsheet-for-sql-people-part-1-2976894acd0)
+[Pandas for SQL Users](http://sergilehkyi.com/translating-sql-to-pandas/)
+[Pandas Introduction Training HPC Python@UGA](https://gricad-gitlab.univ-grenoble-alpes.fr/python-uga/training-hpc/-/blob/master/ipynb/11_pandas.ipynb)

 %% Cell type:markdown id: tags:

 ---
 <img width="80px" src="../fidle/img/00-Fidle-logo-01.svg"></img>