Commit 76162001 authored by paugier

Pres on profiling

parent 528d73a1
...@@ -27,4 +27,5 @@ ipynb/index.html ...@@ -27,4 +27,5 @@ ipynb/index.html
pyfiles/dtw_cort_dist/V5_cython/*.c pyfiles/dtw_cort_dist/V5_cython/*.c
pyfiles/dtw_cort_dist/V5_cython/*.html pyfiles/dtw_cort_dist/V5_cython/*.html
**/V*/res_cort.npy **/V*/res_cort.npy
**/V*/res_dtw.npy **/V*/res_dtw.npy
\ No newline at end of file pyfiles/dtw_cort_dist/V*/prof.png
...@@ -2,103 +2,162 @@ ...@@ -2,103 +2,162 @@
# Profiling # Profiling
Pierre Augier (LEGI), Cyrille Bonamy (LEGI), Eric Maldonado (Irstea), Franck Thollard (ISTerre), Christophe Picard (LJK), Loïc Huder (ISTerre) Pierre Augier (LEGI), Cyrille Bonamy (LEGI), Eric Maldonado (Irstea), Franck Thollard (ISTerre), Christophe Picard (LJK), Loïc Huder (ISTerre)
### Measure ⏱, don't guess! Profile to find the bottlenecks.
%% Cell type:markdown id: tags: <p class="small"><br></p>
### Do not optimize everything!
- *"Premature optimization is the root of all evil"* (Donald Knuth)
- 80 / 20 rule, efficiency important for expensive things and NOT for small things
# Road map %% Cell type:markdown id: tags:
# Different types of profiling
## Time profiling ## Time profiling
- timeit
- script base time (unix cmd) - Small code snippets
- function based profiling (cprofile) - Script based benchmark
- line base profiling - Function based profiling
- Line based profiling
<p class="small"><br></p>
## Memory profiling ## Memory profiling
- further readings
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
timeit ## Small code snippets
------
In ipython, you can use the magic command timeit that execute a piece of code and stats the time it spends:
- There is a module [`timeit` in the standard library](https://docs.python.org/3/library/timeit.html).
%% Cell type:markdown id: tags: `python3 -m timeit -s "import math; l=[]" "for x in range(100): l.append(math.pow(x,2))"`
Basic profiling Problem: the module `timeit` does not try to guess how many times to execute the statement.
-----------------
While writing code, you can use the magic command timeit: - In IPython, you can use the magic command `%timeit`, which executes a piece of code several times and reports statistics on the time it takes:
%% Cell type:code id: tags: %% Cell type:code id: tags:
``` python ``` python
import math import math
l=[] l = []
%timeit for x in range(100): l.append(math.pow(x,2)) %timeit for x in range(100): l.append(math.pow(x,2))
%timeit [math.pow(x,2) for x in range(100)] %timeit [math.pow(x,2) for x in range(100)]
l = [] l = []
%timeit for x in range(100): l.append(x*x) %timeit for x in range(100): l.append(x*x)
%timeit [x*x for x in range(100)] %timeit [x*x for x in range(100)]
``` ```
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
Basic profiling - [`pyperf`](https://pypi.org/project/pyperf/) is a more powerful tool, and it can be used in the same way as the module `timeit`:
-----------------
`python3 -m pyperf timeit -s "import math; l=[]" "for x in range(100): l.append(math.pow(x,2))"`
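The same comparison can also be run from a plain Python script with the standard-library module `timeit`. The following is a minimal sketch, not part of the presentation: the helper `best_time_per_call` and the values of `number` and `repeat` are assumptions chosen for illustration.

```python
import math
import timeit


def squares_append():
    """Build the list with an explicit loop and list.append."""
    result = []
    for x in range(100):
        result.append(math.pow(x, 2))
    return result


def squares_comprehension():
    """Build the same list with a list comprehension."""
    return [math.pow(x, 2) for x in range(100)]


def best_time_per_call(func, number=2000, repeat=5):
    """Return the best time per call (in seconds) over `repeat` measurements.

    Hypothetical helper: each measurement calls `func` `number` times.
    """
    timings = timeit.repeat(func, number=number, repeat=repeat)
    # The minimum is the value least disturbed by other activity on the machine.
    return min(timings) / number


for func in (squares_append, squares_comprehension):
    print(f"{func.__name__}: {best_time_per_call(func) * 1e6:.2f} us per call")
```

Taking the minimum over several repetitions is the usual advice for `timeit` results, since larger values mostly reflect interference from other processes.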
%% Cell type:markdown id: tags:
## Script based benchmark
Evaluate the execution time of your script as a whole
- Using the Unix command `time`:
`time python3 myscript.py`
- Using the Linux program [`perf`](https://perf.wiki.kernel.org)
`perf stat python3 myscript.py`
Issues:
- not accurate (only one run!)
- includes the import and initialization time. It can be better to modify the script to print the elapsed time measured with:
%% Cell type:code id: tags:
``` python
from time import time
Evaluate you script as a whole, *e.g.* using the unix time function: l = []
t_start = time()
[math.pow(x,2) for x in range(100)]
print(f"elapsed time: {time() - t_start:.2e} s")
`time myscript intput_data` ```
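To address the "only one run" issue from inside the script, the measurement can be repeated and summarized. This is a minimal sketch under stated assumptions: `work` is a hypothetical stand-in for the part of the script being timed, `measure` and `nb_runs` are illustrative names, and `time.perf_counter` is used here instead of `time.time` because it is a clock intended for measuring short durations.

```python
import math
import statistics
from time import perf_counter


def work():
    """Hypothetical stand-in for the part of the script we want to time."""
    return [math.pow(x, 2) for x in range(100)]


def measure(func, nb_runs=20):
    """Time `func` several times and return (min, median) in seconds."""
    durations = []
    for _ in range(nb_runs):
        t_start = perf_counter()
        func()
        durations.append(perf_counter() - t_start)
    return min(durations), statistics.median(durations)


t_min, t_median = measure(work)
print(f"min: {t_min:.2e} s, median: {t_median:.2e} s")
```

Because the timer starts after the imports, the import and initialization time is excluded from the measurement.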
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
Function based profiling (cprofile) ## Function based profiling (cProfile)
-----------------------------------------
cProfile (https://docs.python.org/3.7/library/profile.html): **deterministic profiling** of Python programs.
Use the cProfile module to profile the code. 2 steps: (1) run the profiler and (2) analyze the results.
- Option -s ask to sort using cumulative time 1. Run the profiler
- profile_data.pyprof is the output of the profiling
- myscript intput_data: the script with its regular arguments
**Warning: profiling is much slower than a classical run, so do not profile with a long during setting** - With an already written script `python3 -m cProfile myscript.py`
`python3 -m cProfile -s cumulative -o profile_data.pyprof myscript intput_data` - Much better, write a dedicated script using the module cProfile. See `pyfiles/dtw_cort_dist/V0_numpy_loops/prof.py`
Visualize you result (*e.g.*) using `pyprof2calltree` and `kcachegrind` **Warning: profiling is much slower than a normal run, so do not profile a long-running case**
`pyprof2calltree -i profile_data.pyprof -k` 2. Analyze the results
The standard tool is `pstats` (https://docs.python.org/3.7/library/profile.html#module-pstats)
Or visualize the results with `gprof2dot`, `SnakeViz`, `pyprof2calltree` and `kcachegrind`
Example: `pyprof2calltree -i prof.pstats -k`
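The two steps (run the profiler, then analyze the results) can also be done in a single Python session without writing an intermediate file. The sketch below is illustrative only: `slow_square_sum` and `compute_example` are hypothetical functions standing in for the real workload, and sorting by cumulative time is one possible choice among the sort keys that `pstats` accepts.

```python
import cProfile
import io
import pstats


def slow_square_sum(n):
    """Deliberately naive function so that it shows up in the profile."""
    total = 0
    for x in range(n):
        total += x * x
    return total


def compute_example():
    """Hypothetical workload standing in for the code to profile."""
    return [slow_square_sum(2000) for _ in range(200)]


# Step 1: run the profiler around the code of interest only.
profiler = cProfile.Profile()
profiler.enable()
results = compute_example()
profiler.disable()

# Step 2: analyze the results with pstats, sorted by cumulative time.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.strip_dirs().sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
print(report)
```

Enabling the profiler only around the call of interest keeps import and initialization time out of the statistics.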
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
Line based profiling ## Statistical profiling
-----------------------
See http://pramodkumbhar.com/2019/01/python-deterministic-vs-statistical-profilers/
Advantage compared to deterministic profiling: **very small overhead**
- [pyflame](https://github.com/uber/pyflame)
- [py-spy](https://github.com/benfred/py-spy)
- [plop](https://github.com/bdarnell/plop)
%% Cell type:markdown id: tags:
## Line based profiling
- [line_profiler](https://github.com/rkern/line_profiler)
- [pprofile](https://github.com/vpelletier/pprofile)
%% Cell type:markdown id: tags:
- pprofile ## Memory profiler
- vprof
- [memory_profiler](https://pypi.org/project/memory-profiler/)
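For a quick look at memory usage without installing anything, the standard library provides `tracemalloc`. Note that this is a different tool from `memory_profiler` and only tracks memory allocated through Python's allocators. The sketch below assumes a hypothetical function `build_squares` as the code under study.

```python
import tracemalloc


def build_squares(n):
    """Allocate a list of n floats so that there is something to measure."""
    return [float(x) ** 2 for x in range(n)]


tracemalloc.start()
data = build_squares(100_000)
current, peak = tracemalloc.get_traced_memory()  # both values are in bytes
snapshot = tracemalloc.take_snapshot()
tracemalloc.stop()

print(f"current: {current / 1e6:.2f} MB, peak: {peak / 1e6:.2f} MB")
# Show the three source lines that allocated the most memory.
for stat in snapshot.statistics("lineno")[:3]:
    print(stat)
```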
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
Memory profiler ## Time and memory profiler
-----------------
- [vprof](https://pypi.org/project/vprof/)
%% Cell type:markdown id: tags: %% Cell type:markdown id: tags:
# Further reading # Further reading
More on profiling on the stackoverflow discussion: More on profiling on a stackoverflow discussion:
https://stackoverflow.com/questions/582336/how-can-you-profile-a-python-script https://stackoverflow.com/questions/582336/how-can-you-profile-a-python-script
......
...@@ -23,7 +23,7 @@ def serie_pair_index_generator(number): ...@@ -23,7 +23,7 @@ def serie_pair_index_generator(number):
) )
def DTWDistance(s1, s2): def dtw_distance(s1, s2):
""" Computes the dtw between s1 and s2 with distance the absolute distance """ Computes the dtw between s1 and s2 with distance the absolute distance
:param s1: the first serie (ie an iterable over floats64) :param s1: the first serie (ie an iterable over floats64)
...@@ -83,7 +83,7 @@ def compute(series, nb_series): ...@@ -83,7 +83,7 @@ def compute(series, nb_series):
_dist_mat_dtw = np.zeros((nb_series, nb_series), dtype=np.float64) _dist_mat_dtw = np.zeros((nb_series, nb_series), dtype=np.float64)
_dist_mat_cort = np.zeros((nb_series, nb_series), dtype=np.float64) _dist_mat_cort = np.zeros((nb_series, nb_series), dtype=np.float64)
for t1, t2 in gen: for t1, t2 in gen:
dist_dtw = DTWDistance(series[t1], series[t2]) dist_dtw = dtw_distance(series[t1], series[t2])
_dist_mat_dtw[t1, t2] = dist_dtw _dist_mat_dtw[t1, t2] = dist_dtw
_dist_mat_dtw[t2, t1] = dist_dtw _dist_mat_dtw[t2, t1] = dist_dtw
dist_cort = 0.5 * (1 - cort(series[t1], series[t2])) dist_cort = 0.5 * (1 - cort(series[t1], series[t2]))
......
import cProfile
import pstats
from time import time
from dtw_cort_dist_mat import main, compute
series, nb_series = main(only_init=True)
t0 = time()
a, b = compute(series, nb_series)
t_end = time()
print('\nelapsed time = {:.3f} s'.format(t_end - t0))
t0 = time()
cProfile.runctx("a, b = compute(series, nb_series)", globals(), locals(), "prof.pstats")
t_end = time()
s = pstats.Stats('prof.pstats')
s.sort_stats('time').print_stats(12)
print('\nelapsed time = {:.3f} s'.format(t_end - t0))
print(
'\nwith gprof2dot and graphviz (command dot):\n'
'gprof2dot -f pstats prof.pstats | dot -Tpng -o prof.png')
...@@ -23,7 +23,7 @@ def serie_pair_index_generator(number): ...@@ -23,7 +23,7 @@ def serie_pair_index_generator(number):
) )
def DTWDistance(s1, s2): def dtw_distance(s1, s2):
""" Computes the dtw between s1 and s2 with distance the absolute distance """ Computes the dtw between s1 and s2 with distance the absolute distance
:param s1: the first serie (ie an iterable over floats64) :param s1: the first serie (ie an iterable over floats64)
...@@ -79,7 +79,7 @@ def compute(series, nb_series): ...@@ -79,7 +79,7 @@ def compute(series, nb_series):
_dist_mat_dtw = np.zeros((nb_series, nb_series), dtype=np.float64) _dist_mat_dtw = np.zeros((nb_series, nb_series), dtype=np.float64)
_dist_mat_cort = np.zeros((nb_series, nb_series), dtype=np.float64) _dist_mat_cort = np.zeros((nb_series, nb_series), dtype=np.float64)
for t1, t2 in gen: for t1, t2 in gen:
dist_dtw = DTWDistance(series[t1], series[t2]) dist_dtw = dtw_distance(series[t1], series[t2])
_dist_mat_dtw[t1, t2] = dist_dtw _dist_mat_dtw[t1, t2] = dist_dtw
_dist_mat_dtw[t2, t1] = dist_dtw _dist_mat_dtw[t2, t1] = dist_dtw
dist_cort = 0.5 * (1 - cort(series[t1], series[t2])) dist_cort = 0.5 * (1 - cort(series[t1], series[t2]))
......
...@@ -8,7 +8,7 @@ from libc.math cimport abs ...@@ -8,7 +8,7 @@ from libc.math cimport abs
@cython.boundscheck(False) @cython.boundscheck(False)
@cython.wraparound(False) @cython.wraparound(False)
def DTWDistance(double[:] s1, double[:] s2): def dtw_distance(double[:] s1, double[:] s2):
""" Computes the dtw between s1 and s2 with distance the absolute distance """ Computes the dtw between s1 and s2 with distance the absolute distance
:param s1: the first serie (ie an iterable over floats64) :param s1: the first serie (ie an iterable over floats64)
......
...@@ -6,7 +6,7 @@ from pathlib import Path ...@@ -6,7 +6,7 @@ from pathlib import Path
import numpy as np import numpy as np
from dtw_cort import cort, DTWDistance from dtw_cort import cort, dtw_distance
util = run_path(Path(__file__).absolute().parent.parent / "util.py") util = run_path(Path(__file__).absolute().parent.parent / "util.py")
...@@ -30,7 +30,7 @@ def compute(series, nb_series): ...@@ -30,7 +30,7 @@ def compute(series, nb_series):
_dist_mat_dtw = np.zeros((nb_series, nb_series), dtype=np.float64) _dist_mat_dtw = np.zeros((nb_series, nb_series), dtype=np.float64)
_dist_mat_cort = np.zeros((nb_series, nb_series), dtype=np.float64) _dist_mat_cort = np.zeros((nb_series, nb_series), dtype=np.float64)
for t1, t2 in gen: for t1, t2 in gen:
dist_dtw = DTWDistance(series[t1], series[t2]) dist_dtw = dtw_distance(series[t1], series[t2])
_dist_mat_dtw[t1, t2] = dist_dtw _dist_mat_dtw[t1, t2] = dist_dtw
_dist_mat_dtw[t2, t1] = dist_dtw _dist_mat_dtw[t2, t1] = dist_dtw
dist_cort = 0.5 * (1 - cort(series[t1], series[t2])) dist_cort = 0.5 * (1 - cort(series[t1], series[t2]))
......
...@@ -25,7 +25,7 @@ def serie_pair_index_generator(number): ...@@ -25,7 +25,7 @@ def serie_pair_index_generator(number):
) )
def DTWDistance(s1, s2): def dtw_distance(s1, s2):
""" Computes the dtw between s1 and s2 with distance the absolute distance """ Computes the dtw between s1 and s2 with distance the absolute distance
:param s1: the first serie (ie an iterable over floats64) :param s1: the first serie (ie an iterable over floats64)
...@@ -83,7 +83,7 @@ def compute(series: "float64[:, :]", nb_series: int): ...@@ -83,7 +83,7 @@ def compute(series: "float64[:, :]", nb_series: int):
_dist_mat_dtw = np.zeros((nb_series, nb_series), dtype=np.float64) _dist_mat_dtw = np.zeros((nb_series, nb_series), dtype=np.float64)
_dist_mat_cort = np.zeros((nb_series, nb_series), dtype=np.float64) _dist_mat_cort = np.zeros((nb_series, nb_series), dtype=np.float64)
for t1, t2 in gen: for t1, t2 in gen:
dist_dtw = DTWDistance(series[t1], series[t2]) dist_dtw = dtw_distance(series[t1], series[t2])
_dist_mat_dtw[t1, t2] = dist_dtw _dist_mat_dtw[t1, t2] = dist_dtw
_dist_mat_dtw[t2, t1] = dist_dtw _dist_mat_dtw[t2, t1] = dist_dtw
dist_cort = 0.5 * (1 - cort(series[t1], series[t2])) dist_cort = 0.5 * (1 - cort(series[t1], series[t2]))
......
...@@ -26,7 +26,7 @@ def serie_pair_index_generator(number): ...@@ -26,7 +26,7 @@ def serie_pair_index_generator(number):
) )
def DTWDistance(s1, s2): def dtw_distance(s1, s2):
""" Computes the dtw between s1 and s2 with distance the absolute distance """ Computes the dtw between s1 and s2 with distance the absolute distance
:param s1: the first serie (ie an iterable over floats64) :param s1: the first serie (ie an iterable over floats64)
...@@ -84,7 +84,7 @@ def compute(series, nb_series): ...@@ -84,7 +84,7 @@ def compute(series, nb_series):
_dist_mat_dtw = np.zeros((nb_series, nb_series), dtype=np.float64) _dist_mat_dtw = np.zeros((nb_series, nb_series), dtype=np.float64)
_dist_mat_cort = np.zeros((nb_series, nb_series), dtype=np.float64) _dist_mat_cort = np.zeros((nb_series, nb_series), dtype=np.float64)
for t1, t2 in gen: for t1, t2 in gen:
dist_dtw = DTWDistance(series[t1], series[t2]) dist_dtw = dtw_distance(series[t1], series[t2])
_dist_mat_dtw[t1, t2] = dist_dtw _dist_mat_dtw[t1, t2] = dist_dtw
_dist_mat_dtw[t2, t1] = dist_dtw _dist_mat_dtw[t2, t1] = dist_dtw
dist_cort = 0.5 * (1 - cort(series[t1], series[t2])) dist_cort = 0.5 * (1 - cort(series[t1], series[t2]))
......
...@@ -10,7 +10,7 @@ def serie_pair_index_generator(number): ...@@ -10,7 +10,7 @@ def serie_pair_index_generator(number):
) )
def DTWDistance(s1, s2): def dtw_distance(s1, s2):
" Computes the dtw between s1 and s2 with distance the absolute distance\n\n :param s1: the first serie (ie an iterable over floats64)\n :param s2: the second serie (ie an iterable over floats64)\n :returns: the dtw distance\n :rtype: float64\n " " Computes the dtw between s1 and s2 with distance the absolute distance\n\n :param s1: the first serie (ie an iterable over floats64)\n :param s2: the second serie (ie an iterable over floats64)\n :returns: the dtw distance\n :rtype: float64\n "
len_s1 = len(s1) len_s1 = len(s1)
len_s2 = len(s2) len_s2 = len(s2)
...@@ -58,7 +58,7 @@ def compute(series, nb_series): ...@@ -58,7 +58,7 @@ def compute(series, nb_series):
_dist_mat_dtw = np.zeros((nb_series, nb_series), dtype=np.float64) _dist_mat_dtw = np.zeros((nb_series, nb_series), dtype=np.float64)
_dist_mat_cort = np.zeros((nb_series, nb_series), dtype=np.float64) _dist_mat_cort = np.zeros((nb_series, nb_series), dtype=np.float64)
for (t1, t2) in gen: for (t1, t2) in gen:
dist_dtw = DTWDistance(series[t1], series[t2]) dist_dtw = dtw_distance(series[t1], series[t2])
_dist_mat_dtw[(t1, t2)] = dist_dtw _dist_mat_dtw[(t1, t2)] = dist_dtw
_dist_mat_dtw[(t2, t1)] = dist_dtw _dist_mat_dtw[(t2, t1)] = dist_dtw
dist_cort = 0.5 * (1 - cort(series[t1], series[t2])) dist_cort = 0.5 * (1 - cort(series[t1], series[t2]))
......
...@@ -26,7 +26,7 @@ def serie_pair_index_generator(number): ...@@ -26,7 +26,7 @@ def serie_pair_index_generator(number):
@jit(cache=True) @jit(cache=True)
def DTWDistance(s1, s2): def dtw_distance(s1, s2):
""" Computes the dtw between s1 and s2 with distance the absolute distance """ Computes the dtw between s1 and s2 with distance the absolute distance
:param s1: the first serie (ie an iterable over floats64) :param s1: the first serie (ie an iterable over floats64)
...@@ -90,7 +90,7 @@ def compute(series, nb_series): ...@@ -90,7 +90,7 @@ def compute(series, nb_series):
_dist_mat_dtw = np.zeros((nb_series, nb_series), dtype=np.float64) _dist_mat_dtw = np.zeros((nb_series, nb_series), dtype=np.float64)
_dist_mat_cort = np.zeros((nb_series, nb_series), dtype=np.float64) _dist_mat_cort = np.zeros((nb_series, nb_series), dtype=np.float64)
for t1, t2 in gen: for t1, t2 in gen:
dist_dtw = DTWDistance(series[t1], series[t2]) dist_dtw = dtw_distance(series[t1], series[t2])
_dist_mat_dtw[t1, t2] = dist_dtw _dist_mat_dtw[t1, t2] = dist_dtw
_dist_mat_dtw[t2, t1] = dist_dtw _dist_mat_dtw[t2, t1] = dist_dtw
dist_cort = 0.5 * (1 - cort(series[t1], series[t2])) dist_cort = 0.5 * (1 - cort(series[t1], series[t2]))
......
...@@ -24,7 +24,7 @@ def serie_pair_index_generator(number): ...@@ -24,7 +24,7 @@ def serie_pair_index_generator(number):
) )
def DTWDistance(s1, s2): def dtw_distance(s1, s2):
""" Computes the dtw between s1 and s2 with distance the absolute distance """ Computes the dtw between s1 and s2 with distance the absolute distance
:param s1: the first serie (ie an iterable over floats64) :param s1: the first serie (ie an iterable over floats64)
...@@ -32,7 +32,7 @@ def DTWDistance(s1, s2): ...@@ -32,7 +32,7 @@ def DTWDistance(s1, s2):
:returns: the dtw distance :returns: the dtw distance
:rtype: float64 :rtype: float64
""" """
dtw_result = dtw_cort.dtwdistance(s1, s2) dtw_result = dtw_cort.dtw_distance(s1, s2)
return dtw_result return dtw_result
...@@ -54,7 +54,7 @@ def compute(series, nb_series): ...@@ -54,7 +54,7 @@ def compute(series, nb_series):
_dist_mat_dtw = np.zeros((nb_series, nb_series), dtype=np.float64) _dist_mat_dtw = np.zeros((nb_series, nb_series), dtype=np.float64)
_dist_mat_cort = np.zeros((nb_series, nb_series), dtype=np.float64) _dist_mat_cort = np.zeros((nb_series, nb_series), dtype=np.float64)
for t1, t2 in gen: for t1, t2 in gen:
dist_dtw = DTWDistance(series[t1], series[t2]) dist_dtw = dtw_distance(series[t1], series[t2])
_dist_mat_dtw[t1, t2] = dist_dtw _dist_mat_dtw[t1, t2] = dist_dtw
_dist_mat_dtw[t2, t1] = dist_dtw _dist_mat_dtw[t2, t1] = dist_dtw
dist_cort = 0.5 * (1 - cort(series[t1], series[t2])) dist_cort = 0.5 * (1 - cort(series[t1], series[t2]))
......
...@@ -2,7 +2,7 @@ module dtw_cort ...@@ -2,7 +2,7 @@ module dtw_cort
implicit none implicit none
contains contains
subroutine dtwdistance(s1, s2, dtw_result) subroutine dtw_distance(s1, s2, dtw_result)
! Computes the dtw between s1 and s2 with distance the absolute distance ! Computes the dtw between s1 and s2 with distance the absolute distance
doubleprecision, intent(in) :: s1(:), s2(:) doubleprecision, intent(in) :: s1(:), s2(:)
doubleprecision, intent(out) :: dtw_result doubleprecision, intent(out) :: dtw_result
...@@ -41,7 +41,7 @@ module dtw_cort ...@@ -41,7 +41,7 @@ module dtw_cort
dtw_result = dtw_mat(len_s1, len_s2) dtw_result = dtw_mat(len_s1, len_s2)
end subroutine dtwdistance end subroutine dtw_distance
doubleprecision function cort(s1, s2) doubleprecision function cort(s1, s2)
! Computes the cort between s1 and s2 (assuming they have the same length) ! Computes the cort between s1 and s2 (assuming they have the same length)
......
...@@ -33,7 +33,7 @@ def serie_pair_index_generator(number, rank, size): ...@@ -33,7 +33,7 @@ def serie_pair_index_generator(number, rank, size):
yield _idx_greater, _idx_lower yield _idx_greater, _idx_lower
def DTWDistance(s1, s2): def dtw_distance(s1, s2):
""" Computes the dtw between s1 and s2 with distance the absolute distance """ Computes the dtw between s1 and s2 with distance the absolute distance
:param s1: the first serie (ie an iterable over floats64) :param s1: the first serie (ie an iterable over floats64)
...@@ -41,7 +41,7 @@ def DTWDistance(s1, s2): ...@@ -41,7 +41,7 @@ def DTWDistance(s1, s2):
:returns: the dtw distance :returns: the dtw distance
:rtype: float64 :rtype: float64
""" """
dtw_result = dtw_cort.dtwdistance(s1, s2) dtw_result = dtw_cort.dtw_distance(s1, s2)
return dtw_result return dtw_result
...@@ -64,7 +64,7 @@ def compute(series, nb_series): ...@@ -64,7 +64,7 @@ def compute(series, nb_series):
_dist_mat_dtw = np.zeros((nb_series, nb_series), dtype=np.float64) _dist_mat_dtw = np.zeros((nb_series, nb_series), dtype=np.float64)
_dist_mat_cort = np.zeros_like(_dist_mat_dtw) _dist_mat_cort = np.zeros_like(_dist_mat_dtw)
for t1, t2 in gen: for t1, t2 in gen:
dist_dtw = DTWDistance(series[t1], series[t2]) dist_dtw = dtw_distance(series[t1], series[t2])
_dist_mat_dtw[t1, t2] = dist_dtw