Commit 76162001 authored by paugier

Pres on profiling

parent 528d73a1
......@@ -27,4 +27,5 @@ ipynb/index.html
pyfiles/dtw_cort_dist/V5_cython/*.c
pyfiles/dtw_cort_dist/V5_cython/*.html
**/V*/res_cort.npy
**/V*/res_dtw.npy
\ No newline at end of file
**/V*/res_dtw.npy
pyfiles/dtw_cort_dist/V*/prof.png
......@@ -2,103 +2,162 @@
# Profiling
Pierre Augier (LEGI), Cyrille Bonamy (LEGI), Eric Maldonado (Irstea), Franck Thollard (ISTerre), Christophe Picard (LJK), Loïc Huder (ISTerre)
### Measure ⏱, don't guess! Profile to find the bottlenecks.
%% Cell type:markdown id: tags:
<p class="small"><br></p>
### Do not optimize everything!
- *"Premature optimization is the root of all evil"* (Donald Knuth)
- 80 / 20 rule: efficiency matters for the expensive parts of the code and NOT for the small ones
# Road map
%% Cell type:markdown id: tags:
# Different types of profiling
## Time profiling
- timeit
- script base time (unix cmd)
- function based profiling (cprofile)
- line base profiling
- Small code snippets
- Script based benchmark
- Function based profiling
- Line based profiling
<p class="small"><br></p>
## Memory profiling
- further readings
%% Cell type:markdown id: tags:
timeit
------
In IPython, you can use the magic command timeit, which executes a piece of code and reports statistics on the time it takes:
## Small code snippets
- There is a module [`timeit` in the standard library](https://docs.python.org/3/library/timeit.html).
%% Cell type:markdown id: tags:
`python3 -m timeit -s "import math; l=[]" "for x in range(100): l.append(math.pow(x,2))"`
Basic profiling
-----------------
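Problem: the function `timeit.timeit` does not try to guess how many times to execute the statement (it defaults to 1,000,000 executions); only the command line interface picks a suitable number of loops.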
While writing code, you can use the magic command timeit:
- In IPython, you can use the magic command `%timeit`, which executes a piece of code several times and reports statistics on the time it takes:
%% Cell type:code id: tags:
``` python
import math
l=[]
l = []
%timeit for x in range(100): l.append(math.pow(x,2))
%timeit [math.pow(x,2) for x in range(100)]
l = []
%timeit for x in range(100): l.append(x*x)
%timeit [x*x for x in range(100)]
```
%% Cell type:markdown id: tags:
Basic profiling
-----------------
- [`pyperf`](https://pypi.org/project/pyperf/) is a more powerful tool but we can also do the same as with the module `timeit`:
`python3 -m pyperf timeit -s "import math; l=[]" "for x in range(100): l.append(math.pow(x,2))"`
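Inside a script, the standard-library `timeit` module can also be driven from Python. The following is a minimal sketch, not part of the original notebook: it uses `Timer.autorange()` to choose the number of loops and `Timer.repeat()` to take several measurements. The choice of 3 repetitions is an arbitrary assumption.

``` python
import math
import timeit

# Timer wraps the statement; `globals` gives it access to `math`.
timer = timeit.Timer(
    "[math.pow(x, 2) for x in range(100)]", globals={"math": math}
)

# autorange() increases the loop count until one measurement takes >= 0.2 s.
number, _ = timer.autorange()

# Repeat the measurement and keep the minimum, i.e. the least disturbed run.
times = timer.repeat(repeat=3, number=number)
best = min(times) / number
print(f"{number} loops, best of 3: {best:.2e} s per loop")
```

Taking the minimum rather than the mean is the usual recommendation for `timeit`, since slower runs mostly reflect interference from other processes.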
%% Cell type:markdown id: tags:
## Script based benchmark
Evaluate the execution time of your script as a whole
- Using the Unix command `time`:
`time python3 myscript.py`
- Using the Linux program [`perf`](https://perf.wiki.kernel.org):
`perf stat python3 myscript.py`
Issues:
- not accurate (only one run!)
- includes the import and initialization time. It can be better to modify the script to print the elapsed time measured with:
%% Cell type:code id: tags:
``` python
from time import time
Evaluate your script as a whole, *e.g.* using the unix time function:
l = []
t_start = time()
[math.pow(x,2) for x in range(100)]
print(f"elapsed time: {time() - t_start:.2e} s")
`time myscript intput_data`
```
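As a variant of the measurement above, `time.perf_counter()` from the standard library may be preferable to `time.time()` for short durations, because it is monotonic and uses the highest-resolution clock available. A minimal self-contained sketch (the variable names are only illustrative):

``` python
import math
from time import perf_counter

t_start = perf_counter()
result = [math.pow(x, 2) for x in range(100)]
elapsed = perf_counter() - t_start

print(f"elapsed time: {elapsed:.2e} s")
```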
%% Cell type:markdown id: tags:
Function based profiling (cprofile)
-----------------------------------------
## Function based profiling (cProfile)
cProfile (https://docs.python.org/3.7/library/profile.html): **deterministic profiling** of Python programs.
Use the cProfile module to profile the code.
2 steps: (1) run the profiler and (2) analyze the results.
- Option -s asks to sort using cumulative time
- profile_data.pyprof is the output of the profiling
- myscript intput_data: the script with its regular arguments
1. Run the profiler
**Warning: profiling is much slower than a normal run, so do not profile with settings that lead to a long run**
- With an already written script `python3 -m cProfile myscript.py`
`python3 -m cProfile -s cumulative -o profile_data.pyprof myscript intput_data`
- Much better: write a dedicated script using the module cProfile. See `pyfiles/dtw_cort_dist/V0_numpy_loops/prof.py`
Visualize your results, *e.g.* using `pyprof2calltree` and `kcachegrind`
**Warning: profiling is much slower than a normal run, so do not profile with settings that lead to a long run**
`pyprof2calltree -i profile_data.pyprof -k`
2. Analyze the results
The standard tool is `pstats` (https://docs.python.org/3.7/library/profile.html#module-pstats)
Or visualize the results with `gprof2dot`, `SnakeViz`, `pyprof2calltree` and `kcachegrind`
Example: `pyprof2calltree -i prof.pstats -k`
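The two steps (run the profiler, then analyze the results) can also be done in a single process. The sketch below is only an illustration: `slow_square_sum` is a hypothetical toy workload, not a function from this repository, and the report is written to an in-memory stream instead of a file.

``` python
import cProfile
import io
import pstats


def slow_square_sum(n):
    """Toy workload (hypothetical, only for illustration)."""
    total = 0
    for x in range(n):
        total += x * x
    return total


# Step 1: run the profiler around the code of interest.
profiler = cProfile.Profile()
profiler.enable()
result = slow_square_sum(100_000)
profiler.disable()

# Step 2: analyze the results, sorted by cumulative time.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
print(report)
```

Calling `stats.dump_stats("prof.pstats")` would additionally write a file that tools such as `gprof2dot` or `pyprof2calltree` can read.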
%% Cell type:markdown id: tags:
Line based profiling
-----------------------
## Statistical profiling
See http://pramodkumbhar.com/2019/01/python-deterministic-vs-statistical-profilers/
Advantage compared to deterministic profiling: **very small overhead**
- [pyflame](https://github.com/uber/pyflame)
- [py-spy](https://github.com/benfred/py-spy)
- [plop](https://github.com/bdarnell/plop)
%% Cell type:markdown id: tags:
## Line based profiling
- [line_profiler](https://github.com/rkern/line_profiler)
- [pprofile](https://github.com/vpelletier/pprofile)
%% Cell type:markdown id: tags:
- pprofile
- vprof
## Memory profiler
- [memory_profiler](https://pypi.org/project/memory-profiler/)
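When installing a third-party package is not an option, the standard-library module `tracemalloc` gives a first idea of memory usage. Note that this is a different tool from `memory_profiler`: it only traces allocations made through Python's memory allocator. A minimal sketch, with an arbitrary list as the workload:

``` python
import tracemalloc

tracemalloc.start()

# Workload to measure (arbitrary example).
data = [float(x) for x in range(100_000)]

# Memory currently traced and peak since start(), in bytes.
current, peak = tracemalloc.get_traced_memory()
snapshot = tracemalloc.take_snapshot()
tracemalloc.stop()

print(f"current: {current / 1e6:.2f} MB, peak: {peak / 1e6:.2f} MB")

# Show the three source lines that allocated the most memory.
for stat in snapshot.statistics("lineno")[:3]:
    print(stat)
```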
%% Cell type:markdown id: tags:
Memory profiler
-----------------
## Time and memory profiler
- [vprof](https://pypi.org/project/vprof/)
%% Cell type:markdown id: tags:
# Further reading
More on profiling on the stackoverflow discussion:
More on profiling on a stackoverflow discussion:
https://stackoverflow.com/questions/582336/how-can-you-profile-a-python-script
......
......@@ -23,7 +23,7 @@ def serie_pair_index_generator(number):
)
def DTWDistance(s1, s2):
def dtw_distance(s1, s2):
""" Computes the dtw between s1 and s2 with distance the absolute distance
:param s1: the first serie (ie an iterable over floats64)
......@@ -83,7 +83,7 @@ def compute(series, nb_series):
_dist_mat_dtw = np.zeros((nb_series, nb_series), dtype=np.float64)
_dist_mat_cort = np.zeros((nb_series, nb_series), dtype=np.float64)
for t1, t2 in gen:
dist_dtw = DTWDistance(series[t1], series[t2])
dist_dtw = dtw_distance(series[t1], series[t2])
_dist_mat_dtw[t1, t2] = dist_dtw
_dist_mat_dtw[t2, t1] = dist_dtw
dist_cort = 0.5 * (1 - cort(series[t1], series[t2]))
......
import cProfile
import pstats
from time import time
from dtw_cort_dist_mat import main, compute
series, nb_series = main(only_init=True)
t0 = time()
a, b = compute(series, nb_series)
t_end = time()
print('\nelapsed time = {:.3f} s'.format(t_end - t0))
t0 = time()
cProfile.runctx("a, b = compute(series, nb_series)", globals(), locals(), "prof.pstats")
t_end = time()
s = pstats.Stats('prof.pstats')
s.sort_stats('time').print_stats(12)
print('\nelapsed time = {:.3f} s'.format(t_end - t0))
print(
'\nwith gprof2dot and graphviz (command dot):\n'
'gprof2dot -f pstats prof.pstats | dot -Tpng -o prof.png')
......@@ -23,7 +23,7 @@ def serie_pair_index_generator(number):
)
def DTWDistance(s1, s2):
def dtw_distance(s1, s2):
""" Computes the dtw between s1 and s2 with distance the absolute distance
:param s1: the first serie (ie an iterable over floats64)
......@@ -79,7 +79,7 @@ def compute(series, nb_series):
_dist_mat_dtw = np.zeros((nb_series, nb_series), dtype=np.float64)
_dist_mat_cort = np.zeros((nb_series, nb_series), dtype=np.float64)
for t1, t2 in gen:
dist_dtw = DTWDistance(series[t1], series[t2])
dist_dtw = dtw_distance(series[t1], series[t2])
_dist_mat_dtw[t1, t2] = dist_dtw
_dist_mat_dtw[t2, t1] = dist_dtw
dist_cort = 0.5 * (1 - cort(series[t1], series[t2]))
......
......@@ -8,7 +8,7 @@ from libc.math cimport abs
@cython.boundscheck(False)
@cython.wraparound(False)
def DTWDistance(double[:] s1, double[:] s2):
def dtw_distance(double[:] s1, double[:] s2):
""" Computes the dtw between s1 and s2 with distance the absolute distance
:param s1: the first serie (ie an iterable over floats64)
......
......@@ -6,7 +6,7 @@ from pathlib import Path
import numpy as np
from dtw_cort import cort, DTWDistance
from dtw_cort import cort, dtw_distance
util = run_path(Path(__file__).absolute().parent.parent / "util.py")
......@@ -30,7 +30,7 @@ def compute(series, nb_series):
_dist_mat_dtw = np.zeros((nb_series, nb_series), dtype=np.float64)
_dist_mat_cort = np.zeros((nb_series, nb_series), dtype=np.float64)
for t1, t2 in gen:
dist_dtw = DTWDistance(series[t1], series[t2])
dist_dtw = dtw_distance(series[t1], series[t2])
_dist_mat_dtw[t1, t2] = dist_dtw
_dist_mat_dtw[t2, t1] = dist_dtw
dist_cort = 0.5 * (1 - cort(series[t1], series[t2]))
......
......@@ -25,7 +25,7 @@ def serie_pair_index_generator(number):
)
def DTWDistance(s1, s2):
def dtw_distance(s1, s2):
""" Computes the dtw between s1 and s2 with distance the absolute distance
:param s1: the first serie (ie an iterable over floats64)
......@@ -83,7 +83,7 @@ def compute(series: "float64[:, :]", nb_series: int):
_dist_mat_dtw = np.zeros((nb_series, nb_series), dtype=np.float64)
_dist_mat_cort = np.zeros((nb_series, nb_series), dtype=np.float64)
for t1, t2 in gen:
dist_dtw = DTWDistance(series[t1], series[t2])
dist_dtw = dtw_distance(series[t1], series[t2])
_dist_mat_dtw[t1, t2] = dist_dtw
_dist_mat_dtw[t2, t1] = dist_dtw
dist_cort = 0.5 * (1 - cort(series[t1], series[t2]))
......
......@@ -26,7 +26,7 @@ def serie_pair_index_generator(number):
)
def DTWDistance(s1, s2):
def dtw_distance(s1, s2):
""" Computes the dtw between s1 and s2 with distance the absolute distance
:param s1: the first serie (ie an iterable over floats64)
......@@ -84,7 +84,7 @@ def compute(series, nb_series):
_dist_mat_dtw = np.zeros((nb_series, nb_series), dtype=np.float64)
_dist_mat_cort = np.zeros((nb_series, nb_series), dtype=np.float64)
for t1, t2 in gen:
dist_dtw = DTWDistance(series[t1], series[t2])
dist_dtw = dtw_distance(series[t1], series[t2])
_dist_mat_dtw[t1, t2] = dist_dtw
_dist_mat_dtw[t2, t1] = dist_dtw
dist_cort = 0.5 * (1 - cort(series[t1], series[t2]))
......
......@@ -10,7 +10,7 @@ def serie_pair_index_generator(number):
)
def DTWDistance(s1, s2):
def dtw_distance(s1, s2):
" Computes the dtw between s1 and s2 with distance the absolute distance\n\n :param s1: the first serie (ie an iterable over floats64)\n :param s2: the second serie (ie an iterable over floats64)\n :returns: the dtw distance\n :rtype: float64\n "
len_s1 = len(s1)
len_s2 = len(s2)
......@@ -58,7 +58,7 @@ def compute(series, nb_series):
_dist_mat_dtw = np.zeros((nb_series, nb_series), dtype=np.float64)
_dist_mat_cort = np.zeros((nb_series, nb_series), dtype=np.float64)
for (t1, t2) in gen:
dist_dtw = DTWDistance(series[t1], series[t2])
dist_dtw = dtw_distance(series[t1], series[t2])
_dist_mat_dtw[(t1, t2)] = dist_dtw
_dist_mat_dtw[(t2, t1)] = dist_dtw
dist_cort = 0.5 * (1 - cort(series[t1], series[t2]))
......
......@@ -26,7 +26,7 @@ def serie_pair_index_generator(number):
@jit(cache=True)
def DTWDistance(s1, s2):
def dtw_distance(s1, s2):
""" Computes the dtw between s1 and s2 with distance the absolute distance
:param s1: the first serie (ie an iterable over floats64)
......@@ -90,7 +90,7 @@ def compute(series, nb_series):
_dist_mat_dtw = np.zeros((nb_series, nb_series), dtype=np.float64)
_dist_mat_cort = np.zeros((nb_series, nb_series), dtype=np.float64)
for t1, t2 in gen:
dist_dtw = DTWDistance(series[t1], series[t2])
dist_dtw = dtw_distance(series[t1], series[t2])
_dist_mat_dtw[t1, t2] = dist_dtw
_dist_mat_dtw[t2, t1] = dist_dtw
dist_cort = 0.5 * (1 - cort(series[t1], series[t2]))
......
......@@ -24,7 +24,7 @@ def serie_pair_index_generator(number):
)
def DTWDistance(s1, s2):
def dtw_distance(s1, s2):
""" Computes the dtw between s1 and s2 with distance the absolute distance
:param s1: the first serie (ie an iterable over floats64)
......@@ -32,7 +32,7 @@ def DTWDistance(s1, s2):
:returns: the dtw distance
:rtype: float64
"""
dtw_result = dtw_cort.dtwdistance(s1, s2)
dtw_result = dtw_cort.dtw_distance(s1, s2)
return dtw_result
......@@ -54,7 +54,7 @@ def compute(series, nb_series):
_dist_mat_dtw = np.zeros((nb_series, nb_series), dtype=np.float64)
_dist_mat_cort = np.zeros((nb_series, nb_series), dtype=np.float64)
for t1, t2 in gen:
dist_dtw = DTWDistance(series[t1], series[t2])
dist_dtw = dtw_distance(series[t1], series[t2])
_dist_mat_dtw[t1, t2] = dist_dtw
_dist_mat_dtw[t2, t1] = dist_dtw
dist_cort = 0.5 * (1 - cort(series[t1], series[t2]))
......
......@@ -2,7 +2,7 @@ module dtw_cort
implicit none
contains
subroutine dtwdistance(s1, s2, dtw_result)
subroutine dtw_distance(s1, s2, dtw_result)
! Computes the dtw between s1 and s2 with distance the absolute distance
doubleprecision, intent(in) :: s1(:), s2(:)
doubleprecision, intent(out) :: dtw_result
......@@ -41,7 +41,7 @@ module dtw_cort
dtw_result = dtw_mat(len_s1, len_s2)
end subroutine dtwdistance
end subroutine dtw_distance
doubleprecision function cort(s1, s2)
! Computes the cort between s1 and s2 (assuming they have the same length)
......
......@@ -33,7 +33,7 @@ def serie_pair_index_generator(number, rank, size):
yield _idx_greater, _idx_lower
def DTWDistance(s1, s2):
def dtw_distance(s1, s2):
""" Computes the dtw between s1 and s2 with distance the absolute distance
:param s1: the first serie (ie an iterable over floats64)
......@@ -41,7 +41,7 @@ def DTWDistance(s1, s2):
:returns: the dtw distance
:rtype: float64
"""
dtw_result = dtw_cort.dtwdistance(s1, s2)
dtw_result = dtw_cort.dtw_distance(s1, s2)
return dtw_result
......@@ -64,7 +64,7 @@ def compute(series, nb_series):
_dist_mat_dtw = np.zeros((nb_series, nb_series), dtype=np.float64)
_dist_mat_cort = np.zeros_like(_dist_mat_dtw)
for t1, t2 in gen:
dist_dtw = DTWDistance(series[t1], series[t2])
dist_dtw = dtw_distance(series[t1], series[t2])
_dist_mat_dtw[t1, t2] = dist_dtw
_dist_mat_dtw[t2, t1] = dist_dtw
dist_cort = 0.5 * (1 - cort(series[t1], series[t2]))
......
......@@ -2,7 +2,7 @@ module dtw_cort
implicit none
contains
subroutine dtwdistance(s1, s2, dtw_result)
subroutine dtw_distance(s1, s2, dtw_result)
! Computes the dtw between s1 and s2 with distance the absolute distance
doubleprecision, intent(in) :: s1(:), s2(:)
doubleprecision, intent(out) :: dtw_result
......@@ -41,7 +41,7 @@ module dtw_cort
dtw_result = dtw_mat(len_s1, len_s2)
end subroutine dtwdistance
end subroutine dtw_distance
doubleprecision function cort(s1, s2)
! Computes the cort between s1 and s2 (assuming they have the same length)
......
......@@ -26,7 +26,7 @@ def serie_pair_index_generator(number):
)
def DTWDistance(s1, s2):
def dtw_distance(s1, s2):
""" Computes the dtw between s1 and s2 with distance the absolute distance
:param s1: the first serie (ie an iterable over floats64)
......@@ -34,7 +34,7 @@ def DTWDistance(s1, s2):
:returns: the dtw distance
:rtype: float64
"""
dtw_result = dtw_cort.dtwdistance(s1, s2)
dtw_result = dtw_cort.dtw_distance(s1, s2)
return dtw_result
......@@ -60,7 +60,7 @@ def distances(series, idx_s1, idx_s2):
:idx_s2: index of second serie in series
:result: (tuple) idx_s1, idx_s2, dtw and cort between series[s1] and series[s2]
"""
dist_dtw = DTWDistance(series[idx_s1], series[idx_s2])
dist_dtw = dtw_distance(series[idx_s1], series[idx_s2])
dist_cort = 0.5 * (1 - cort(series[idx_s1], series[idx_s2]))
return idx_s1, idx_s2, dist_dtw, dist_cort
......
......@@ -2,7 +2,7 @@ module dtw_cort
implicit none
contains
subroutine dtwdistance(s1, s2, dtw_result)
subroutine dtw_distance(s1, s2, dtw_result)
! Computes the dtw between s1 and s2 with distance the absolute distance
doubleprecision, intent(in) :: s1(:), s2(:)
doubleprecision, intent(out) :: dtw_result
......@@ -41,7 +41,7 @@ module dtw_cort
dtw_result = dtw_mat(len_s1, len_s2)
end subroutine dtwdistance
end subroutine dtw_distance
doubleprecision function cort(s1, s2)
! Computes the cort between s1 and s2 (assuming they have the same length)
......
......@@ -24,7 +24,7 @@ def serie_pair_index_generator(number):
)
def DTWDistance(s1, s2):
def dtw_distance(s1, s2):
""" Computes the dtw between s1 and s2 with distance the absolute distance
:param s1: the first serie (ie an iterable over floats64)
......@@ -32,7 +32,7 @@ def DTWDistance(s1, s2):
:returns: the dtw distance
:rtype: float64
"""
dtw_result = dtw_cort.dtwdistance(s1, s2)
dtw_result = dtw_cort.dtw_distance(s1, s2)
return dtw_result
......@@ -54,7 +54,7 @@ def compute(series, nb_series):
_dist_mat_dtw = np.zeros((nb_series, nb_series), dtype=np.float64)
_dist_mat_cort = np.zeros((nb_series, nb_series), dtype=np.float64)
for t1, t2 in gen:
dist_dtw = DTWDistance(series[t1], series[t2])
dist_dtw = dtw_distance(series[t1], series[t2])
_dist_mat_dtw[t1, t2] = dist_dtw
_dist_mat_dtw[t2, t1] = dist_dtw
dist_cort = 0.5 * (1 - cort(series[t1], series[t2]))
......