Commit e8b31b25 authored by paugier's avatar paugier

profiling again

parent 86278f5d
Pipeline #76068 passed with stage
in 50 seconds
%% Cell type:markdown id: tags:
# Processing time evaluation
Pierre Augier (LEGI), Raphaël Bacher (Gipsa), Cyrille Bonamy (LEGI), Eric Maldonado (Irstea), Franck Thollard (ISTerre), Loïc Huder (ISTerre)
%% Cell type:markdown id: tags:
### Measure ⏱, don't guess! Profile to find the bottlenecks.
### Do not optimize everything!
- *"Premature optimization is the root of all evil"* (Donald Knuth)
- 80/20 rule: efficiency matters for the expensive parts, NOT for the small ones
%% Cell type:markdown id: tags:
## Context: some notes on developing efficient software
Given a problem (e.g. finding which element is missing in a list of elements), we want to code a solution that solves it. The classical workflow is:
* **design** an algorithm that solves the problem.
- study the input of the algorithm (is it special in some sense?)
- design the algorithm accordingly (given the specificity of your data)
- choose the adequate data structure, i.e. the data structure that optimizes the relevant operations, namely the operations that take time or that are repeated a large number of times.
* **evaluate** the complexity of the algorithm (theoretical point of view)
* **take care** of the special cases (can my list be empty? If so, what is my strategy?)
* **write your specs**: for example, if the list is empty, we raise an exception.
* **write some tests** to check that your implementation is correct
* **code**
* **profile**: find where the bottlenecks are
* **code**
* **profile**
* ...
%% Cell type:markdown id: tags:
## Note 1
If your data is large enough, a basic implementation of an algorithm with low complexity will run faster than a finely tuned implementation of an algorithm with high complexity.
### Example
The missing element problem: we know that all the elements from 0 to N should be present. We can compute the sum of the list and take the difference between it and the mathematical sum $N(N+1)/2$. This algorithm accesses each element only once; it thus has $O(N)$ complexity, where $N$ is the number of elements.
An algorithm that checks, for each candidate element $e$, whether $e$ belongs to the list has $O(N^2)$ complexity and will be slower than the previous one for **sufficiently large values of $N$**.
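Both strategies can be sketched in a few lines (a minimal illustration; the function names are ours):

``` python
def find_missing_sum(elements, n):
    """O(N): the missing value is the gap between the expected and actual sums."""
    expected = n * (n + 1) // 2  # mathematical sum of 0..n
    return expected - sum(elements)

def find_missing_scan(elements, n):
    """O(N^2): membership test (an O(N) scan) for each of the N candidates."""
    for candidate in range(n + 1):
        if candidate not in elements:
            return candidate

data = [x for x in range(101) if x != 42]
print(find_missing_sum(data, 100), find_missing_scan(data, 100))
```

Both return the same answer, but only the first stays fast when the list grows.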
%% Cell type:markdown id: tags:
## Note 2
If your data has some specificity, take advantage of it.
### Example
- if your list is sorted, solving the above problem can be done by checking that every two consecutive elements of the list are consecutive numbers. The complexity is thus $O(N)$
- sorting $N$ elements can be done in $O(N)$ in the special case where the $N$ items belong to a range of size $N$.
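Both ideas can be sketched as follows (helper names are ours; we assume the input is a permutation of a range, possibly with one element missing):

``` python
def find_missing_sorted(sorted_elements):
    """O(N) on a sorted list starting at 0: the first index whose value differs
    from the index reveals the gap."""
    for i, value in enumerate(sorted_elements):
        if value != i:
            return i
    return len(sorted_elements)

def range_sort(elements, n):
    """O(N) counting sort for items known to lie in range(n)."""
    counts = [0] * n
    for value in elements:
        counts[value] += 1
    return [v for v in range(n) for _ in range(counts[v])]

print(find_missing_sorted([0, 1, 2, 4, 5]))
print(range_sort([3, 0, 2, 0], 4))
```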
%% Cell type:markdown id: tags:
## Note 3
Complexity analysis is done over the **worst case**: what is the worst input for our algorithm?
### Example
Sorting elements: for many algorithms, the worst case is a list that is already sorted, but in reverse order.
%% Cell type:markdown id: tags:
# Different types of profiling
## Time profiling
- Small code snippets
- Script based benchmark
- Function based profiling
- Line based profiling
<p class="small"><br></p>
## Memory profiling
%% Cell type:markdown id: tags:
## Small code snippets
- There is a module [`timeit` in the standard library](https://docs.python.org/3/library/timeit.html).
`python3 -m timeit -s "import math; l=[]" "for x in range(100): l.append(math.pow(x,2))"`
Limitation: when used through its Python API, `timeit` does not guess how many times to execute the statement; you have to choose the number of loops yourself.
- In IPython, you can use the magic command `%timeit`, which executes a piece of code and reports timing statistics:
%% Cell type:code id: tags:
``` python
import math
l = []
%timeit for x in range(100): l.append(math.pow(x,2))
%timeit [math.pow(x,2) for x in range(100)]
l = []
%timeit for x in range(100): l.append(x*x)
%timeit [x*x for x in range(100)]
```
%%%% Output: stream
28.6 µs ± 1.19 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
19.6 µs ± 283 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
13 µs ± 462 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
7.45 µs ± 239 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%% Cell type:markdown id: tags:
- [`pyperf`](https://pypi.org/project/pyperf/) is a more powerful tool, but it can also be used in the same way as the module `timeit`:
`python3 -m pyperf timeit -s "import math; l=[]" "for x in range(100): l.append(math.pow(x,2))"`
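For scripted measurements outside IPython, the standard `timeit` API can also be called programmatically (a minimal sketch of what `%timeit` does; the loop counts here are chosen arbitrarily):

``` python
import math
import timeit

# Repeat the measurement 5 times, 1000 loops each, and keep the best run;
# the minimum is usually the least noisy estimate.
times = timeit.repeat(
    "[math.pow(x, 2) for x in range(100)]",
    globals={"math": math},
    repeat=5,
    number=1000,
)
best = min(times) / 1000
print(f"best: {best * 1e6:.2f} µs per loop")
```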
%% Cell type:markdown id: tags:
## Do not guess (the return of the word counting problem)
%% Cell type:code id: tags:
``` python
def build_count_base(t):
    d = {}
    for s in t:
        if s in d:
            d[s] += 1
        else:
            d[s] = 1
    return d

def build_count_set(t):
    d = {k: 0 for k in set(t)}
    for s in t:
        d[s] += 1
    return d

def build_count_count(t):
    return {k: t.count(k) for k in set(t)}

def build_count_excpt(t):
    d = {}
    for s in t:
        try:
            d[s] += 1
        except KeyError:
            d[s] = 1
    return d

import collections

def build_count_counter(t):
    return collections.Counter(t)

def build_count_defaultdict(t):
    d = collections.defaultdict(int)
    # iterate over the argument t (not the global s)
    for k in t:
        d[k] += 1
    return d

s = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nam tristique at velit in varius. Cras ut ultricies orci. Fusce vel consequat ante, vitae luctus tortor. Sed condimentum faucibus enim, sit amet pulvinar ligula feugiat ac. Sed interdum id risus id rhoncus. Nullam nisi justo, ultrices eu est nec, hendrerit maximus lorem. Nam urna eros, accumsan nec magna eu, elementum semper diam. Nulla tempus, nibh id elementum dapibus, ex diam lacinia est, sit amet suscipit nulla nibh eu sapien. Aliquam orci enim, malesuada in facilisis vitae, pharetra sit amet mi. Pellentesque mi tortor, sagittis quis odio quis, fermentum faucibus ex. Aenean sagittis nisl orci. Maecenas tristique velit sed leo facilisis porttitor. "
s = s * 10000
print(f"len(s) = {len(s)}, nbkeys {len(set(s))}: base, set, count, except, Counter, defaultdict")
%timeit build_count_base(s)
%timeit build_count_set(s)
%timeit build_count_count(s)
%timeit build_count_excpt(s)
%timeit build_count_counter(s)
%timeit build_count_defaultdict(s)
```
%%%% Output: stream
len(s) = 7160000, nbkeys 33: base, set, count, except, Counter, defaultdict
1.08 s ± 82.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
1.09 s ± 42.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
338 ms ± 12.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
1.01 s ± 50.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
458 ms ± 12.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
940 ms ± 62.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%% Cell type:code id: tags:
``` python
print("with split")
s2 = s.split()
print(f"len(s2) = {len(s2)}, nbkeys {len(set(s2))}: base, set, count, except, Counter, defaultdict")
%timeit build_count_base(s2)
%timeit build_count_set(s2)
%timeit build_count_count(s2)
%timeit build_count_excpt(s2)
%timeit build_count_counter(s2)
%timeit build_count_defaultdict(s2)
```
%%%% Output: stream
with split
len(s2) = 1100000, nbkeys 90: base, set, count, except, Counter, defaultdict
320 ms ± 64.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
201 ms ± 5.4 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
2.03 s ± 55 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
172 ms ± 5.95 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
101 ms ± 2.71 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
904 ms ± 56.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%% Cell type:markdown id: tags:
### Conclusion of these measurements
The best performing algorithm depends on the features of the input.
In our case: how many different words do we have in our vocabulary?
- 4 for DNA,
- 26 for letters,
- ~60 million for natural texts
%% Cell type:markdown id: tags:
## Script based benchmark
Evaluate the execution time of your script as a whole:
- with the Unix command `time`:
`time python3 myscript.py`
- with the Unix tool [`perf`](https://perf.wiki.kernel.org):
`perf stat python3 myscript.py`
Issues:
- not accurate (only one run!)
- includes the import and initialization time. It can be better to modify the script to print the elapsed time, measured for example with `perf_counter`:
%% Cell type:code id: tags:
``` python
import math
from time import perf_counter

t_start = perf_counter()
[math.pow(x, 2) for x in range(100)]
print(f"elapsed time: {perf_counter() - t_start:.2e} s")
```
%%%% Output: stream
elapsed time: 3.23e-04 s
%% Cell type:markdown id: tags:
## Function based profiling (cProfile)
[cProfile](https://docs.python.org/3/library/profile.html): **deterministic profiling** of Python programs.
2 steps: (1) run the profiler and (2) analyze the results.
1. Run the profiler
- with an already written script: `python3 -m cProfile myscript.py`
- much better: write a dedicated script using the module cProfile. See `pyfiles/dtw_cort_dist/V0_numpy_loops/prof.py`
**Warning: profiling is much slower than a normal run, so do not profile with a long-running configuration**
2. Analyze the results
The standard tool is [`pstats`](https://docs.python.org/3/library/profile.html#module-pstats).
Or visualize the results with `gprof2dot`, `SnakeViz`, `pyprof2calltree` and `kcachegrind`.
Example: `pyprof2calltree -i prof.pstats -k`
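A minimal dedicated profiling script could follow this pattern (a sketch, not the `prof.py` from the repository; the profiled function is a toy example of ours):

``` python
import cProfile
import io
import pstats

def slow_square_sum(n):
    # deliberately naive workload to have something to profile
    return sum(x * x for x in range(n))

profiler = cProfile.Profile()
profiler.enable()
slow_square_sum(200_000)
profiler.disable()

# Save the raw stats for later visualization (e.g. `pyprof2calltree -i prof.pstats -k`)
profiler.dump_stats("prof.pstats")

# Print the 5 most expensive functions, sorted by cumulative time
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats("cumulative").print_stats(5)
print(stream.getvalue())
```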
%% Cell type:markdown id: tags:
## Statistical profiling
See http://pramodkumbhar.com/2019/01/python-deterministic-vs-statistical-profilers/
Advantage compared to deterministic profiling: **very small overhead**
- [pyflame](https://github.com/uber/pyflame)
- [py-spy](https://github.com/benfred/py-spy)
- [plop](https://github.com/bdarnell/plop)
%% Cell type:markdown id: tags:
## Line based profiling
- [line_profiler](https://github.com/rkern/line_profiler)
- [pprofile](https://github.com/vpelletier/pprofile)
%% Cell type:markdown id: tags:
## Memory profiler
- [memory_profiler](https://pypi.org/project/memory-profiler/)
%% Cell type:markdown id: tags:
## Time and memory profiler
- [vprof](https://pypi.org/project/vprof/)
%% Cell type:markdown id: tags:
# Further reading
More on profiling on a stackoverflow discussion:
https://stackoverflow.com/questions/582336/how-can-you-profile-a-python-script