"Given a problem (*e.g* finding which element is missing in a listof elements), we want to code a solution that solves the problem. The classical workflow is: \n",
"\n",
" * **design** an algorithm that solves the problem.\n",
" - study the input of the algorithm (it is special in one sens -- *e.g* the list we are given is almost sorted)\n",
" - design an algorithms comes (given the specificity of your data)\n",
" - choose the adequate data structure, *i.e.* the data structure that will optimize the relevant operations, a.k.a the operations that takes time or that are repeated a large number of time. \n",
" - check the adequation of\n",
" * **evaluate** the complexity of the algorithm (theoretical point of view)\n",
" * **take care** of the special cases (can my list be empty, in such a case what is my strategy ?)\n",
" * **write your specs**: *e.g.* if the list is empty, we raise an exception.\n",
" * **write some tests** to check your implementation is correct\n",
" * **code** *i.e.* do the implementation of both code and test\n",
" * **profile**: check where are the bottleneck \n",
" * **code** the implementation,\n",
" * **profile**\n",
" * ...."
"### Measure ⏱, don't guess! Profile to find the bottlenecks.\n",
"\n",
"\n",
"### Do not optimize everything!\n",
"\n",
"- *\"Premature optimization is the root of all evil\"* (Donald Knuth)\n",
"\n",
"- 80 / 20 rule, efficiency important for expensive things and NOT for small things"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Context: some notes on developing efficient software\n",
"\n",
"Given a problem (e.g. finding which element is missing in a list of elements), we want to code a solution that solves the problem. The classical workflow is: \n",
"\n",
"* **design** an algorithm that solves the problem.\n",
" - study the input of the algorithm (is it special in one sens?)\n",
" - design an algorithms comes (given the specificity of your data)\n",
" - choose the adequate data structure, i.e. the data structure that will optimize the relevant operations, a.k.a the operations that takes time or that are repeated a large number of time. \n",
"\n",
"* **evaluate** the complexity of the algorithm (theoretical point of view)\n",
"* **take care** of the special cases (can my list be empty, in such a case what is my strategy ?)\n",
"* **write your specs**: for example if the list is empty, we raise an exception.\n",
"* **write some tests** to check your implementation is correct\n",
"* **code**\n",
"* **profile**: find where are the bottlenecks\n",
"* **code**\n",
"* **profile**\n",
"* ..."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note 1:\n",
"----------\n",
"## Note 1\n",
" \n",
"If your data is large enough, a basic implementation of an algorithm with low complexity will run faster than a fined tuned implementation of a algorithm with high complexity\n",
"If your data is large enough, a basic implementation of an algorithm with low complexity will run faster than a fined tuned implementation of a algorithm with high complexity\n",
" \n",
"Example:\n",
"--------------\n",
"### Example\n",
"\n",
"Looking for the missing element problem: we know that all the element for 0 to N should be present. We can compute the sum and calculate the difference between the computed sum and the mathematical sum. This algorithm acces only once each element. It thus has an $O(N)$ complexity, where N is the number of elements. \n",
"Looking for the missing element problem: we know that all the element for 0 to N should be present. We can compute the sum and calculate the difference between the computed sum and the mathematical sum. This algorithm access only once each element. It thus has an $O(N)$ complexity, where N is the number of elements. \n",
"\n",
"An algorithm that checks if element e belongs to the list, for each e will has an $O(N^2)$ complexity and will be slower that the previous one for **sufficient large value of N**. \n"
]
...
...
"cell_type": "markdown",
"metadata": {},
"source": [
"Note 2:\n",
"----------\n",
"## Note 2\n",
" \n",
"If your data has some specificity, take advantage of it. \n",
"If your data has some specificity, take advantage of it. \n",
" \n",
" \n",
"Example:\n",
"--------------\n",
"### Example\n",
"\n",
"- if your list is sorted, solving the above problem can be done by checking that two consecutive elements in the list are consective numbers. The complexity is thus $O(N)$\n",
"- sorting N elements can be done in $O(N)$ in the special case where the $N$ items belongs to a range of size N."
"- if your list is sorted, solving the above problem can be done by checking that two consecutive elements in the list are consective numbers. The complexity is thus $O(N)$\n",
"- sorting N elements can be done in $O(N)$ in the special case where the $N$ items belongs to a range of size N."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note 3:\n",
"----------\n",
"## Note 3\n",
" \n",
"Complexity analysis is done over the **worst case**, what is the worst input for our algorithm\n",
"Complexity analysis is done over the **worst case**, what is the worst input for our algorithm\n",
" \n",
"Example:\n",
"--------------\n",
"\n",
" - sorting elements\n",
" - worst case: elements are already sorted but in reverse order\n",
" \n",
" - counting\n",
" - all counts are 1 .... "
]
},
{
...
...
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 1,
"metadata": {},
"outputs": [],
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"18.7 µs ± 700 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)\n",
"13.5 µs ± 566 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)\n",
"7.27 µs ± 38.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)\n",
"3.74 µs ± 18.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)\n"
]
}
],
"source": [
"import math\n",
"l = [] \n",
...
...
{
"cell_type": "markdown",
"metadata": {},
"source": [
"- [`pyperf`](https://pypi.org/project/pyperf/) is a more powerful tool but we can also do the same as with the module `timeit`:\n",
"\n",
"`python3 -m pyperf timeit -s \"import math; l=[]\" \"for x in range(100): l.append(math.pow(x,2))\"`"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Do not guess (the return of word counting problem)\n",
"The best performing algorithm depends upon the feature of the input.\n",
"The best performing algorithm depends on the feature of the input.\n",
"\n",
"In our case: how many different words do we have in our vocabulary: \n",
"In our case: how many different words do we have in our vocabulary?\n",
" - 4 for ADN, \n",
" - 26 for letters, \n",
" - 60 Millions for natural texts"
]
},
{
"cell_type": "markdown",
"metadata": {
...
...
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 4,
"metadata": {},
"outputs": [],
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"elapsed time: 9.47e-05 s\n"
]
}
],
"source": [
"from time import time\n",
"\n",
...
...
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.3"
"version": "3.7.2"
}
},
"nbformat": 4,
...
...
%% Cell type:markdown id: tags:
# Processing time evaluation
Pierre Augier (LEGI), Raphaël Bacher (Gipsa), Cyrille Bonamy (LEGI), Eric Maldonado (Irstea), Franck Thollard (ISTerre), Loïc Huder (ISTerre)
%% Cell type:markdown id: tags:

### Measure ⏱, don't guess! Profile to find the bottlenecks.

### Do not optimize everything!

- *"Premature optimization is the root of all evil"* (Donald Knuth)

- 80 / 20 rule: efficiency is important for expensive things and NOT for small things

%% Cell type:markdown id: tags:

## Context: some notes on developing efficient software

Given a problem (e.g. finding which element is missing in a list of elements), we want to code a solution that solves the problem. The classical workflow is:

* **design** an algorithm that solves the problem:
  - study the input of the algorithm (is it special in some sense? e.g. the list we are given is almost sorted),
  - design an algorithm, given the specificity of your data,
  - choose the adequate data structure, i.e. the data structure that will optimize the relevant operations, namely the operations that take time or that are repeated a large number of times.
* **evaluate** the complexity of the algorithm (theoretical point of view)
* **take care** of the special cases (can my list be empty? If so, what is my strategy?)
* **write your specs**: for example if the list is empty, we raise an exception.
* **write some tests** to check that your implementation is correct
* **code**
* **profile**: find where the bottlenecks are
* **code**
* **profile**
* ...

%% Cell type:markdown id: tags:

## Note 1

If your data is large enough, a basic implementation of an algorithm with low complexity will run faster than a finely tuned implementation of an algorithm with high complexity.

### Example

The missing element problem: we know that all the elements from 0 to N should be present. We can compute the sum of the list and take the difference between this computed sum and the mathematical sum. This algorithm accesses each element only once: it thus has an $O(N)$ complexity, where N is the number of elements.

An algorithm that checks, for each element e, whether e belongs to the list has an $O(N^2)$ complexity and will be slower than the previous one for **sufficiently large N**.

%% Cell type:markdown id: tags:

## Note 2

If your data has some specificity, take advantage of it.

### Example

- if your list is sorted, solving the above problem can be done by checking that two consecutive elements in the list are consecutive numbers. The complexity is thus $O(N)$.
- sorting N elements can be done in $O(N)$ in the special case where the $N$ items belong to a range of size N.
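%% Cell type:markdown id: tags:

This last point can be sketched as a counting sort (a minimal sketch, assuming the `n` items are integers in `range(n)`):

%% Cell type:code id: tags:

``` python
def counting_sort(items, n):
    # counts[v] = number of occurrences of the value v in items
    counts = [0] * n
    for v in items:
        counts[v] += 1
    # Rebuild the sorted list from the counts: O(N) overall
    result = []
    for v, c in enumerate(counts):
        result.extend([v] * c)
    return result

print(counting_sort([3, 1, 0, 3, 2], 4))  # [0, 1, 2, 3, 3]
```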
%% Cell type:markdown id: tags:

## Note 3

Complexity analysis is done over the **worst case**: what is the worst input for our algorithm?

### Example

- sorting elements: worst case = elements are already sorted, but in reverse order
- counting: worst case = all counts are 1
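%% Cell type:markdown id: tags:

The O(N) strategy of Note 1 can be sketched as follows (a sketch, assuming the list contains each integer from 0 to N exactly once, except the missing one):

%% Cell type:code id: tags:

``` python
def find_missing(lst, n):
    # The mathematical sum of 0..n is n*(n+1)/2; the missing element
    # is the difference between it and the actual sum of the list.
    return n * (n + 1) // 2 - sum(lst)

print(find_missing([0, 1, 2, 4, 5], 5))  # 3
```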
%% Cell type:markdown id: tags:
# Different types of profiling
## Time profiling
- Small code snippets
- Script based benchmark
- Function based profiling
- Line based profiling
<p class="small"><br></p>
## Memory profiling
%% Cell type:markdown id: tags:
## Small code snippets
- There is a module [`timeit` in the standard library](https://docs.python.org/3/library/timeit.html).
`python3 -m timeit -s "import math; l=[]" "for x in range(100): l.append(math.pow(x,2))"`
Problem: the module `timeit` does not try to guess how many times to execute the statement.
- In IPython, you can use the magic command `%timeit` that executes a piece of code and reports the time it takes:
%% Cell type:code id: tags:
``` python
import math

l = []
%timeit for x in range(100): l.append(math.pow(x, 2))
%timeit [math.pow(x, 2) for x in range(100)]

l = []
%timeit for x in range(100): l.append(x * x)
%timeit [x * x for x in range(100)]
```
%%%% Output: stream
18.7 µs ± 700 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
13.5 µs ± 566 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
7.27 µs ± 38.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
3.74 µs ± 18.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
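%% Cell type:markdown id: tags:

The same measurement can also be scripted with the function `timeit.timeit` of the standard library (a sketch; unlike `%timeit`, the number of executions is chosen by hand here):

%% Cell type:code id: tags:

``` python
import timeit

number = 10_000
# Total time in seconds to run the statement `number` times
t = timeit.timeit(stmt="[x * x for x in range(100)]", number=number)
print(f"{t / number * 1e6:.2f} µs per loop")
```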
%% Cell type:markdown id: tags:
- [`pyperf`](https://pypi.org/project/pyperf/) is a more powerful tool, but we can also do the same as with the module `timeit`:
`python3 -m pyperf timeit -s "import math; l=[]" "for x in range(100): l.append(math.pow(x,2))"`
%% Cell type:markdown id: tags:
## Do not guess (the return of the word counting problem)
%% Cell type:code id: tags:
``` python
def build_count_base(t):
    d = {}
    for s in t:
        if s in d:
            d[s] += 1
        else:
            d[s] = 1
    return d


def build_count_set(t):
    d = {k: 0 for k in set(t)}
    for s in t:
        d[s] += 1
    return d


def build_count_count(t):
    d = {k: t.count(k) for k in set(t)}
    return d


def build_count_excpt(t):
    d = {}
    for s in t:
        try:
            d[s] += 1
        except KeyError:
            d[s] = 1
    return d


import collections


def build_count_counter(t):
    return collections.Counter(t)


def build_count_defaultdict(t):
    d = collections.defaultdict(int)
    for k in t:
        d[k] += 1
    return d
s="Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nam tristique at velit in varius. Cras ut ultricies orci. Fusce vel consequat ante, vitae luctus tortor. Sed condimentum faucibus enim, sit amet pulvinar ligula feugiat ac. Sed interdum id risus id rhoncus. Nullam nisi justo, ultrices eu est nec, hendrerit maximus lorem. Nam urna eros, accumsan nec magna eu, elementum semper diam. Nulla tempus, nibh id elementum dapibus, ex diam lacinia est, sit amet suscipit nulla nibh eu sapien. Aliquam orci enim, malesuada in facilisis vitae, pharetra sit amet mi. Pellentesque mi tortor, sagittis quis odio quis, fermentum faucibus ex. Aenean sagittis nisl orci. Maecenas tristique velit sed leo facilisis porttitor. "
"We consider here wrapping two static languages: C and Fortran.\n",
"We mainly consider here wrapping codes in C and Fortran.\n",
"\n",
"We classically wrap already existing code to access them via Python. \n",
"We classically wrap already existing codes to access them via Python. \n",
"\n",
"Depending on the language to wrap the tool to use are different. \n"
"Depending on the language to wrap, the tools are different. \n"
]
},
{
...
...
%% Cell type:markdown id: tags:
# Wrapping codes in static languages
%% Cell type:markdown id: tags:
We mainly consider here wrapping codes in C and Fortran.

We classically wrap already existing codes to access them via Python.

Depending on the language to wrap, the tools are different.
%% Cell type:markdown id: tags:
## Fortran with [f2py](https://docs.scipy.org/doc/numpy/f2py/)
`f2py` is a tool that allows calling Fortran code from Python. It is part of `numpy`, meaning that to use it, we only need to install and import numpy (which should already be done if you do scientific Python!):
```bash
pip3 install numpy
```
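%% Cell type:markdown id: tags:

A quick way to check that `f2py` is available (a sketch):

%% Cell type:code id: tags:

``` python
# f2py ships with numpy; importing it verifies the installation
import numpy.f2py
import numpy as np

print(np.__version__)
```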
%% Cell type:markdown id: tags:
### How does it work?

The documentation gives several ways to wrap Fortran codes, but it all boils down to the same thing:

**f2py wraps the Fortran code in a Python module that can then be imported.**

Consider this simple Fortran (F90) snippet that computes the sum of the squares of the elements of an array:
%% Cell type:markdown id: tags:
*pyfiles/f2py/file_to_wrap.f90*
```fortran
subroutine sum_squares(A, res)
    implicit none
    real, dimension(:) :: A
    real :: res
    integer :: i, N

    N = size(A)
    res = 0.
    do i = 1, N
        res = res + A(i)*A(i)
    end do
end subroutine
```
%% Cell type:markdown id: tags:
The Fortran code can then be wrapped in one command:

`python3 -m numpy.f2py -c file_to_wrap.f90 -m wrap_f90`

This command calls the module `f2py` of `numpy` to compile (`-c`) *file_to_wrap.f90* into a Python module (`-m`) named *wrap_f90*. The module can then be imported in Python:
%% Cell type:code id: tags:
``` python
import numpy as np
import wrap_f90

A = np.ones(10)
result = 0.

wrap_f90.sum_squares(A, result)
print(result)
```
%% Cell type:markdown id: tags:
### With intents
%% Cell type:markdown id: tags:
In Fortran, it is considered best practice to specify intents for subroutine arguments. This helps `f2py` wrap the code efficiently, but it also changes the wrapping a bit.
This time, f2py recognized that `result` is an outgoing argument. As a consequence, the subroutine was wrapped smartly and made to return it.
Note that using a `function` (in the Fortran sense of the term) leads to the same result (see the other example in *pyfiles/f2py/file_to_wrap2.f90*).
%% Cell type:markdown id: tags:
### With modules
%% Cell type:markdown id: tags:
In Fortran, it is also considered best practice to organize the subroutines in modules. These are highly similar to Python modules and are, in fact, interpreted as such by f2py!
Consider the following code that implements the dtw and cort computations in Fortran:
*pyfiles/dtw_cort_dist/V9_fortran/dtw_cort.f90*
```fortran
module dtw_cort

    implicit none

contains

    subroutine dtwdistance(s1, s2, dtw_result)
        ! Computes the dtw between s1 and s2 with distance the absolute distance
        double precision, intent(in) :: s1(:), s2(:)
        double precision, intent(out) :: dtw_result

        integer :: i, j
        integer :: len_s1, len_s2
        double precision :: dist
        double precision, allocatable :: dtw_mat(:, :)

        len_s1 = size(s1)
        len_s2 = size(s2)

        allocate(dtw_mat(len_s1, len_s2))

        dtw_mat(1, 1) = dabs(s1(1) - s2(1))

        do j = 2, len_s2
            dist = dabs(s1(1) - s2(j))
            dtw_mat(1, j) = dist + dtw_mat(1, j-1)
        end do

        do i = 2, len_s1
            dist = dabs(s1(i) - s2(1))
            dtw_mat(i, 1) = dist + dtw_mat(i-1, 1)
        end do

        ! Fill the dtw_matrix
        do i = 2, len_s1
            do j = 2, len_s2
                dist = dabs(s1(i) - s2(j))
                dtw_mat(i, j) = dist + dmin1(dtw_mat(i-1, j), &
                                             dtw_mat(i, j-1), &
                                             dtw_mat(i-1, j-1))
            end do
        end do

        dtw_result = dtw_mat(len_s1, len_s2)

    end subroutine dtwdistance

    double precision function cort(s1, s2)
        ! Computes the cort between s1 and s2 (assuming they have the same length)
        double precision, intent(in) :: s1(:), s2(:)

        integer :: len_s1, t
        double precision :: slope_1, slope_2
        double precision :: num, sum_square_x, sum_square_y

        len_s1 = size(s1)
        num = 0
        sum_square_x = 0
        sum_square_y = 0

        do t = 1, len_s1 - 1
            slope_1 = s1(t+1) - s1(t)
            slope_2 = s2(t+1) - s2(t)
            num = num + slope_1*slope_2
            sum_square_x = sum_square_x + slope_1*slope_1
            sum_square_y = sum_square_y + slope_2*slope_2
        end do

        cort = num / (dsqrt(sum_square_x*sum_square_y))

    end function cort

end module dtw_cort
```
%% Cell type:markdown id: tags:
The subroutines `dtwdistance` and `cort` are part of the `dtw_cort` module. The file can be wrapped as before:

`python3 -m numpy.f2py -c dtw_cort.f90 -m distances_fort`

But the import slightly changes, as `dtw_cort` is now a module of `distances_fort`:
%% Cell type:code id: tags:
``` python
import numpy as np

from distances_fort import dtw_cort

cort_result = dtw_cort.cort(s1, s2)
print(cort_result)
```
%% Cell type:markdown id: tags:
Note that the wrapping integrates the documentation of the function (if written...)!
%% Cell type:code id: tags:
``` python
from distances_fort import dtw_cort

print(dtw_cort.cort.__doc__)
```
%% Cell type:markdown id: tags:
### To go further...
Running the command `python3 -m numpy.f2py` (without arguments) gives a lot of information on the supported arguments for further uses of `f2py`. This way, you can:
* Specify the compiler to use
* Give the compiler flags (warnings, optimisations...)
* Specify the functions to wrap
* ...
The documentation of f2py (https://docs.scipy.org/doc/numpy/f2py/) can also help, covering notably:
* `Cf2py` directives to overcome F77 limitations (e.g. intents)
* How to integrate Fortran sources to your Python packages and compile them on install
* How to use `f2py` inside Python scripts
* ...
%% Cell type:markdown id: tags:
## Wrapping C code
%% Cell type:markdown id: tags:
There are different ways of wrapping C code. We present CFFI here.
The workflow is the following:
1. Get your C code working (with a .c and a .h)
2. Set up your packaging to compile your code as a module
3. Compile your code
4. In the Python code, declare the functions you will be using
5. In the Python code, open/load the compiled module
6. Use your functions
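%% Cell type:markdown id: tags:

Steps 4 to 6 follow a declare/load/call pattern. As an illustration of the same pattern with the standard library `ctypes` instead of CFFI (loading the C math library rather than our own compiled module, so the sketch is self-contained):

%% Cell type:code id: tags:

``` python
import ctypes
import ctypes.util

# 5. open/load a compiled library (the C math library, for illustration)
libm = ctypes.CDLL(ctypes.util.find_library("m"))

# 4. declare the signature of the function we will use
libm.fabs.argtypes = [ctypes.c_double]
libm.fabs.restype = ctypes.c_double

# 6. use the function
print(libm.fabs(-2.5))  # 2.5
```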
%% Cell type:markdown id: tags:
**1. Get your C code working (with a .c and a .h)**
OK, assumed to be done.
%% Cell type:markdown id: tags:
**2. Set up your packaging to compile your code as a module**
We give the compilation instructions in the file `setup.py`:
%% Cell type:markdown id: tags:
```python
from setuptools import setup, Extension

version = "0.1"

module_distance = Extension(
    name="cdtw",
    sources=["cdtw.c"],
)

setup(
    name="dtw_cort_dist_mat",
    version=version,
    description="data scientist tool for time series",
    long_description="data scientist tool for time series",