Skip to content
GitLab
Menu
Projects
Groups
Snippets
Help
Help
Support
Community forum
Keyboard shortcuts
?
Submit feedback
Contribute to GitLab
Sign in
Toggle navigation
Menu
Open sidebar
python-uga
training-hpc
Commits
b2163dfb
Commit
b2163dfb
authored
Jun 11, 2019
by
Loic Huder
Browse files
Merge remote-tracking branch 'origin/master'
parents
63175989
457d68e0
Changes
9
Hide whitespace changes
Inline
Side-by-side
ipynb/21_pres_algos_dtw_cort.ipynb
View file @
b2163dfb
...
...
@@ -14,17 +14,17 @@
"source": [
"Algorithm on series (*e.g.* time series, character strings, ...)\n",
"\n",
"A series
can be either
a one dimensional ordered series of items (e.g. an numerical array, a string).\n",
"A series
is
a one dimensional ordered series of items (e.g. an numerical array, a string).\n",
"\n",
"We want to compute a dissimilarity measure between series. The measure can either apply to series of \n",
"same length or not, and can be a metric (i.e. symmetric, $d(x, y) = 0 \\iff x = y$, triangular inequality). \n",
"\n",
"We consider two dissimilarity measures with different features. \n",
"We consider two
(
dis
)
similarity measures with different features. \n",
"\n",
"S1 and S2 two series of length |S1| and |S2|\n",
"\n",
"- **dtw**: dynamic time wrapping, complexity O(|S1|*|S2|)\n",
"- **cort**: normalized cosine
distances
between derivatives, complexity O(|S1| + |S2|)\n",
"- **dtw**:
similarity measure:
dynamic time wrapping, complexity O(|S1|*|S2|)\n",
"- **cort**: normalized cosine
similarity measure
between derivatives, complexity O(|S1| + |S2|)\n",
"\n"
]
},
...
...
@@ -206,7 +206,7 @@
"\n",
"**Input**: two series S1 and S2 *of same length*\n",
"\n",
"**Output:** a
dis
similarity measure\n",
"**Output:** a similarity measure\n",
"\n",
"**Complexity:** O(|S1|+|S2|)\n",
"\n",
...
...
@@ -215,21 +215,9 @@
"What do we compute: \n",
"-------------------------------\n",
"\n",
"The cosine distance between derivatives of the series. \n",
"\n",
"normalize \n",
"\n",
" num = 0.0\n",
" sum_square_x = 0.0\n",
" sum_square_y = 0.0\n",
" for t in range(len(s1) - 1):\n",
" slope_1 = s1[t + 1] - s1[t]\n",
" slope_2 = s2[t + 1] - s2[t]\n",
" num += slope_1 * slope_2\n",
" sum_square_x += slope_1 * slope_1\n",
" sum_square_y += slope_2 * slope_2\n",
" return num / (np.sqrt(sum_square_x * sum_square_y))\n",
"The cosine similarity measure between derivatives of the series. \n",
"\n",
" \n",
"\n"
]
},
...
...
@@ -243,23 +231,11 @@
"$$ \\begin{eqnarray}\n",
"cort(A, B) &=& \\cos(dA, dB) \\\\\n",
"&=& \\frac{dA \\cdot dB}{\\Vert dA\\Vert \\Vert dB\\Vert} \\\\\n",
"&=& \\frac{\\sum_{i=0}^
T
dA_i dB_i}{\\Vert dA\\Vert \\Vert dB\\Vert} \\\\\n",
"&=& \\frac{\\sum_{i=0}^
T
(A_{i+1}-A_i) (B_{i+1}-B_i)}{\\sqrt{\\sum_{i=0}^
T
(A_{i+1}-A_i)^2} \\sqrt{\\sum_{i=0}^
T
(B_{i+1}-B_i)^2}}\n",
"&=& \\frac{\\sum_{i=0}^
{T}
dA_i dB_i}{\\Vert dA\\Vert \\Vert dB\\Vert} \\\\\n",
"&=& \\frac{\\sum_{i=0}^
{T-1}
(A_{i+1}-A_i) (B_{i+1}-B_i)}{\\sqrt{\\sum_{i=0}^
{T-1}
(A_{i+1}-A_i)^2} \\sqrt{\\sum_{i=0}^
{T-1}
(B_{i+1}-B_i)^2}}\n",
"\\end{eqnarray} $$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"What do we compute:\n",
"------------------------------\n",
"\n",
"<div align=\"middle\">\n",
"<img src=\"./fig/cort_formule.png\" style=\"width: 75%\">\n",
"</div>"
]
},
{
"cell_type": "markdown",
"metadata": {},
...
...
@@ -270,7 +246,7 @@
},
{
"cell_type": "code",
"execution_count":
28
,
"execution_count":
3
,
"metadata": {},
"outputs": [],
"source": [
...
...
@@ -300,25 +276,22 @@
},
{
"cell_type": "code",
"execution_count":
29
,
"execution_count":
15
,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.4629100498862757"
]
},
"execution_count": 29,
"metadata": {},
"output_type": "execute_result"
"name": "stdout",
"output_type": "stream",
"text": [
"cort(x,2*x)=1.0 cort([1,2], [2,1])=-1.0\n"
]
}
],
"source": [
"x = [1, 2, 3, 5, 5, 6]\n",
"y = [1, 1, 2, 2, 3, 5]\n",
"\n",
"
cort(x,y
)"
"
print(f\"cort(x,2*x)={cort(x, 2*x)} cort([1,2], [2,1])={cort([1,2], [2,1])}\"
)"
]
}
],
...
...
@@ -338,7 +311,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.
6.7
"
"version": "3.
7.3
"
}
},
"nbformat": 4,
...
...
%% Cell type:markdown id: tags:
A running example
================
%% Cell type:markdown id: tags:
Algorithm on series (
*e.g.*
time series, character strings, ...)
A series
can be either
a one dimensional ordered series of items (e.g. an numerical array, a string).
A series
is
a one dimensional ordered series of items (e.g. an numerical array, a string).
We want to compute a dissimilarity measure between series. The measure can either apply to series of
same length or not, and can be a metric (i.e. symmetric, $d(x, y) = 0
\i
ff x = y$, triangular inequality).
We consider two dissimilarity measures with different features.
We consider two
(
dis
)
similarity measures with different features.
S1 and S2 two series of length |S1| and |S2|
-
**dtw**
: dynamic time wrapping, complexity O(|S1|
*
|S2|)
-
**cort**
: normalized cosine
distances
between derivatives, complexity O(|S1| + |S2|)
-
**dtw**
:
similarity measure:
dynamic time wrapping, complexity O(|S1|
*
|S2|)
-
**cort**
: normalized cosine
similarity measure
between derivatives, complexity O(|S1| + |S2|)
%% Cell type:markdown id: tags:
Dynamic Time Wrapping
------------------------------------
-
**Input:**
two series, S1 and S2, not necessarily of same length
-
**Output:**
a dissimilarity measure
-
**Complexity:**
O([S1|
*
|S2|)
-
**Metric:**
no: does not respect the triangular inequality
-
**Side product:**
an alignment between the series can be stored
What do we compute:
-------------------------------
The transformation (with minimal cost) to transform one serie in the other one.
Example of what is computed
-------------------------------------------
%% Cell type:markdown id: tags:
Example of result of dtw :
-----------------------------------
<div
align=
"middle"
>
<img
src=
"./fig/dtw_example.png"
style=
"width: 100%"
>
</div>
%% Cell type:markdown id: tags:
Idea of the algorithm
------------------------------
Inspired from
https://riptutorial.com/dynamic-programming/example/25780/introduction-to-dynamic-time-warping
Let $a$ and $b$ be two series. We have:
-
dtw is a dynamic programming algorithm: the solution is built incrementally
-
a table $t$ is incrementally filled.
-
the value of the cell $t[i, j]$ holds the
*distance*
between the sub series $a[:i]$ and $b[:j]$
-
the value of the cell $t[i, j]$ is computed using the values of cells $t[i-1, j]$, $t[i, j-1]$ and $t[i-1, j-1]$:
$$t[i, j] = d(i, j) + min(t[i-1, j], t[i-1, j-1], t[i, j-1])$$
where $d(i, j)$ is the distance between $s[i]$ and $s[j]$ (we will use the absolute difference)
An example with two series [0, 1, 1, 2, 2, 3, 5] and [0, 1, 2, 3, 5, 5, 5, 6]
%% Cell type:markdown id: tags:
(image from https://riptutorial.com/dynamic-programming/example/25780/introduction-to-dynamic-time-warping
<div
align=
"middle"
>
<img
src=
"./fig/dtw_ex_table.jpg"
style=
"width: 100%"
>
</div>
%% Cell type:markdown id: tags:
why 6 in the t[-1, -1] ?
%% Cell type:markdown id: tags:
Easy enough to implement:
----------------------------------------
%% Cell type:code id: tags:
```
python
import
numpy
as
np
def
DTWDistance_pure_python
(
s1
,
s2
):
""" Computes the dtw between s1 and s2 with distance the absolute distance
:param s1: the first series (ie an iterable over floats64)
:param s2: the second series (ie an iterable over floats64)
:returns: the dtw distance
:rtype: float64
"""
_dtw_mat
=
np
.
empty
([
len
(
s1
),
len
(
s2
)])
_dtw_mat
[
0
,
0
]
=
abs
(
s1
[
0
]
-
s2
[
0
])
# two special cases : filling first row and columns
for
j
in
range
(
1
,
len
(
s2
)):
dist
=
abs
(
s1
[
0
]
-
s2
[
j
])
_dtw_mat
[
0
,
j
]
=
dist
+
_dtw_mat
[
0
,
j
-
1
]
for
i
in
range
(
1
,
len
(
s1
)):
dist
=
abs
(
s1
[
i
]
-
s2
[
0
])
_dtw_mat
[
i
,
0
]
=
dist
+
_dtw_mat
[(
i
-
1
,
0
)]
# filling the matrix
for
i
in
range
(
1
,
len
(
s1
)):
for
j
in
range
(
1
,
len
(
s2
)):
dist
=
abs
(
s1
[
i
]
-
s2
[
j
])
_dtw_mat
[(
i
,
j
)]
=
dist
+
min
(
_dtw_mat
[
i
-
1
,
j
],
_dtw_mat
[
i
,
j
-
1
],
_dtw_mat
[
i
-
1
,
j
-
1
])
return
_dtw_mat
[
len
(
s1
)
-
1
,
len
(
s2
)
-
1
],
_dtw_mat
```
%% Cell type:code id: tags:
```
python
x
=
[
1
,
2
,
3
,
5
,
5
,
5
,
6
]
y
=
[
1
,
1
,
2
,
2
,
3
,
5
]
nx
=
len
(
x
)
ny
=
len
(
y
)
d
,
mat
=
DTWDistance_pure_python
(
x
,
y
)
print
(
d
)
mat
```
%%%% Output: stream
1.0
%%%% Output: execute_result
array([[ 0., 0., 1., 2., 4., 8.],
[ 1., 1., 0., 0., 1., 4.],
[ 3., 3., 1., 1., 0., 2.],
[ 7., 7., 4., 4., 2., 0.],
[11., 11., 7., 7., 4., 0.],
[15., 15., 10., 10., 6., 0.],
[20., 20., 14., 14., 9., 1.]])
%% Cell type:markdown id: tags:
Cort
-------
**Input**
: two series S1 and S2
*of same length*
**Output:**
a
dis
similarity measure
**Output:**
a similarity measure
**Complexity:**
O(|S1|+|S2|)
**Metric:**
yes
What do we compute:
-------------------------------
The cosine
distanc
e between derivatives of the series.
The cosine
similarity measur
e between derivatives of the series.
normalize
num = 0.0
sum_square_x = 0.0
sum_square_y = 0.0
for t in range(len(s1) - 1):
slope_1 = s1[t + 1] - s1[t]
slope_2 = s2[t + 1] - s2[t]
num += slope_1 * slope_2
sum_square_x += slope_1 * slope_1
sum_square_y += slope_2 * slope_2
return num / (np.sqrt(sum_square_x * sum_square_y))
%% Cell type:markdown id: tags:
What do we compute:
------------------------------
$$
\b
egin{eqnarray}
cort(A, B) &=&
\c
os(dA, dB)
\\
&=&
\f
rac{dA
\c
dot dB}{
\V
ert dA
\V
ert
\V
ert dB
\V
ert}
\\
&=&
\f
rac{
\s
um_{i=0}^
T
dA_i dB_i}{
\V
ert dA
\V
ert
\V
ert dB
\V
ert}
\\
&=&
\f
rac{
\s
um_{i=0}^
T
(A_{i+1}-A_i) (B_{i+1}-B_i)}{
\s
qrt{
\s
um_{i=0}^
T
(A_{i+1}-A_i)^2}
\s
qrt{
\s
um_{i=0}^
T
(B_{i+1}-B_i)^2}}
&=&
\f
rac{
\s
um_{i=0}^
{T}
dA_i dB_i}{
\V
ert dA
\V
ert
\V
ert dB
\V
ert}
\\
&=&
\f
rac{
\s
um_{i=0}^
{T-1}
(A_{i+1}-A_i) (B_{i+1}-B_i)}{
\s
qrt{
\s
um_{i=0}^
{T-1}
(A_{i+1}-A_i)^2}
\s
qrt{
\s
um_{i=0}^
{T-1}
(B_{i+1}-B_i)^2}}
\e
nd{eqnarray} $$
%% Cell type:markdown id: tags:
What do we compute:
------------------------------
<div
align=
"middle"
>
<img
src=
"./fig/cort_formule.png"
style=
"width: 75%"
>
</div>
%% Cell type:markdown id: tags:
Easy enough to implement:
---------------------------------------
%% Cell type:code id: tags:
```
python
from
math
import
sqrt
def
cort
(
s1
,
s2
):
""" Computes the cort between series one and two (assuming they have the same length)
:param s1: the first series (or any iterable over floats64)
:param s2: the second series (or any iterable over floats64)
:returns: the cort distance
:rtype: float
:precondition: series are assumed to be of same size
"""
num
=
0.0
sum_square_x
=
0.0
sum_square_y
=
0.0
for
t
in
range
(
len
(
s1
)
-
1
):
slope_1
=
s1
[
t
+
1
]
-
s1
[
t
]
slope_2
=
s2
[
t
+
1
]
-
s2
[
t
]
num
=
num
+
slope_1
*
slope_2
sum_square_x
=
sum_square_x
+
(
slope_1
*
slope_1
)
sum_square_y
=
sum_square_y
+
(
slope_2
*
slope_2
)
return
num
/
(
sqrt
(
sum_square_x
*
sum_square_y
))
```
%% Cell type:code id: tags:
```
python
x
=
[
1
,
2
,
3
,
5
,
5
,
6
]
y
=
[
1
,
1
,
2
,
2
,
3
,
5
]
cort
(
x
,
y
)
print
(
f
"cort(x,2*x)=
{
cort
(
x
,
2
*
x
)
}
cort([1,2], [2,1])=
{
cort
([
1
,
2
],
[
2
,
1
])
}
"
)
```
%%%% Output:
execute_result
%%%% Output:
stream
0.4629100498862757
cort(x,2*x)=1.0 cort([1,2], [2,1])=-1.0
...
...
pyfiles/dtw_cort_dist/V3_c_dtw_cort_vect/Makefile
0 → 100644
View file @
b2163dfb
all
:
python3 setup.py develop
clean
:
rm
-f
cdtw.
*
.so
rm
-rf
build dtw_cort_dist_mat.egg-info
pyfiles/dtw_cort_dist/V3_c_dtw_cort_vect/cdtw.c
0 → 100644
View file @
b2163dfb
#include <stdlib.h>
#include <stdio.h>
#include <float.h>
#include <math.h>
#include <float.h>
/* for max_float */
#include "cdtw.h"
double
min3
(
const
double
x
,
const
double
y
,
const
double
z
){
if
(
x
<
y
)
if
(
x
<
z
)
return
x
;
if
(
y
<
z
)
return
y
;
return
z
;
}
/*
Returns the minimum-distance warp path
between multivariate time series x and y.
*/
double
dtw
(
double
*
x
,
double
*
y
,
int
size_x
,
int
size_y
){
int
i
,
j
;
double
*
temp
;
double
*
costP
=
malloc
(
size_x
*
sizeof
(
double
));
if
(
!
costP
)
return
-
1
;
double
*
costC
=
malloc
(
size_x
*
sizeof
(
double
));
if
(
!
costC
){
free
(
costP
);
return
-
1
;
/* TODO: better handling of the error */
}
costP
[
0
]
=
fabs
(
x
[
0
]
-
y
[
0
]);
for
(
i
=
1
;
i
<
size_x
;
i
++
){
costP
[
i
]
=
fabs
(
x
[
i
]
-
y
[
0
])
+
costP
[
i
-
1
];
}
for
(
j
=
1
;
j
<
size_y
;
j
++
){
costC
[
0
]
=
costP
[
0
]
+
fabs
(
x
[
0
]
-
y
[
j
]);
for
(
i
=
1
;
i
<
size_x
;
i
++
){
costC
[
i
]
=
min3
(
costC
[
i
-
1
],
costP
[
i
-
1
],
costP
[
i
])
+
fabs
(
x
[
i
]
-
y
[
j
]);
}
temp
=
costP
;
costP
=
costC
;
costC
=
temp
;
}
/* result is read in costP because switching between costP and costC is done */
double
ret
=
costP
[
size_x
-
1
];
free
(
costP
);
free
(
costC
);
return
ret
;
}
double
cort
(
double
*
x
,
double
*
y
,
int
size
){
int
t
;
double
slopex
,
slopey
;
double
num
;
double
sumSquareXslope
,
sumSquareYslope
;
num
=
sumSquareXslope
=
sumSquareYslope
=
0
.
0
;
for
(
t
=
0
;
t
<
size
-
1
;
t
++
){
slopex
=
x
[
t
+
1
]
-
x
[
t
];
slopey
=
y
[
t
+
1
]
-
y
[
t
];
num
+=
slopex
*
slopey
;
sumSquareXslope
+=
slopex
*
slopex
;
sumSquareYslope
+=
slopey
*
slopey
;
}
return
num
/
sqrt
(
sumSquareXslope
*
sumSquareYslope
);
}
pyfiles/dtw_cort_dist/V3_c_dtw_cort_vect/cdtw.h
0 → 100644
View file @
b2163dfb
double
dtw
(
double
*
x
,
double
*
y
,
int
size_x
,
int
size_y
);
double
cort
(
double
*
x
,
double
*
y
,
int
size
);
pyfiles/dtw_cort_dist/
broken_
V3_c_dtw_cort_vect/dtw_cort_dist_mat.py
→
pyfiles/dtw_cort_dist/V3_c_dtw_cort_vect/dtw_cort_dist_mat.py
View file @
b2163dfb
File moved
pyfiles/dtw_cort_dist/
broken_
V3_c_dtw_cort_vect/profile_data.pyprof
→
pyfiles/dtw_cort_dist/V3_c_dtw_cort_vect/profile_data.pyprof
View file @
b2163dfb
File moved
pyfiles/dtw_cort_dist/V3_c_dtw_cort_vect/setup.py
0 → 100644
View file @
b2163dfb
from
setuptools
import
setup
,
Extension
version
=
"0.1"
module_distance
=
Extension
(
name
=
"cdtw"
,
sources
=
[
"cdtw.c"
],
extra_compile_args
=
[
"-O3"
,
"-mavx2"
],
extra_link_args
=
[
"-O3"
],
)
setup
(
name
=
"dtw_cort_dist_mat"
,
version
=
version
,
description
=
"data scientist tool for time series"
,
long_description
=
"data scientist tool for time series"
,
classifiers
=
[],
# Get strings from http://pypi.python.org/pypi?%3Aaction=list_classifiers
author
=
"IKATS project"
,
author_email
=
"Franck.Thollard@imag.fr"
,
url
=
"http://ikats.imag.fr"
,
license
=
"GPL"
,
include_package_data
=
True
,
zip_safe
=
False
,
install_requires
=
[
"cffi"
,
"numpy"
,
"setuptools"
],
entry_points
=
""
,
ext_modules
=
[
module_distance
],
)
pyfiles/dtw_cort_dist/tasks.py
View file @
b2163dfb
from
pathlib
import
Path
from
invoke
import
task
import
numpy
as
np
here
=
Path
(
__file__
).
parent
.
absolute
()
...
...
@@ -11,7 +12,10 @@ def clean(c):
for
path
in
directories
:
if
(
path
/
"Makefile"
).
exists
():
c
.
run
(
f
"cd
{
path
}
&& make clean"
,
echo
=
True
)
for
algo
in
(
"dtw"
,
"cort"
):
f
=
path
/
"ref_{algo}.npy"
if
f
.
exists
():
f
.
unlink
()
@
task
def
build
(
c
):
...
...
@@ -19,6 +23,18 @@ def build(c):
if
(
path
/
"Makefile"
).
exists
():
c
.
run
(
f
"cd
{
path
}
&& make"
,
echo
=
True
)
@
task
def
check
(
c
):
ref_dtw
=
np
.
load
(
"ref_small_dtw.npy"
)
ref_cort
=
np
.
load
(
"ref_small_cort.npy"
)
for
path
in
directories
:
c
.
run
(
f
"cd
{
path
}
&& ./dtw_cort_dist_mat.py -s res"
,
echo
=
True
)
calc_dtw
=
np
.
load
(
f
"
{
path
/
'res_dtw.npy'
}
"
)
print
(
f
"
{
path
.
name
}
dtw : allclose
{
np
.
allclose
(
ref_dtw
,
calc_dtw
)
}
"
)
calc_cort
=
np
.
load
(
f
"
{
path
/
'res_cort.npy'
}
"
)
print
(
f
"
{
path
.
name
}
cort: allclose
{
np
.
allclose
(
ref_cort
,
calc_cort
)
}
"
)
@
task
def
bench
(
c
,
medium
=
False
):
...
...
pyfiles/dtw_cort_dist/util.py
View file @
b2163dfb
...
...
@@ -33,6 +33,9 @@ def main(compute, only_init=False):
parser
.
add_argument
(
"-n"
,
type
=
int
,
default
=
None
,
help
=
"number of series (default all 200)"
)
parser
.
add_argument
(
"-s"
,
type
=
str
,
default
=
None
,
help
=
"save the matrices (provide prefixes)"
)
parser
.
add_argument
(
"-v"
,
action
=
"store_true"
,
default
=
None
,
help
=
"visualize the matrix"
)
...
...
@@ -55,12 +58,18 @@ def main(compute, only_init=False):
_dist_mat_dtw
,
_dist_mat_cort
=
compute
(
series
,
nb_series
)
print
(
"elapsed time = {:.3f} s"
.
format
(
time
()
-
t0
))
if
args
.
s
:
dtw_mat
=
"{}_dtw"
.
format
(
args
.
s
)
np
.
save
(
dtw_mat
,
_dist_mat_dtw
)
cort_mat
=
"{}_cort"
.
format
(
args
.
s
)
np
.
save
(
cort_mat
,
_dist_mat_cort
)
if
args
.
v
:
import
matplotlib.pyplot
as
plt
fig
,
axes
=
plt
.
subplots
(
2
)
cax0
=
axes
[
0
].
imshow
(
_dist_mat_dtw
,
cmap
=
plt
.
cm
.
gray
)
axes
[
0
].
imshow
(
_dist_mat_dtw
,
cmap
=
plt
.
cm
.
gray
)
axes
[
0
].
set_title
(
"dtw"
)
cax1
=
axes
[
1
].
imshow
(
_dist_mat_cort
,
cmap
=
plt
.
cm
.
gray
)
axes
[
1
].
imshow
(
_dist_mat_cort
,
cmap
=
plt
.
cm
.
gray
)
axes
[
1
].
set_title
(
"cort"
)
plt
.
show
()
Write
Preview
Supports
Markdown
0%
Try again
or
attach a new file
.
Attach a file
Cancel
You are about to add
0
people
to the discussion. Proceed with caution.
Finish editing this message first!
Cancel
Please
register
or
sign in
to comment