A running example
=================

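Algorithm on series (*e.g.* time series, character strings, ...)

A series is a one-dimensional ordered sequence of items (e.g. a numerical array, a string).
We want to compute a dissimilarity measure between series. Such a measure may or may not require series of the same length, and may or may not be a metric (i.e. symmetric, $d(x, y) = 0 \iff x = y$, triangle inequality).

We consider two (dis)similarity measures with different features. Let S1 and S2 be two series of length |S1| and |S2|:

- **dtw**: dissimilarity measure: dynamic time warping, complexity O(|S1|*|S2|)
- **cort**: normalized cosine similarity measure between derivatives, complexity O(|S1| + |S2|)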

Dynamic Time Warping
--------------------

- **Input:** two series, S1 and S2, not necessarily of same length
- **Output:** a dissimilarity measure
- **Complexity:** O(|S1|*|S2|)
- **Metric:** no, it does not satisfy the triangle inequality
- **Side product:** an alignment between the series can be stored
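
The claim that dtw is not a metric can be checked on a small case. The sketch below is an illustration only and not part of the original material: `dtw` is a hypothetical compact helper (absolute difference as item distance, table padded with infinity), and the three series `x`, `y`, `z` were picked for this example.

```python
import math


def dtw(s1, s2):
    """Hypothetical helper: dtw distance with the absolute difference as item distance."""
    n, m = len(s1), len(s2)
    # (n+1) x (m+1) table padded with infinity; t[0][0] = 0 starts the recurrence
    t = [[math.inf] * (m + 1) for _ in range(n + 1)]
    t[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(s1[i - 1] - s2[j - 1])
            t[i][j] = cost + min(t[i - 1][j], t[i][j - 1], t[i - 1][j - 1])
    return t[n][m]


# Series chosen for illustration only
x, y, z = [0], [1, 2], [2, 3, 3]
d_xy, d_yz, d_xz = dtw(x, y), dtw(y, z), dtw(x, z)
print(d_xy, d_yz, d_xz)  # 3.0 3.0 8.0
# The triangle inequality would require d_xz <= d_xy + d_yz, i.e. 8 <= 6
print(d_xz <= d_xy + d_yz)  # False
```

Going from `x` to `z` directly costs more than going through `y`, so the triangle inequality fails for this triple.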

What do we compute:
-------------------

The minimal-cost transformation that turns one series into the other.

Example of result of dtw:
-------------------------

Idea of the algorithm
------------------------------

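Inspired by https://riptutorial.com/dynamic-programming/example/25780/introduction-to-dynamic-time-warping

Let a and b be two series. We have:

- dtw is a dynamic programming algorithm: the solution is built incrementally
- a table t is incrementally filled
- the value of the cell t[i, j] holds the *distance* between the sub-series a[:i] and b[:j]
- the value of the cell t[i, j] is computed using the values of cells t[i-1, j], t[i, j-1] and t[i-1, j-1]:

$$t[i, j] = d(i, j) + \min(t[i-1, j], t[i-1, j-1], t[i, j-1])$$

where d(i, j) is the distance between a[i] and b[j] (we will use the absolute difference).

An example with two series [0, 1, 1, 2, 2, 3, 5] and [0, 1, 2, 3, 5, 5, 5, 6]

(image from https://riptutorial.com/dynamic-programming/example/25780/introduction-to-dynamic-time-warping)

Why 6 in t[-1, -1]?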
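
The recurrence t[i, j] = d(i, j) + min(t[i-1, j], t[i, j-1], t[i-1, j-1]) can be sketched with a table padded by one extra row and column, so that t[i, j] is the dtw distance between the prefixes a[:i] and b[:j]. This is a minimal sketch under assumptions: `dtw_table` is a hypothetical helper name, and the padding with infinity is one common way to avoid special-casing the first row and column.

```python
import numpy as np


def dtw_table(a, b):
    """Hypothetical helper: padded dtw table where t[i, j] = dtw(a[:i], b[:j])."""
    n, m = len(a), len(b)
    t = np.full((n + 1, m + 1), np.inf)
    t[0, 0] = 0.0  # two empty prefixes are at distance 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])  # absolute difference between items
            t[i, j] = d + min(t[i - 1, j], t[i, j - 1], t[i - 1, j - 1])
    return t


a = [1, 2, 3, 5, 5, 5, 6]
b = [1, 1, 2, 2, 3, 5]
t = dtw_table(a, b)
print(t[-1, -1])  # 1.0

# Each cell is the dtw distance between the corresponding prefixes
i, j = 3, 4
print(t[i, j] == dtw_table(a[:i], b[:j])[-1, -1])  # True
```

The infinite border means a path can only start from t[0, 0], which is why no separate loops are needed for the first row and column.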

Easy enough to implement:
----------------------------------------

```python
import numpy as np


def DTWDistance_pure_python(s1, s2):
    """
    Computes the dtw between s1 and s2, using the absolute difference
    as the distance between items
    :param s1: the first series (i.e. a non-empty sequence of float64)
    :param s2: the second series (i.e. a non-empty sequence of float64)
    :returns: the dtw distance and the filled dtw matrix
    :rtype: (float64, numpy.ndarray)
    """
    _dtw_mat = np.empty([len(s1), len(s2)])
    _dtw_mat[0, 0] = abs(s1[0] - s2[0])
    # two special cases: filling the first row and the first column
    for j in range(1, len(s2)):
        dist = abs(s1[0] - s2[j])
        _dtw_mat[0, j] = dist + _dtw_mat[0, j-1]
    for i in range(1, len(s1)):
        dist = abs(s1[i] - s2[0])
        _dtw_mat[i, 0] = dist + _dtw_mat[i-1, 0]
    # filling the rest of the matrix with the recurrence
    for i in range(1, len(s1)):
        for j in range(1, len(s2)):
            dist = abs(s1[i] - s2[j])
            _dtw_mat[i, j] = dist + min(_dtw_mat[i-1, j],
                                        _dtw_mat[i, j-1],
                                        _dtw_mat[i-1, j-1])
    return _dtw_mat[len(s1)-1, len(s2)-1], _dtw_mat
```

```python
x = [1, 2, 3, 5, 5, 5, 6]
y = [1, 1, 2, 2, 3, 5]
nx = len(x)
ny = len(y)
d, mat = DTWDistance_pure_python(x, y)
print(d)
mat
```

Output:

```
1.0

array([[ 0.,  0.,  1.,  2.,  4.,  8.],
       [ 1.,  1.,  0.,  0.,  1.,  4.],
       [ 3.,  3.,  1.,  1.,  0.,  2.],
       [ 7.,  7.,  4.,  4.,  2.,  0.],
       [11., 11.,  7.,  7.,  4.,  0.],
       [15., 15., 10., 10.,  6.,  0.],
       [20., 20., 14., 14.,  9.,  1.]])
```
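
The alignment mentioned as a side product can be recovered from the filled table by walking back from the last cell to the first one. The sketch below is one possible way to do it, not the original author's code: `dtw_matrix` and `dtw_alignment` are hypothetical helpers, and ties between predecessors are broken arbitrarily.

```python
import numpy as np


def dtw_matrix(s1, s2):
    """Hypothetical helper: dtw table built with an infinite border, then unpadded."""
    n, m = len(s1), len(s2)
    t = np.full((n + 1, m + 1), np.inf)
    t[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            t[i, j] = abs(s1[i - 1] - s2[j - 1]) + min(
                t[i - 1, j], t[i, j - 1], t[i - 1, j - 1])
    return t[1:, 1:]  # drop the padding row and column


def dtw_alignment(mat):
    """Walk back from the last cell to (0, 0), following the cheapest predecessor."""
    i, j = mat.shape[0] - 1, mat.shape[1] - 1
    path = [(i, j)]
    while (i, j) != (0, 0):
        candidates = []
        if i > 0 and j > 0:
            candidates.append((mat[i - 1, j - 1], i - 1, j - 1))
        if i > 0:
            candidates.append((mat[i - 1, j], i - 1, j))
        if j > 0:
            candidates.append((mat[i, j - 1], i, j - 1))
        _, i, j = min(candidates)  # smallest accumulated cost wins
        path.append((i, j))
    return path[::-1]


x = [1, 2, 3, 5, 5, 5, 6]
y = [1, 1, 2, 2, 3, 5]
mat = dtw_matrix(x, y)
path = dtw_alignment(mat)
print(path)
# The item distances along the path add up to the dtw distance
cost = sum(abs(x[i] - y[j]) for i, j in path)
print(cost == mat[-1, -1])  # True
```

Each pair (i, j) in the path says that x[i] is matched with y[j]; one item may be matched with several items of the other series.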

Cort 
-------

**Input**: two series S1 and S2 *of same length*

**Output:** a similarity measure

**Complexity:** O(|S1|+|S2|)

**Metric:** yes

What do we compute:
-------------------

The cosine similarity between the derivatives of the series.

How do we compute:
------------------

$$
\begin{eqnarray}
cort(A, B) &=& \cos(dA, dB) \\
&=& \frac{dA \cdot dB}{\Vert dA\Vert \Vert dB\Vert} \\
&=& \frac{\sum_{i=0}^{T-1} dA_i dB_i}{\Vert dA\Vert \Vert dB\Vert} \\
&=& \frac{\sum_{i=0}^{T-1} (A_{i+1}-A_i) (B_{i+1}-B_i)}{\sqrt{\sum_{i=0}^{T-1} (A_{i+1}-A_i)^2} \sqrt{\sum_{i=0}^{T-1} (B_{i+1}-B_i)^2}}
\end{eqnarray}
$$

where $A$ and $B$ both have $T+1$ items and $dA_i = A_{i+1} - A_i$ (likewise for $dB$).
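
As a quick check of the formula, take $A = (1, 2, 3, 5, 5, 6)$ and $B = 2A$ (values chosen here for illustration). Scaling a series scales its derivative by the same factor, so the cosine should be 1:

$$
\begin{eqnarray}
dA &=& (1, 1, 2, 0, 1), \qquad dB = (2, 2, 4, 0, 2) \\
dA \cdot dB &=& 2 + 2 + 8 + 0 + 2 = 14 \\
\Vert dA\Vert \Vert dB\Vert &=& \sqrt{7} \sqrt{28} = \sqrt{196} = 14 \\
cort(A, B) &=& \frac{14}{14} = 1
\end{eqnarray}
$$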

Easy enough to implement:
-------------------------

```python
from math import sqrt


def cort(s1, s2):
    """
    Computes the cort between series one and two (assuming they have the same length)
    :param s1: the first series (or any sequence of float64)
    :param s2: the second series (or any sequence of float64)
    :returns: the cort similarity
    :rtype: float
    :precondition: series are assumed to be of same size and not constant
    """
    num = 0.0
    sum_square_x = 0.0
    sum_square_y = 0.0
    for t in range(len(s1)-1):
        slope_1 = s1[t+1] - s1[t]
        slope_2 = s2[t+1] - s2[t]
        num = num + slope_1 * slope_2
        sum_square_x = sum_square_x + (slope_1 * slope_1)
        sum_square_y = sum_square_y + (slope_2 * slope_2)
    return num / sqrt(sum_square_x * sum_square_y)
```

```python
x = [1, 2, 3, 5, 5, 6]
y = [1, 1, 2, 2, 3, 5]
# element-wise doubling: on a Python list, 2*x would repeat the list instead
x2 = [2 * v for v in x]
print(f"cort(x,2*x)={cort(x, x2)} cort([1,2], [2,1])={cort([1,2], [2,1])}")
```

Output:

```
cort(x,2*x)=1.0 cort([1,2], [2,1])=-1.0
```

... ...
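
The same computation can be sketched with NumPy, where `np.diff` gives the derivatives directly. This is a hypothetical variant (`cort_numpy` is not a name from the original material); returning `nan` for a constant series is an assumption made here, since the cosine is undefined when a derivative vector is zero. Note that with NumPy arrays `2 * x` scales element-wise, whereas on a Python list it repeats the list.

```python
import numpy as np


def cort_numpy(s1, s2):
    """Hypothetical NumPy variant of cort; returns nan if a series is constant."""
    d1 = np.diff(np.asarray(s1, dtype=float))  # derivative of the first series
    d2 = np.diff(np.asarray(s2, dtype=float))  # derivative of the second series
    if d1.shape != d2.shape:
        raise ValueError("series must have the same length")
    denom = np.linalg.norm(d1) * np.linalg.norm(d2)
    if denom == 0.0:
        return float("nan")  # cosine is undefined for a zero derivative vector
    return float(np.dot(d1, d2) / denom)


x = np.array([1, 2, 3, 5, 5, 6])
print(cort_numpy(x, 2 * x))        # close to 1.0, up to floating-point rounding
print(cort_numpy([1, 2], [2, 1]))  # -1.0
```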
