DTW: Computing the Distance Between Two Sequences of Unequal Length

Author: 丙吉 | Published 2021-10-20 11:40


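My current understanding is that DTW (Dynamic Time Warping) measures how far apart two sequences are, even when their lengths differ. I have not yet worked out how to turn that distance into a similarity score, so for now I am just recording the code that computes the distance: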

import numpy as np

def dtw_distance(ts_a, ts_b, d=lambda x, y: abs(x - y), mww=10000):
    """
    Compute the DTW distance between two time series.

    Args:
        ts_a: time series a
        ts_b: time series b
        d: distance function between two points; the default is the
            absolute difference (Manhattan distance for scalars)
        mww: max warping window, int, optional (default 10000, which is
            effectively unbounded for short series)

    Returns:
        (cost, distance): the accumulated cost matrix and the DTW
        distance, i.e. its bottom-right entry
    """
    # Create the cost matrix, filled with inf so that cells outside the
    # warping window can never be picked as the cheapest predecessor
    ts_a, ts_b = np.array(ts_a), np.array(ts_b)
    M, N = len(ts_a), len(ts_b)
    cost = np.full((M, N), np.inf)

    # Initialize the first row and column
    cost[0, 0] = d(ts_a[0], ts_b[0])
    for i in range(1, M):
        cost[i, 0] = cost[i-1, 0] + d(ts_a[i], ts_b[0])
    for j in range(1, N):
        cost[0, j] = cost[0, j-1] + d(ts_a[0], ts_b[j])

    # Populate the rest of the cost matrix within the window |i - j| <= mww
    for i in range(1, M):
        for j in range(max(1, i - mww), min(N, i + mww + 1)):
            choices = cost[i-1, j-1], cost[i, j-1], cost[i-1, j]
            cost[i, j] = min(choices) + d(ts_a[i], ts_b[j])

    # Return the cost matrix and the DTW distance for the given window
    return cost, cost[-1, -1]
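
For reference, the function above can be read as the following recurrence over an accumulated cost matrix. The symbols are shorthand introduced here, not names from the code: $a$ and $b$ stand for ts_a and ts_b (lengths $M$ and $N$, indexed from 0), $d$ is the point-wise distance, and $D$ is the cost matrix. The last rule is only applied to cells close enough to the diagonal, as set by mww.

```latex
\begin{align*}
D(0,0) &= d(a_0, b_0) \\
D(i,0) &= D(i-1,0) + d(a_i, b_0), & 1 \le i < M \\
D(0,j) &= D(0,j-1) + d(a_0, b_j), & 1 \le j < N \\
D(i,j) &= d(a_i, b_j) + \min\{\, D(i-1,j-1),\; D(i,j-1),\; D(i-1,j) \,\}, & i, j \ge 1 \\
\mathrm{DTW}(a, b) &= D(M-1, N-1)
\end{align*}
```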

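The distance between the two sequences [0, 0, 1, 3, 4, 5, 5, 5, 6, 6] and [0, 1, 2, 3, 4, 5, 15, 5, 6, 6] comes out as 11. The call returns the full cost matrix followed by the distance: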

dtw_distance([0, 0, 1, 3, 4, 5, 5, 5, 6, 6],[0, 1, 2, 3, 4, 5, 15, 5, 6, 6])
(array([[ 0.,  1.,  3.,  6., 10., 15., 30., 35., 41., 47.],
        [ 0.,  1.,  3.,  6., 10., 15., 30., 35., 41., 47.],
        [ 1.,  0.,  1.,  3.,  6., 10., 24., 28., 33., 38.],
        [ 4.,  2.,  1.,  1.,  2.,  4., 16., 18., 21., 24.],
        [ 8.,  5.,  3.,  2.,  1.,  2., 13., 14., 16., 18.],
        [13.,  9.,  6.,  4.,  2.,  1., 11., 11., 12., 13.],
        [18., 13.,  9.,  6.,  3.,  1., 11., 11., 12., 13.],
        [23., 17., 12.,  8.,  4.,  1., 11., 11., 12., 13.],
        [29., 22., 16., 11.,  6.,  2., 10., 11., 11., 11.],
        [35., 27., 20., 14.,  8.,  3., 11., 11., 11., 11.]]),
 11.0)
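
The example above happens to use two sequences of the same length, so here is a small sketch that checks the unequal-length case. It is a compact, self-contained re-implementation with no warping window, not the function above; the names `dtw` and `to_similarity` are made up for this illustration. The mapping 1 / (1 + distance) is only one possible way to turn a distance into a similarity score and is an assumption on my part, not something DTW itself defines.

```python
import numpy as np

def dtw(ts_a, ts_b):
    """Unconstrained DTW distance using the absolute difference as local cost."""
    a = np.asarray(ts_a, dtype=float)
    b = np.asarray(ts_b, dtype=float)
    m, n = len(a), len(b)
    # Pad with one extra row and column of inf so the borders need no special case.
    acc = np.full((m + 1, n + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            step = abs(a[i - 1] - b[j - 1])
            acc[i, j] = step + min(acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1])
    return float(acc[m, n])

def to_similarity(distance):
    """Hypothetical helper: map a distance in [0, inf) to a score in (0, 1]."""
    return 1.0 / (1.0 + distance)

short = [0, 1, 2, 3]
long_ = [0, 0, 1, 1, 2, 2, 3, 3]  # same shape as `short`, stretched in time
d_same = dtw(short, long_)
d_diff = dtw(short, [3, 2, 1, 0, 0, 1, 2, 3])  # different shape, also length 8

print(d_same, to_similarity(d_same))  # 0.0 1.0
print(d_diff, to_similarity(d_diff))
```

The stretched copy gets distance 0 (similarity 1.0) because every point of the short sequence can be matched to two identical points of the long one, while the differently shaped sequence gets a positive distance and therefore a score below 1. Note that the raw distance grows with sequence length, so comparing scores across pairs of very different lengths may need some normalization first.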