SLS Machine Learning Best Practices: Batch Time-Series Anomaly Detection

Author: 阿里云云栖号 (Alibaba Cloud Yunqi) | Published: 2019-07-01 17:50

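    1. Common detection scenarios

    1.1 Scenario 1

    A cluster has N machines, and each machine reports M time-series metrics (CPU, memory, IO, traffic, and so on). Modeling each curve separately means hand-writing a lot of repetitive SQL and puts a heavy computational load on the platform. How can SQL be used more effectively to cover this requirement?

    1.2 Scenario 2

    After running anomaly detection on the N time-series curves in a system, how can we quickly find out which of those curves actually contain anomalies?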

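    2. Experiments on the platform

    2.1 Solution 1

    For the problem described in Scenario 1, we assume the following data layout. The data is stored in a Log Service LogStore with this structure: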

    timestamp : unix_time_stamp
    machine: name1
    metricName: cpu0
    metricValue: 50
    ---
    timestamp : unix_time_stamp
    machine: name1
    metricName: cpu1
    metricValue: 50
    ---
    timestamp : unix_time_stamp
    machine: name1
    metricName: mem
    metricValue: 50
    ---
    timestamp : unix_time_stamp
    machine: name2
    metricName: mem
    metricValue: 60
    

    From this LogStore we first fetch the time series for the N metrics, averaged per one-minute bucket:

    * | select timestamp - timestamp % 60 as time, machine, metricName, avg(metricValue) from log group by time, machine, metricName
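
To make the bucketing in this query concrete, here is a minimal Python sketch of what `timestamp - timestamp % 60` combined with `group by` and `avg` computes. It only illustrates the arithmetic on made-up sample rows and is not how Log Service executes the query; the function name `bucket_average` and the sample values are hypothetical.

```python
from collections import defaultdict


def bucket_average(rows, width=60):
    """Average metricValue per (time bucket, machine, metricName).

    rows: iterable of (timestamp, machine, metricName, metricValue).
    Mirrors `timestamp - timestamp % 60 ... group by time, machine, metricName`.
    """
    sums = defaultdict(lambda: [0.0, 0])  # key -> [running total, count]
    for timestamp, machine, metric_name, metric_value in rows:
        bucket = timestamp - timestamp % width  # floor to the start of the bucket
        acc = sums[(bucket, machine, metric_name)]
        acc[0] += metric_value
        acc[1] += 1
    return {key: total / count for key, (total, count) in sums.items()}


# Made-up sample rows, for illustration only.
sample_rows = [
    (1561975205, "name1", "cpu0", 40),
    (1561975230, "name1", "cpu0", 60),
    (1561975265, "name1", "cpu0", 70),
    (1561975210, "name2", "mem", 60),
]
averages = bucket_average(sample_rows)
```

The first two `cpu0` samples fall into the same minute and are averaged together, while the third starts a new bucket, so each (machine, metricName) pair ends up with at most one point per minute.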
    

    Next we run the batch time-series anomaly detection algorithm on that result and obtain a detection result for each of the N metrics:

    * | 
    select machine, metricName, ts_predicate_arma(time, value, 5, 1, 1) as res from ( 
        select
            timestamp - timestamp % 60 as time, 
            machine, metricName, 
            avg(metricValue) as value
        from log group by time, machine, metricName )
    group by machine, metricName
    

    This SQL returns a result with the following structure:

    | machine | metricName | [[time, src, pred, upper, lower, prob]] |
    | ------- | ---------- | --------------------------------------- |
    

    We then apply a matrix transpose to that result to convert it into the format shown further below. The SQL is:

    * | 
    select 
        machine, metricName, 
        res[1] as ts, res[2] as ds, res[3] as preds, res[4] as uppers, res[5] as lowers, res[6] as probs
    from ( select machine, metricName, array_transpose(ts_predicate_arma(time, value, 5, 1, 1)) as res from ( 
        select
            timestamp - timestamp % 60 as time, 
            machine, metricName, 
            avg(metricValue) as value
        from log group by time, machine, metricName )
    group by machine, metricName )
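
The transpose step can be pictured with a short Python sketch. It assumes, as the result structure above suggests, that each detection result is a list of `[time, src, pred, upper, lower, prob]` rows; the sample numbers are made up, and this `array_transpose` is a plain Python stand-in for illustration, not the Log Service function.

```python
def array_transpose(rows):
    """Turn a list of equal-length rows into a list of columns."""
    return [list(column) for column in zip(*rows)]


# Made-up detection output for one curve: [time, src, pred, upper, lower, prob]
res = [
    [1561975200, 50.0, 49.0, 55.0, 43.0, 0.0],
    [1561975260, 52.0, 50.0, 56.0, 44.0, 0.0],
    [1561975320, 90.0, 51.0, 57.0, 45.0, 1.0],
]

# After transposing, each former column becomes one array per curve.
ts, ds, preds, uppers, lowers, probs = array_transpose(res)
```

Arrays in the SQL are indexed from 1, so `res[1]` in the query corresponds to the first element (`ts`) here.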
    

    After transposing the two-dimensional array, we split each of its rows into a separate column and get the expected result, in this format:

    | machine | metricName | ts | ds | preds | uppers | lowers | probs |
    | ------- | ---------- | -- | -- | ----- | ------ | ------ | ----- |
    

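    2.2 Solution 2

    Given the batch detection results, how can we quickly filter out the curves that contain a specific kind of anomaly? Log Service provides a filter operation for anomaly detection results.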

    select ts_anomaly_filter(lineName, ts, ds, preds, probs, nWatch, anomalyType)
    

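    The anomalyType parameter takes the following values:

    • 0: report all anomalies
    • 1: report rising-edge anomalies only
    • -1: report falling-edge anomalies only

    The nWatch parameter is defined as follows:

    • The number of most recent observations to inspect, counting back from the last valid observation in the actual time series.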
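
As a rough mental model of how these two parameters interact, the following Python sketch checks whether one curve has an anomaly of the requested type within its last `n_watch` points. The rules for what counts as an anomaly here are assumptions made for illustration (a probability threshold, and observed versus predicted value for the edge direction); the article does not specify how `ts_anomaly_filter` decides this internally, and `has_recent_anomaly` is a hypothetical name.

```python
def has_recent_anomaly(ds, preds, probs, n_watch, anomaly_type, threshold=0.5):
    """Return True if any of the last n_watch points is an anomaly of the given type.

    ASSUMPTIONS (illustrative, not taken from the Log Service documentation):
    - a point is anomalous when its prob exceeds `threshold`;
    - rising edge means observed > predicted, falling edge means observed < predicted.
    """
    if n_watch <= 0:
        return False
    recent = list(zip(ds, preds, probs))[-n_watch:]  # only the watch window
    for observed, predicted, prob in recent:
        if prob <= threshold:
            continue  # not flagged as anomalous
        if anomaly_type == 0:
            return True
        if anomaly_type == 1 and observed > predicted:
            return True
        if anomaly_type == -1 and observed < predicted:
            return True
    return False


# Made-up series with a single spike at the third point.
ds = [50.0, 52.0, 90.0, 51.0, 50.0]
preds = [49.0, 50.0, 51.0, 51.0, 50.0]
probs = [0.0, 0.0, 1.0, 0.0, 0.0]

spike_in_last_5 = has_recent_anomaly(ds, preds, probs, 5, 1)
spike_in_last_2 = has_recent_anomaly(ds, preds, probs, 2, 1)
drop_in_last_5 = has_recent_anomaly(ds, preds, probs, 5, -1)
```

Under these assumptions, a small `n_watch` restricts alerts to anomalies that are still recent, and `anomalyType` narrows them to the direction that matters for the metric.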

    Concrete usage looks like this:

    * | 
    select 
        ts_anomaly_filter(lineName, ts, ds, preds, probs, cast(5 as bigint), cast(1 as bigint))
    from
    ( select 
        concat(machine, '-', metricName) as lineName, 
        res[1] as ts, res[2] as ds, res[3] as preds, res[4] as uppers, res[5] as lowers, res[6] as probs
    from ( select machine, metricName, array_transpose(ts_predicate_arma(time, value, 5, 1, 1)) as res from ( 
        select
            timestamp - timestamp % 60 as time, 
            machine, metricName, 
            avg(metricValue) as value
        from log group by time, machine, metricName )
    group by machine, metricName ) )
    

    The result of this query is a value of Row type. Its fields can be extracted as follows:

    * | 
    select 
        res.name, res.ts, res.ds, res.preds, res.probs 
    from
        ( select 
            ts_anomaly_filter(lineName, ts, ds, preds, probs, cast(5 as bigint), cast(1 as bigint)) as res
        from
            ( select 
                concat(machine, '-', metricName) as lineName, 
                res[1] as ts, res[2] as ds, res[3] as preds, res[4] as uppers, res[5] as lowers, res[6] as probs
              from ( 
                    select 
                        machine, metricName, array_transpose(ts_predicate_arma(time, value, 5, 1, 1)) as res 
                    from  ( 
                        select
                            timestamp - timestamp % 60 as time, 
                            machine, metricName, avg(metricValue) as value
                        from log group by time, machine, metricName )
                    group by machine, metricName ) ) )
    

    With these steps, batch anomaly detection results can be filtered, which makes it easier to configure alerts in batches.

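    3. Promotion

    3.1 Going further with logs

    Demonstrations of the various Log Service features: Log Service overview and demos

    For more advanced Log Service material, see: Log Service learning path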



    Author of this article: 悟冥

    This article is original content from the Yunqi Community (云栖社区) and may not be reproduced without permission.
