ClickHouse 高阶函数

作者: 郭彦超 | 来源:发表于2019-09-25 14:48 被阅读0次

    先来一个完整的例子,该示例根据行为日志计算用户访问的top路径
    select data, count(1) cn from (
    with maxIf( c_t , cat='page_view'and act='页面浏览') as max_time, -- 目标事件时间
    arraySort(
    e -> e.1,
    arrayFilter(x->x.1<=toUInt64OrZero(max_time),groupArray((toUInt64OrZero(c_t), (cat,act) )))
    ) as sorted_array,
    -- 按时间排序后的数据
    -- arrayPushFront( sorted_array, sorted_array[1] ) as e_arr,
    arrayFilter(
    (i, e,z) -> z.1 < toUInt64OrZero(max_time)
    and (e > 1800000 or (z.2.1='page_view' and z.2.2='页面
    浏览')),
    arrayEnumerate(sorted_array), arrayDifference( sorted_array.1 ),sorted_array
    ) as arr_indx, -- 过滤目标事件、时间差后的数据
    arrayReduce('max',arr_indx) +1 as smIndx,
    arrayFilter(
    (e,i) -> i>=smIndx and e.1<=toUInt64OrZero(max_time) ,
    sorted_array, arrayEnumerate(sorted_array)
    ) as data_
    select u_i,
    arrayFilter((x,y,i)-> i=1 or i>1 and y<>0 ,data_.2,arrayDifference(arrayEnumerateDense(data_.2)),arrayEnumerate(data_)) as data__,
    arraySlice(data__,-7,7 ) as data,
    -- arrayStringConcat(data,'->') as path,
    hasAll(data, [ ('page_view','页面_浏览') ]) as has_way_point --路径中必须经过的点
    from app.scene_tracker where c_p='PC' and length(u_i)>20
    group by u_i having length(data)>1
    ) tab
    where has_way_point=1 group by data order by cn desc limit 100

    根据经验,大家如处理复杂业务,这些函数会经常用到

    neighbor

    • 获取某一列前后相邻的数据,第二个参数控制前后相邻的距离

    • 示例1:

    SELECT a, neighbor( a,-1 ) from (SELECT arrayJoin( [1,2,3,6,34,3,11] ) as a,'u' as  b)  
    
    

    arrayJoin

    • 行变列,对数组进行展开操作

    • 示例2:

    # 还是上面的例子
    SELECT a, neighbor( a,-1 ) from (SELECT arrayJoin( [1,2,3,6,34,3,11] ) as a,'u' as  b)  
    
    

    arraySort

    • 对数组进行排序,降序的话用这个 arrayReverseSort

    • 示例3:

    # 还是上面的例子 略作修改,可对比示例1和示例3的结果区别
    SELECT a, neighbor( a,-1 ) from (SELECT arrayJoin( arraySort([1,2,3,6,34,3,11]) ) as a,'u' as  b)  
    
    

    arrayFilter

    • 过滤出数组中满足条件的数据

    • 示例4:

    # 我们只获取数组中的偶数部分
    SELECT a, neighbor( a,-1 ) from (SELECT arrayJoin( arraySort(arrayFilter(x->x%2=0, [1,2,3,6,34,3,11])) ) as a,'u' as  b)  
    
    

    arrayEnumerate

    • 返回数组下标

    • 示例5:

    SELECT arrayEnumerate( [1,2,3,6,34,3,11] )
    
    

    arrayDifference

    • 计算数组中前后两个值的差值部分

    • 示例6:

    SELECT arrayDifference( [1,2,3,6,34,3,11] )
    
    

    arrayReduce

    • 对数组进行聚合操作,min 、max、avg 等

    • 示例7:

    SELECT arrayReduce('avg', [1,2,3,6,34,3,11] )
    
    

    arrayEnumerateDense

    • 标记出数组中相同的元素

    • 示例8:

    SELECT arrayEnumerateDense( [1,2,3,6,34,3,11] )
    
    

    arraySlice

    • 对数组进行切割 ,后面两个参数分别是切割的offset和切割长度
    • 示例9:
    SELECT arraySlice( [1,2,3,6,34,3,11] , -3, 2)
    # 返回:34 3
    
    

    hasAny

    • 判断数组中是否包含某些值,包含其一返回1 ,否则0 ;如果判断全部包含 用hasAll
    • 示例10:
    SELECT hasAny( [1,2,3,6,34,3,11] , [3,1])
    
    

    arrayStringConcat

    • 将数组元素按照给定分隔符进行拼接,返回拼接后的字符串
    • 示例11:
    SELECT arrayStringConcat( [1,2,3,6,34,3,11] , '-') 
    
    

    arrayPushFront

    • 向数组首位置最加value ;同理向数组末尾最加为arrayPushBack
    • 示例12:
    SELECT arrayPushFront( [1,2,3,6,34,3,11] , 8)
    
    

    arrayPopFront

    • 移除数组下标为1的值;同理,移除数组最后一个值用arrayPopBack
    • 示例13:
    SELECT arrayPopFront( [1,2,3,6,34,3,11] ) 
    
    

    arrayWithConstant

    • 生成一个指定长度的数组
    • 示例14:
    #生成长度为3 的数组
    SELECT arrayWithConstant( 3, 'a')
    #范围值为['a','a','a']
    
    

    arrayUniq

    • 计算数组中有多少个不重复的值;如进行数组去重操作 用arrayDistinct
    • 示例15:
    SELECT arrayUniq( [1,2,3,6,34,3,11]) 
    
    

    runningDifference

    • 计算某一列前后数值的差值
    • 示例16:
    select a,runningDifference(a)  from (SELECT arrayJoin( [1,2,3,6,34,3,11] ) as a,'u' as  b)
    
    

    arrayCompact

    • 对数组内数据实现相邻去重
    • 示例17:
    SELECT arrayCompact([1, 2, 2, 3, 2, 3, 3])
    #返回值为 [1,2,3,2,3]        
    
    

    开篇示例语句运行结果如下图

    image.png

    相关文章

      网友评论

        本文标题:ClickHouse 高阶函数

        本文链接:https://www.haomeiwen.com/subject/lpaiuctx.html