美文网首页
ClickHouse / Hive 数组交、并、差计算

ClickHouse / Hive 数组交、并、差计算

作者: 光剑书架上的书 | 来源:发表于2022-03-17 20:42 被阅读0次

    Hive

    Hive 计算数组交并差函数

    select
        array_intersect(array(1, 2), array(2, 3)) i,
        array_union(array(1, 2), array(2, 3)) u,
        array_except(array(1, 2), array(2, 3)) e;
    

    SQL 实例:

    select
        size(t.res) as cnt
    from
        (
            select
                array_intersect(
                    (
                        select
                            collect_set(user_id)
                        from
                            ecom_kunlun.crm_s_u
                        where
                            date = '20220315'
                            and shop_id = 7571122
                            and pay_90d > '0'
                    ),
                    (
                        select
                            collect_set(user_id)
                        from
                            ecom_kunlun.crm_s_u
                        where
                            date = '20220315'
                            and shop_id = 7571122
                            and member_level in ('1', '2', '3', '4', '5', '6')
                    )
                ) as res
        ) t;
    

    ClickHouse

    select
        length(
            arrayDistinct(arrayFilter(x -> x is not null, t.res))
        ) as cnt
    from
        (
            select
                arrayIntersect(
                    (
                        select
                            groupUniqArray(user_id)
                        from
                            ecom_dmp_shop_user.crm_s_u
                        where
                            p_date = '2022-03-15'
                            and shop_id = 7571122
                            and pay_90d > '0'
                    ),
                    (
                        select
                            groupUniqArray(user_id)
                        from
                            ecom_dmp_shop_user.crm_s_u
                        where
                            p_date = '2022-03-15'
                            and shop_id = 7571122
                            and member_level in ('1', '2', '3', '4', '5', '6')
                    )
                ) as res
        ) t;
    

    ClickHouse 实现数组交并差计算

    计算数组交集

    函数:arrayIntersect(arr1,arr2,...)
    功能说明:计算 arr1,arr2等数组元素交集。
    返回值:交集元素子数组。
    SQL 实例:

    SELECT
        arrayIntersect([1, 2, 3], [4, 5, 6]) AS noIntersect,
        arrayIntersect([1, 2, 3], [2, 3, 4, 5, 6]) AS hasIntersect
    FORMAT Vertical
    

    输出:
    noIntersect: []
    hasIntersect: [3,2]

    计算数组并集

    组合使用函数来实现arrayDistinct(arrayConcat(a, b))。
    SQL 实例:

    SELECT
        [1, 2] AS a,
        [2, 3] AS b,
        arrayDistinct(arrayConcat(a, b)) AS res
    FORMAT Vertical
    

    输出:
    a: [1,2]
    b: [2,3]
    res: [1,2,3]

    计算数组差集

    差集的实现要有一些技术含量了(感觉 ClickHouse 后面应该内置数组差集计算函数,实现类似arrayExcept() 函数),需要使用数组交集函数arrayIntersect() 结合高阶函数 arrayMap()和 arrayFilter()来组合实现。
    SQL 实例:

    SELECT
        arrayIntersect([1, 2, 3], [4, 5, 6]) AS noIntersect,
        arrayIntersect([1, 2, 3], [2, 3, 4, 5, 6]) AS hasIntersect
    FORMAT Vertical
    SELECT
        [1, 2] AS a,
        [2, 3] AS b,
        arrayFilter(x -> (x IS NOT NULL), arrayMap(x -> multiIf(x NOT IN arrayIntersect(a, b), x, NULL), a)) AS res
    FORMAT Vertical
    

    输出:
    a: [1,2]
    b: [2,3]
    res: [1]

    使用 交(INTERSECT)、并(UNION)、差(EXCEPT)的SQL子句关键字 实现

    另外, ClickHouse 中有集合交(INTERSECT)、并(UNION)、差(EXCEPT)的SQL子句关键字,可以实现数组的交并差运算。实例 SQL 如下。

    交集SQL:

    SELECT a.i
    FROM
    (
        SELECT arrayJoin([1, 2]) AS i
    ) AS a
    INTERSECT
    SELECT b.i
    FROM
    (
        SELECT arrayJoin([2, 3]) AS i
    ) AS b
    

    输出:2

    并集 SQL:

    SET union_default_mode = 'ALL';
    SELECT DISTINCT t.i
    FROM
    (
        SELECT a.i
        FROM
        (
            SELECT arrayJoin([1, 2]) AS i
        ) AS a
        UNION
        SELECT b.i
        FROM
        (
            SELECT arrayJoin([2, 3]) AS i
        ) AS b
    ) AS t
    

    输出:
    1
    2
    3

    差集 SQL:

    SELECT a.i
    FROM
    (
        SELECT arrayJoin([1, 2]) AS i
    ) AS a
    EXCEPT
    SELECT b.i
    FROM
    (
        SELECT arrayJoin([2, 3]) AS i
    ) AS b
    

    输出:1

    上述[1,2]和[2,3]分别表示要求差集的数组

    数据函数解释:
    arrayIntersect表示多个数组求交集
    arrayConcat表示连接多个数组为一个数组
    arrayDistinct表示将多个数组元素去重组合为一个数组
    arrayMap表示使用lambda函数转换
    arrayFilter表示使用lambda函数过滤

    参考资料:

    https://clickhouse.tech/docs/zh/sql-reference/functions/higher-order-functions/

    相关文章

      网友评论

          本文标题:ClickHouse / Hive 数组交、并、差计算

          本文链接:https://www.haomeiwen.com/subject/xflbdrtx.html