美文网首页
clickhouse-物化视图

clickhouse-物化视图

作者: 越狱的灵感 | 来源:发表于2022-06-09 10:13 被阅读0次

    参考文档

    https://clickhouse.tech/docs/en/sql-reference/statements/create/view/#

    介绍

    物化视图可以理解为一个预聚合触发器,数据在控制好触发的汇聚条件,几乎是实时的
    物化视图会存储一份计算好的聚合数据,是一种空间换时间的绝妙方法,对集群的稳定性和很重要。


    1.png

    使用

    物化视图的建立有两种方法
    1,使用TO关键字(推荐使用),可以控制TTL,不能使用POPULATE
    例:

    --创建本地存储表,物化视图本地表和普通的建表没啥区别,inner_物化视图本地表前缀
    create table IF NOT EXISTS demo_db.inner_demo_table on CLUSTER demo0201oddp_v1 (
        `day` date default toDate(now())
        ,`its` datetime default now()
        ,`channel` String
        ,`appName` String
        ,`stream` String
        ,`requestUrl` String
        ) ENGINE = ReplicatedReplacingMergeTree (
        '/clickhouse/tables/{cluster_demo0201oddp_v1}/{shard_demo0201oddp_v1}/inner_demo_table'
        ,'{replica_demo0201oddp_v1}'
        ) partition by toYYYYMMDD (its)
    order by (channel,appName,stream)  TTL its + toIntervalDay(20) SETTINGS index_granularity = 8192,storage_policy = 'demo_default_policy'
    
    --创建物化视图,mview_ 物化视图前缀,to 后面接本地表,as 后面添加计算逻辑
    create MATERIALIZED view demo_db.mview_demo_table on CLUSTER demo0201oddp_v1 to demo_db.inner_demo_table as
    select channel
        ,appName
        ,stream
        ,requestUrl
    from demo_db.inner_demo_table
    group by channel
        ,appName
        ,stream
        ,requestUrl
    

    2,使用默认表
    此方案建议是数据量小的表,因为无法控制TTL,后期数据运维不方便。默认存储表在clickhouse中是.inner_id.uuid值作为表名
    例:

    --物化视图原始数据表
    create table IF NOT EXISTS demo_db.inner_demo_table ON CLUSTER demo0201oddp_v1 (
      `day` date default toDate(now()),
      `its` datetime default now(),
      `_cnt` UInt8 default 1,
      `channel` String,
      `eventTime` UInt32,
      `clientId` UInt32,
      `scheduleStr` String,
      `ispCode` UInt32,
      `hitFlag` Int16,
      `reqCnt` UInt32,
      `fileSize` UInt64,
      `clientIp` String,
      `guid` String,
      `status` String
    ) ENGINE = ReplicatedMergeTree (
      '/clickhouse/tables/{cluster1}/{shard}/inner_demo_table',
      '{replica}'
    ) partition by toYYYYMMDD (toDate(eventTime))
    order by
      its TTL its + INTERVAL 1 hour SETTINGS index_granularity = 8192
    
    --使用默认存储表
    CREATE MATERIALIZED VIEW IF NOT EXISTS demo_db.mview_inner_demo_table_uv ON CLUSTER demo0201oddp_v1 ENGINE = ReplicatedAggregatingMergeTree(
      '/clickhouse/tables/{cluster1}/{shard}/mview_inner_demo_table_uv',
      '{replica}'
    ) PARTITION BY t_day
    ORDER BY
      (channel) SETTINGS index_granularity = 8192,
      use_minimalistic_part_header_in_zookeeper = 1 AS
    SELECT
      sumState(1) as uv,
      day as t_day,
      toStartOfHour(toDateTime(eventTime)) as __time,
      channel,
      clientIp
    FROM
      demo_db.inner_demo_table
    GROUP BY
      channel,
      clientIp,
      day,
      __time;
    

    总结

    1,物化视图是一种空间换时间的预聚合方式,聚合后的数据将存储在新表中,一般于SummingMergeTree,AggregatingMergeTree等聚合引擎一起使用。

    2,物化视图因为是写入触发器,所以as select只对每批次的insert data有效果,所以即使是where条件也是对这批写入数据起效果(https://clickhouse.tech/docs/en/sql-reference/statements/create/view/#materialized

    2.png
    3,物化视图只有在原表insert的时候才会触发

    4,POPULATE关键字,不建议使用,会把原始表中的已存在数据全部物化一遍,老数据的同步,建议直接insert到mv中

    5,多表join生成物化视图,左表插入数据时才更新

    6,源表数据的改变不会影响物化视图,如update, delete, drop partition

    相关文章

      网友评论

          本文标题:clickhouse-物化视图

          本文链接:https://www.haomeiwen.com/subject/rdujmrtx.html