A Brief Look at Impala's COMPUTE STATS Command

Author: 润土1030 | Published: 2019-03-01 15:34

    The previous post covered Hive's ANALYZE TABLE command. Impala provides a similar command called COMPUTE STATS, and this post walks through it.

    What Impala's COMPUTE STATS does

    Gathers information about volume and distribution of data in a table and all associated columns and partitions. The information is stored in the metastore database, and used by Impala to help optimize queries. For example, if Impala can determine that a table is large or small, or has many or few distinct values, it can organize and parallelize the work appropriately for a join query or insert operation. For details about the kinds of information gathered by this statement, see Table and Column Statistics.

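    Like Hive's ANALYZE TABLE, this command exists mainly to help the planner optimize queries so they run faster. Impala originally relied on Hive's ANALYZE TABLE, but that command was awkward to use and unreliable, so Impala implemented its own command for the same job.

    Syntax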

    -- Full
    COMPUTE STATS [db_name.]table_name
    -- Incremental
    COMPUTE INCREMENTAL STATS [db_name.]table_name [PARTITION (partition_spec)]
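
    To make the two forms concrete, here is a minimal Python sketch that assembles either statement as a string from the syntax above. The function name `build_compute_stats` and its parameters are hypothetical, made up for illustration; the sketch only builds text and does not talk to Impala. It assumes string partition values are written as single-quoted literals, as in date_time='2019022817', and it simply refuses values that contain a quote rather than trying to escape them.

```python
from typing import Mapping, Optional, Union

PartitionValue = Union[str, int]


def build_compute_stats(
    table: str,
    db: Optional[str] = None,
    incremental: bool = False,
    partition: Optional[Mapping[str, PartitionValue]] = None,
) -> str:
    """Return a COMPUTE [INCREMENTAL] STATS statement as a string (hypothetical helper)."""
    if partition and not incremental:
        # In the syntax above, PARTITION appears only on the incremental form.
        raise ValueError("PARTITION requires incremental=True")

    name = f"{db}.{table}" if db else table
    keyword = "COMPUTE INCREMENTAL STATS" if incremental else "COMPUTE STATS"
    stmt = f"{keyword} {name}"

    if partition:
        parts = []
        for col, val in partition.items():
            if isinstance(val, str):
                # Assumption: no escaping is attempted in this sketch.
                if "'" in val:
                    raise ValueError("quotes in partition values are not handled here")
                parts.append(f"{col}='{val}'")
            else:
                parts.append(f"{col}={val}")
        stmt += f" PARTITION ({', '.join(parts)})"
    return stmt


if __name__ == "__main__":
    print(build_compute_stats("dw_wy_video_kqi_cell_hourly"))
    print(
        build_compute_stats(
            "dw_wy_video_kqi_cell_hourly",
            incremental=True,
            partition={"date_time": "2019022817"},
        )
    )
```

    The resulting strings can be pasted into impala-shell or handed to whatever client you already use.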
    

    Example

    SHOW PARTITIONS dw_wy_video_kqi_cell_hourly;
    COMPUTE INCREMENTAL STATS dw_wy_video_kqi_cell_hourly PARTITION (date_time='2019022817');
    SHOW PARTITIONS dw_wy_video_kqi_cell_hourly;
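
    Running the incremental form one partition at a time gets tedious for an hourly table, so a small generator script can help. The sketch below is only an illustration: the helper name `hourly_incremental_stats` is hypothetical, and it assumes the date_time partition key is formatted as YYYYMMDDHH, which is a guess based on the value '2019022817' in the example. It prints the statements and leaves running them to you.

```python
from datetime import date, datetime, timedelta
from typing import List


def hourly_incremental_stats(table: str, day: date) -> List[str]:
    """Build one COMPUTE INCREMENTAL STATS statement per hour of `day`.

    Assumption: partitions are keyed by date_time in YYYYMMDDHH form.
    """
    start = datetime(day.year, day.month, day.day)
    statements = []
    for hour in range(24):
        key = (start + timedelta(hours=hour)).strftime("%Y%m%d%H")
        statements.append(
            f"COMPUTE INCREMENTAL STATS {table} PARTITION (date_time='{key}');"
        )
    return statements


if __name__ == "__main__":
    for stmt in hourly_incremental_stats("dw_wy_video_kqi_cell_hourly", date(2019, 2, 28)):
        print(stmt)
```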
    

    The result is shown below. Partitions on which COMPUTE INCREMENTAL STATS has not been run show a row count of -1.


    [Screenshot: SHOW PARTITIONS output after COMPUTE INCREMENTAL STATS on one partition]
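
    If you pull the SHOW PARTITIONS result into a script, picking out the partitions that still lack statistics is a simple filter on that -1. The sketch below assumes each result row has already been turned into a dict keyed by column name, with the row count under "#Rows" and the partition key under "date_time"; how you fetch the rows depends on your client and is not shown. The function name and the sample rows are hypothetical, invented for illustration. It also skips the summary row labeled Total that Impala appends at the end of the output.

```python
from typing import Any, Dict, Iterable, List


def partitions_without_stats(
    rows: Iterable[Dict[str, Any]], key_col: str = "date_time"
) -> List[str]:
    """Return partition keys whose #Rows value is -1, i.e. no stats yet."""
    missing = []
    for row in rows:
        if row[key_col] == "Total":
            # Summary row, not a real partition.
            continue
        if int(row["#Rows"]) == -1:
            missing.append(str(row[key_col]))
    return missing


if __name__ == "__main__":
    # Hypothetical sample rows, not real output from the table in this post.
    sample = [
        {"date_time": "2019022816", "#Rows": -1},
        {"date_time": "2019022817", "#Rows": 1200},
        {"date_time": "2019022818", "#Rows": -1},
        {"date_time": "Total", "#Rows": -1},
    ]
    print(partitions_without_stats(sample))
```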

    This is the state before running COMPUTE STATS dw_wy_video_kqi_cell_hourly. You can see that many partitions have no statistics yet.

    [Screenshot: SHOW PARTITIONS output before COMPUTE STATS]

    And this is the state after running COMPUTE STATS dw_wy_video_kqi_cell_hourly.

    [Screenshot: SHOW PARTITIONS output after COMPUTE STATS]


    Link to this article: https://www.haomeiwen.com/subject/plvduqtx.html