HIVE常用命令之ANALYZE TABLE命令简述

作者: 润土1030 | 来源:发表于2019-03-01 11:31 被阅读6次

HIVE常用命令之ANALYZE TABLE命令简述
IMPALA常用命令COMPUTE STATS简述
HIVE常用命令之MSCK REPAIR TABLE命令简述
Hive操作命令及Hbase Shell命令
hive 分区修复
SQL常用命令书目录
mysql常用命令书目录
mysql如何删除带有自增长的主键
Sqoop往hive中导入导出数据的常用命令
SVN常用命令书目录

上次讲过HIVE 的一个常用命令 MSCK REPAIR TABLE ，这次讲讲HIVE的ANALYZE TABLE命令，接下来还会讲下Impala的 COMPUTE STATS 命令。这几个命令都是用来统计表的信息的，用于加速查询。

HIVE的 ANALYZE TABLE 命令是做啥的

Statistics such as the number of rows of a table or partition and the histograms of a particular interesting column are important in many ways. One of the key use cases of statistics is query optimization. Statistics serve as the input to the cost functions of the optimizer so that it can compare different plans and choose among them. Statistics may sometimes meet the purpose of the users' queries. Users can quickly get the answers for some of their queries by only querying stored statistics rather than firing long-running execution plans. Some examples are getting the quantile of the users' age distribution, the top 10 apps that are used by people, and the number of distinct sessions.

其实主要就是为了优化查询，加快查询的速度。

HIVE的主要统计数据

1. 表和分区的统计数据 (Table and Partition Statistics):

行数（Number of rows）
文件数（Number of files）
文件大小（Size in Bytes）

2. 列的统计数据 (Column Statistics): 这是Hive 0.10.0后加入的

极大值等信息

3. Top K Statistics

语法

analyze table t [partition p] compute statistics for [columns c,...];

例子1 对hive进行表和分区的分析

#执行语句
ANALYZE TABLE dw_wy_srvcc_customized_drilldown_table_daily PARTITION(date_time='20190228') COMPUTE STATISTICS;
#查看结果
DESCRIBE extended dw_wy_srvcc_customized_drilldown_table_daily PARTITION(date_time='20190228');

image.png

结果如下 numRows=12552, totalSize=1564884, rawDataSize=1552332

Partition(values:[20190228], dbName:coordinator_test, tableName:dw_wy_srvcc_customized_drilldown_table_daily, createTime:1551344149, lastAccessTime:0, sd:StorageDescriptor(cols:[FieldSchema(name:level, type:bigint, comment:null), FieldSchema(name:pi_name, type:string, comment:null), FieldSchema(name:dimension_type, type:string, comment:null), FieldSchema(name:demension_item, type:string, comment:null), FieldSchema(name:numerator_value, type:double, comment:null), FieldSchema(name:denominator_value, type:double, comment:null), FieldSchema(name:pi_value, type:double, comment:null), FieldSchema(name:degrad_weight, type:double, comment:null), FieldSchema(name:date_time, type:string, comment:null)], location:hdfs://dlbdn3:8020/user/hive/warehouse/coordinator_test.db/dw_wy_srvcc_customized_drilldown_table_daily/date_time=20190228, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{serialization.format=,, field.delim=,}), bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{}), storedAsSubDirectories:false), parameters:{numFiles=20, COLUMN_STATS_ACCURATE=true, transient_lastDdlTime=1551408396, numRows=12552, totalSize=1564884, rawDataSize=1552332})

例子2 对hive进行列的分析

#执行语句
analyze table dw_wy_srvcc_customized_drilldown_table_daily compute statistics for columns level;
#查看结果
describe formatted dw_wy_srvcc_customized_drilldown_table_daily level;

image.png

我这面就不造大量数据了，有兴趣的可以试下，对比下使用analyze table前后的查询速度，会快很多的。其实如果是IMPALA和HIVE结合的话，用IMPALA的COMPUTE STAT会更好。

网友评论

本文标题：HIVE常用命令之ANALYZE TABLE命令简述

本文链接：https://www.haomeiwen.com/subject/rsbruqtx.html

延伸阅读

深度阅读

您也可以注册成为美文阅读网的作者，发表您的原创作品、分享您的心情！

HIVE常用命令之ANALYZE TABLE命令简述

HIVE的 ANALYZE TABLE 命令是做啥的

HIVE的主要统计数据

1. 表和分区的统计数据 (Table and Partition Statistics):

2. 列的统计数据 (Column Statistics): 这是Hive 0.10.0后加入的

3. Top K Statistics

语法

例子1 对hive进行表和分区的分析

例子2 对hive进行列的分析

我这面就不造大量数据了，有兴趣的可以试下，对比下使用analyze table前后的查询速度，会快很多的。其实如果是IMPALA和HIVE结合的话，用IMPALA的COMPUTE STAT会更好。

相关文章

HIVE常用命令之ANALYZE TABLE命令简述

IMPALA常用命令COMPUTE STATS简述

HIVE常用命令之MSCK REPAIR TABLE命令简述

Hive操作命令及Hbase Shell命令

hive 分区修复

SQL常用命令书目录

mysql常用命令书目录

mysql如何删除带有自增长的主键

Sqoop往hive中导入导出数据的常用命令

SVN常用命令书目录

网友评论

延伸阅读

深度阅读

栏目导航

热点阅读

HIVE和IMPALA技巧介绍

Hive