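1. Reducing the time taken by count
Set the hive.compute.query.using.stats parameter to true. With it enabled, a count query is answered directly from the statistics in the metastore instead of running a computation job.
The parameter is described as follows: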
<description>
When set to true Hive will answer a few queries like count(1) purely using stats
stored in metastore. For basic stats collection turn on the config hive.stats.autogather to true.
For more advanced stats collection need to run analyze table queries.
</description>
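A minimal sketch of turning this on for the current session. The table name my_table is hypothetical, and whether the count is actually served from the metastore depends on the table having accurate basic stats:

```sql
-- Answer simple aggregates such as count(1) from metastore stats when possible.
SET hive.compute.query.using.stats=true;

-- Collect basic stats (numRows, totalSize, ...) automatically when data is written by Hive.
SET hive.stats.autogather=true;

-- my_table is a hypothetical table name used only for illustration.
SELECT count(1) FROM my_table;
```

These SET statements only affect the current session. To make the behaviour the default, the same properties would go into hive-site.xml.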
Once statistics have been collected, the table's metadata stores its row count (the numRows field). You can inspect it with desc formatted tableName, as shown below:
| Table Parameters: | NULL | NULL |
| | COLUMN_STATS_ACCURATE | {\"BASIC_STATS\":\"true\"} |
| | numFiles | 9 |
| | numRows | 12196178 |
| | rawDataSize | 8878817584 |
| | totalSize | 785739671 |
| | transient_lastDdlTime | 1595852757 |
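If numRows is missing or out of date (for example, when files were loaded into the table directory outside of Hive), the statistics can be recomputed by hand. The following is a sketch only, with my_table again being a hypothetical name:

```sql
-- Recompute basic table-level stats (numFiles, numRows, rawDataSize, totalSize).
ANALYZE TABLE my_table COMPUTE STATISTICS;

-- Check that numRows and COLUMN_STATS_ACCURATE have been refreshed.
DESC FORMATTED my_table;

-- Inspect the plan: when the count is served from stats, it should not
-- contain a map/reduce (or Tez) stage that scans the table.
EXPLAIN SELECT count(1) FROM my_table;
```

Keep in mind that a count answered from stats is only as accurate as the stats themselves, so it is worth re-running ANALYZE TABLE after any load that bypasses Hive.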