Compression
ORC or Parquet is the usual choice of storage format.
ORC
create table log_orc(
track_time string,
url string,
session_id string,
referer string,
ip string,
end_user_id string,
city_id string )
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS orc ;
Append STORED AS orc to the end of the CREATE TABLE statement. Likewise, to use the Parquet format, append STORED AS PARQUET instead.
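For comparison, a Parquet version of the same table might look like the sketch below. The table name log_parquet is hypothetical and is only used for illustration. The ROW FORMAT clause is left out here on the assumption that it is not needed, since a columnar format such as Parquet does not store rows as delimited text.

```sql
-- Hypothetical table name; same columns as log_orc above
create table log_parquet(
track_time string,
url string,
session_id string,
referer string,
ip string,
end_user_id string,
city_id string )
STORED AS PARQUET ;
```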
Specifying the compression codec for ORC storage
create table log_orc_snappy(
track_time string,
url string,
session_id string,
referer string,
ip string,
end_user_id string,
city_id string )
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS orc
tblproperties ("orc.compress"="SNAPPY");
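Snappy compresses and decompresses quickly, at the cost of a somewhat lower compression ratio than ZLIB (the ORC default), so it is the usual choice when compression is enabled. Just append tblproperties ("orc.compress"="SNAPPY"); to the end of the statement.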
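The same idea carries over to Parquet, where the table property is named parquet.compression rather than orc.compress. A minimal sketch, assuming a hypothetical table name log_parquet_snappy:

```sql
-- Hypothetical table name; Parquet uses the parquet.compression property
create table log_parquet_snappy(
track_time string,
url string,
session_id string,
referer string,
ip string,
end_user_id string,
city_id string )
STORED AS PARQUET
tblproperties ("parquet.compression"="SNAPPY");
```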
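Summary of storage formats and compression
In real-world projects, Hive table data is usually stored as ORC or Parquet, and Snappy is the usual choice of compression codec.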
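One practical point when following this recommendation: LOAD DATA only moves files into the table directory and does not convert them, so a plain-text file cannot be loaded directly into an ORC or Parquet table. A common approach is to load the text into a staging table and then rewrite it with INSERT ... SELECT. The sketch below assumes a hypothetical staging table log_text and a placeholder file path; neither appears in the notes above.

```sql
-- Hypothetical plain-text staging table with the same columns
create table log_text(
track_time string,
url string,
session_id string,
referer string,
ip string,
end_user_id string,
city_id string )
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS textfile ;

-- '/path/to/log.data' is a placeholder, not a real path
load data local inpath '/path/to/log.data' into table log_text ;

-- Rewrite the rows into the Snappy-compressed ORC table defined earlier
insert into table log_orc_snappy select * from log_text ;
```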