load data
load data的实质是把文件复制或移到表的目录中,从而把数据导入Hive的表(或分区)。
insert overwrite
这种插入会覆盖原有数据。
insert overwrite table target_table partition ( dt = '20190402') select col1, col2 from source_table;
insert into
这种插入会追加数据。
insert into table target_table partition (dt = '20190402') select col1, col2 from source_table;
动态分区插入
insert overwrite table target_table partition(dt) select col1, col2, dt from source_table;
通常情况下,动态分区插入是关闭的,要开启需要设置 hive.exec.dynamic.partition为true。
多表插入
更加清晰的插入sql写法:
from source_table
insert overwrite table target_table select col1, col2;
如果从源表读取数据插入到多个表中,可以在上述语句中使用多个insert语句,这样效率更高,因为只需要扫描一遍源表即可。
from source_table
insert overwrite table target_table1 select col1;
insert overwrite table target_table2 select col2;
create table ... as select (CTAS)
新表的列的定义和select查询的源表列保持一致。
create table new_table as select col1, col2 from source_table;
hive> create table new_table as select id, name from partition_table;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = hadoop_20190402221824_a9feb375-8dd1-4f9c-aaba-015f6ab58cb2
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1552651623473_0006, Tracking URL = http://jms-master-01:8088/proxy/application_1552651623473_0006/
Kill Command = /home/hadoop/tools/hadoop-2.7.7/bin/hadoop job -kill job_1552651623473_0006
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2019-04-02 22:18:31,075 Stage-1 map = 0%, reduce = 0%
2019-04-02 22:18:36,347 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.48 sec
MapReduce Total cumulative CPU time: 2 seconds 480 msec
Ended Job = job_1552651623473_0006
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to directory hdfs://jms-master-01:9000/user/hive/warehouse/.hive-staging_hive_2019-04-02_22-18-24_643_3904332483007604326-1/-ext-10002
Moving data to directory hdfs://jms-master-01:9000/user/hive/warehouse/new_table
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Cumulative CPU: 2.48 sec HDFS Read: 3896 HDFS Write: 171 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 480 msec
OK
Time taken: 13.836 seconds
hive> select * from new_table;
OK
1 郭靖
2 黄蓉
3 杨康
4 穆念慈
5 东邪
6 西毒
7 黄老邪
8 杨铁心
Time taken: 0.146 seconds, Fetched: 8 row(s)
hive> desc new_table;
OK
id int
name string
Time taken: 0.197 seconds, Fetched: 2 row(s)
CTAS是原子操作,如果select查询失败,则新表不会被创建。
网友评论