1. Create a table from a SQL file:
hive -f create_table
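The contents of create_table are not shown here. As a minimal sketch, assuming u_info has the same two tab-separated string columns as the w_a table in item 9 (an assumption, not something the original states), the file could be written from the shell like this:

```shell
# Write a hypothetical table definition to a file named create_table.
# The column names and types below are assumptions for illustration.
# The quoted 'EOF' keeps \t and \n literal so Hive sees them unchanged.
cat > create_table <<'EOF'
CREATE TABLE IF NOT EXISTS u_info
(
userid STRING,
password STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n';
EOF
```

Running hive -f create_table afterwards executes every statement in that file.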
2. Load data from the local filesystem into a table:
LOAD DATA LOCAL INPATH '/usr/local/src/mr_wordcount/hive_test/a.txt' OVERWRITE INTO TABLE u_info;
3. Join two tables:
select a.*,b.* from w_a a join w_b b on a.userid = b.userid;
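4. Load data from HDFS into Hive (without LOCAL, the source file is moved into the table's location rather than copied):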
LOAD DATA INPATH '/test.txt' OVERWRITE INTO TABLE u_info;
5. Select part of the data from an existing table and insert it into an empty table:
insert into table u_info select * from w_a limit 3;
6. Create a new table and populate it with data selected from an existing table:
create table u_info as select * from w_a;
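7. Export data from Hive to the local filesystem (the path is treated as a directory, and its existing contents are overwritten):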
insert overwrite local directory '/usr/local/src/wc_input/a.txt' select * from w_a;
8. Export data from Hive to HDFS:
insert overwrite directory '/a.txt' select * from w_a;
9. Create a table in Hive and declare one of its fields as a partition column. The result is still a table with three columns; dt is simply used as the partition key:
create table w_a
(
userid STRING,
password STRING
)
PARTITIONED BY (dt STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n';
Load data into a partition of that table:
LOAD DATA LOCAL INPATH '/usr/local/src/a_log.txt' OVERWRITE INTO TABLE w_a partition(dt='20170303');
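The input file holds only the two non-partition columns, separated by tabs, because the dt value comes from the partition clause of the LOAD statement. A small test file in that layout could be generated as follows; the file is written to the current directory and the row values are made up for illustration:

```shell
# Generate a two-row, tab-separated sample file (userid, password).
# printf reuses the format string for each pair of arguments.
# The values are hypothetical; dt is not stored in the file.
printf '%s\t%s\n' u001 secret1 u002 secret2 > a_log.txt
```

Copy the file to the path used in the LOAD statement before running it.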