hive>create database if not exists db_hive;
hive>desc database extended db_hive;
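As a quick check you can list the existing databases and switch to the new one (standard Hive statements, added here for completeness):
hive> show databases;
hive> use db_hive;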
If the database is not empty, the drop can be forced with the cascade keyword:
hive> drop database db_hive cascade;
Creating a partitioned table and related operations:
Table creation syntax:
CREATE [EXTERNAL] TABLE [IF NOT EXISTS] table_name
[(col_name data_type [COMMENT col_comment], ...)]
[COMMENT table_comment]
[PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]
[CLUSTERED BY (col_name, col_name, ...)
[SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]
[ROW FORMAT row_format]
[STORED AS file_format]
[LOCATION hdfs_path]
Example:
create external table if not exists student(
id int,
name string,
classNo string)
partitioned by (entry_year string)
row format delimited fields terminated by '\t'
stored as textfile;
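After creating the table, its type (EXTERNAL), location and partition column can be inspected; this is only a quick sanity check with standard Hive statements, not part of the original steps:
hive> desc formatted student;
hive> show partitions student;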
Loading local data (the local source file is not deleted):
hive> load data local inpath '/root/testdata/student.txt' into table student partition(entry_year='2018');
Loading data that is already on HDFS: be aware that this cuts (moves) the source file into the table's HDFS directory. If that file is a data source shared with someone else's table, do not load it this way; instead register it with alter table student add partition(entry_year='sss') location '/testdata'.
hive> load data inpath '/testdata/student' into table student partition(entry_year='2019');
You can also upload the data directly to the corresponding HDFS directory and then add the partition afterwards:
[root@mini01 testdata]# hadoop fs -mkdir -p /user/hive/warehouse/db_hive.db/student/entry_year=2020
[root@mini01 testdata]# hadoop fs -put /root/testdata/student.txt /user/hive/warehouse/db_hive.db/student/entry_year=2020
hive> alter table student add partition(entry_year='2020');
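To confirm that the metastore now sees the uploaded directory as a partition, list the partitions and query the new one (a minimal check):
hive> show partitions student;
hive> select * from student where entry_year='2020' limit 5;
If many partition directories are uploaded this way, msck repair table student; is a common alternative that discovers them all in one statement (mentioned here only as an aside, it is not part of the original steps).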
If the source data sits in some other HDFS directory (not under the Hive warehouse directory), use the statements below. Note that location must point to a directory, not a file. The existing data is not modified, but if that data is ever lost, the corresponding partition of the table loses its data as well.
hive> alter table student drop if exists partition(entry_year='2021');
hive> alter table student add partition(entry_year='2021') location '/testdata';
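To verify that the new partition points at '/testdata' rather than the warehouse directory, describe it with a partition spec (a quick check, not in the original notes):
hive> desc formatted student partition(entry_year='2021');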
Dropping partitions:
alter table dept_partition drop partition (month='201705'), partition (month='201706');
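Note that for a managed (internal) table, dropping a partition also deletes its data on HDFS, while for an external table only the partition metadata is removed. A quick way to confirm the partitions are gone:
hive> show partitions dept_partition;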
Data import methods:
Syntax:
hive>load data [local] inpath '/opt/module/datas/student.txt' [overwrite] into table student [partition (partcol1=val1,…)];
You can also load a target table by selecting from another table:
hive> insert overwrite table student partition(entry_year='2022') select id,name,classno from student where entry_year='2021';
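To confirm the copy, query the new partition; filtering on the partition column lets Hive read only that partition's directory (a minimal check):
hive> select id, name, classno from student where entry_year='2022';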