Main topics of this section:
Using Impala
Creating tables in Impala
1. Creating a table directly
# sudo -u hdfs impala-shell
[not connected] > connect node1.hadoop.com:21000
[node1.hadoop.com:21000] >
[node1.hadoop.com:21000] > DROP TABLE IF EXISTS student;
[node1.hadoop.com:21000] > CREATE TABLE student
> (
> id INT,
> name STRING
> )
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
[node1.hadoop.com:21000] > SHOW TABLES;
Single-row insert (unlike Impala, Hive is not designed for single-row inserts):
[node1.hadoop.com:21000] > insert into student values(1,'zhangsan');
Query: insert into student values(1,'zhangsan')
Query submitted at: 2020-07-05 22:12:20 (Coordinator: http://node1.hadoop.com:25000)
Query progress can be monitored at: http://node1.hadoop.com:25000/query_plan?query_id=b34e63473dd63cfe:7368e1bc00000000
Modified 1 row(s) in 0.67s
[node1.hadoop.com:21000] > select * from student;
Query: select * from student
Query submitted at: 2020-07-05 22:12:29 (Coordinator: http://node1.hadoop.com:25000)
Query progress can be monitored at: http://node1.hadoop.com:25000/query_plan?query_id=b54fb8a8284e60fe:cf6aab0300000000
+----+----------+
| id | name     |
+----+----------+
| 1  | zhangsan |
+----+----------+
Fetched 1 row(s) in 0.45s
[node1.hadoop.com:21000] >
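From the same impala-shell session you can also inspect the table definition and the storage location Impala assigned to it (a quick check; both statements are standard Impala SQL):
[node1.hadoop.com:21000] > SHOW CREATE TABLE student;
[node1.hadoop.com:21000] > DESCRIBE FORMATTED student;
DESCRIBE FORMATTED prints, among other fields, Location, which should match the warehouse path listed next.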
Check the storage location and the owner of the files (on the node1 node):
# sudo -u hdfs hadoop fs -ls /user/hive/warehouse
Found 9 items
...
drwxrwxrwt - impala supergroup 0 2020-07-05 22:12 /user/hive/warehouse/student
...
# sudo -u hdfs hadoop fs -ls /user/hive/warehouse/student
Found 2 items
drwxrwxrwt - impala supergroup 0 2020-07-05 22:12 /user/hive/warehouse/student/_impala_insert_staging
-rw-r--r-- 3 impala supergroup 11 2020-07-05 22:12 /user/hive/warehouse/student/b34e63473dd63cfe-7368e1bc00000000_1806132997_data.0.
The above shows that Impala is built on top of Hive: the table is registered in the Hive metastore and stored under Hive's warehouse directory.
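Because the metadata lives in the shared Hive metastore, the same table should also be visible from Hive. A quick way to confirm this, assuming the Hive CLI is installed on node1 (a sketch, not part of the original transcript):
# sudo -u hdfs hive -e "SHOW TABLES; SELECT * FROM student;"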
2. Creating tables directly from files
Create the directories:
# sudo -u hdfs hdfs dfs -mkdir -p /user/cloudera/sample_data/tab1 /user/cloudera/sample_data/tab2
Set the permissions:
# sudo -u hdfs hadoop fs -chmod -R 777 /user/cloudera/sample_data
Create the test data files:
# vi /tab1.csv
Insert the following contents:
1,true,123.123,2012-10-24 08:55:00
2,false,1243.5,2012-10-25 13:40:00
3,false,24453.325,2008-08-22 09:33:21.123
4,false,243423.325,2007-05-12 22:32:21.33454
5,true,243.325,1953-04-22 09:11:33
# vi /tab2.csv
1,true,12789.123
2,false,1243.5
3,false,24453.325
4,false,2423.3254
5,true,243.325
60,false,243565423.325
70,true,243.325
80,false,243423.325
90,true,243.325
Upload the test files to HDFS:
# cd /
# sudo -u hdfs hdfs dfs -put tab1.csv /user/cloudera/sample_data/tab1
# sudo -u hdfs hdfs dfs -put tab2.csv /user/cloudera/sample_data/tab2
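Optionally verify that the files landed in the directories the external tables will point at:
# sudo -u hdfs hdfs dfs -ls /user/cloudera/sample_data/tab1
# sudo -u hdfs hdfs dfs -ls /user/cloudera/sample_data/tab2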
Enter impala-shell and create tab1 and tab2:
# sudo -u hdfs impala-shell
[not connected] > connect node1.hadoop.com:21000
[node1.hadoop.com:21000] >
[node1.hadoop.com:21000] > DROP TABLE IF EXISTS tab1;
[node1.hadoop.com:21000] > CREATE EXTERNAL TABLE tab1
> (
> id INT,
> col_1 BOOLEAN,
> col_2 DOUBLE,
> col_3 TIMESTAMP
> )
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
> LOCATION '/user/cloudera/sample_data/tab1';   -- directory from which the data files are read
[node1.hadoop.com:21000] > DROP TABLE IF EXISTS tab2;
[node1.hadoop.com:21000] > CREATE EXTERNAL TABLE tab2
> (
> id INT,
> col_1 BOOLEAN,
> col_2 DOUBLE
> )
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
> LOCATION '/user/cloudera/sample_data/tab2';
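Note that an external table only points at its LOCATION directory. If additional CSV files are dropped into that directory later (outside of Impala), the file list has to be reloaded before queries see them; a sketch of the statement, run from the same session:
[node1.hadoop.com:21000] > REFRESH tab1;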
Query the two tables:
[node1.hadoop.com:21000] > select * from tab1;
Query: select * from tab1
Query submitted at: 2020-07-05 22:19:16 (Coordinator: http://node1.hadoop.com:25000)
Query progress can be monitored at: http://node1.hadoop.com:25000/query_plan?query_id=5a4837a6f2f94e84:e372078c00000000
+----+-------+------------+-------------------------------+
| id | col_1 | col_2      | col_3                         |
+----+-------+------------+-------------------------------+
| 1  | true  | 123.123    | 2012-10-24 08:55:00           |
| 2  | false | 1243.5     | 2012-10-25 13:40:00           |
| 3  | false | 24453.325  | 2008-08-22 09:33:21.123000000 |
| 4  | false | 243423.325 | 2007-05-12 22:32:21.334540000 |
| 5  | true  | 243.325    | 1953-04-22 09:11:33           |
+----+-------+------------+-------------------------------+
Fetched 5 row(s) in 0.45s
[node1.hadoop.com:21000] > select * from tab2;
Query: select * from tab2
Query submitted at: 2020-07-05 22:19:25 (Coordinator: http://node1.hadoop.com:25000)
Query progress can be monitored at: http://node1.hadoop.com:25000/query_plan?query_id=9e49976674c988c3:b63f1a2100000000
+----+-------+---------------+
| id | col_1 | col_2         |
+----+-------+---------------+
| 1  | true  | 12789.123     |
| 2  | false | 1243.5        |
| 3  | false | 24453.325     |
| 4  | false | 2423.3254     |
| 5  | true  | 243.325       |
| 60 | false | 243565423.325 |
| 70 | true  | 243.325       |
| 80 | false | 243423.325    |
| 90 | true  | 243.325       |
+----+-------+---------------+
Fetched 9 row(s) in 6.26s
[node1.hadoop.com:21000] >
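Both external tables share the id column, so they can be joined like any other Impala tables. A minimal sketch of such a query (the output depends on the data loaded above):
[node1.hadoop.com:21000] > SELECT t1.id, t1.col_3, t2.col_2
                         > FROM tab1 t1
                         > JOIN tab2 t2 ON t1.id = t2.id
                         > ORDER BY t1.id;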