While inserting data into a Hive table from the hive shell, the following error was reported:
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException:
Unable to determine if hdfs://nameservice1:9001/user/hive/warehouse/test.db/tbl1 is encrypted: java.lang.IllegalArgumentException: Wrong FS: hdfs://nameservice1:9000/user/hive/warehouse/test.db/tbl1, expected: hdfs://127.0.0.1:9001
Careful troubleshooting eventually revealed the cause.
1. Cause Analysis
Hive typically uses a MySQL database as its metastore (the store for its metadata). When Hive was first configured, a database (named hive here) was created in MySQL along with a corresponding user/password. For details on configuring the Hive metastore, see: https://www.jianshu.com/p/ce4c5826a078.
The hive database in MySQL contains (among others) two tables relevant here, DBS and SDS, which record information about every database known to the Hive query engine: the HDFS storage location, owner, and so on.
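To see exactly what the metastore has recorded, you can query these tables directly in MySQL. The sketch below assumes the standard Hive metastore schema, in which the TBLS table links each Hive table to its storage descriptor row in SDS:

```sql
-- Database-level locations:
SELECT DB_ID, NAME, DB_LOCATION_URI FROM DBS;

-- Per-table locations: TBLS maps each table (TBL_NAME) to a row in SDS.
SELECT d.NAME AS db_name, t.TBL_NAME, s.LOCATION
FROM DBS d
JOIN TBLS t ON t.DB_ID = d.DB_ID
JOIN SDS  s ON s.SD_ID = t.SD_ID;
```

Run these against the metastore database (named hive in this setup).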
Now for the key point: when Hive was originally configured, the locations recorded in DBS and SDS were derived from the HDFS property fs.defaultFS, whose value at the time was:
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://nameservice1:9000</value>
</property>
But it was later changed to:
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://127.0.0.1:9000</value>
</property>
As a result, the locations stored in the Hive metastore no longer matched the tables' actual HDFS storage paths, which produced the error above.
2. Solution
The fix is therefore to update the Hive metastore so that its recorded locations match the tables' actual HDFS storage paths. To do so, modify two columns in the hive database in MySQL:

- the DB_LOCATION_URI column of the DBS table;
- the LOCATION column of the SDS table.
Note: set them to the value of the Hadoop property fs.defaultFS, as configured in core-site.xml.
Run the following in MySQL:
update DBS set DB_LOCATION_URI=REPLACE(DB_LOCATION_URI, 'nameservice1:9000', '127.0.0.1:9000');
update SDS set LOCATION=REPLACE(LOCATION, 'nameservice1:9000', '127.0.0.1:9000');
The DBS table before the change:
mysql> select * from DBS;
+-------+-----------------------+---------------------------------------------------------------------------------+------------------------------------+------------+------------+
| DB_ID | DESC | DB_LOCATION_URI | NAME | OWNER_NAME | OWNER_TYPE |
+-------+-----------------------+---------------------------------------------------------------------------------+------------------------------------+------------+------------+
| 1 | Default Hive database | hdfs://nameservice1:9000/user/hive/warehouse | default | public | ROLE |
| 6 | NULL | hdfs://nameservice1:9000/user/hive/warehouse/test.db | test | user1 | USER |
+-------+-----------------------+---------------------------------------------------------------------------------+------------------------------------+------------+------------+
The DBS table after the change:
mysql> select * from DBS;
+-------+-----------------------+---------------------------------------------------------------------------------+------------------------------------+------------+------------+
| DB_ID | DESC | DB_LOCATION_URI | NAME | OWNER_NAME | OWNER_TYPE |
+-------+-----------------------+---------------------------------------------------------------------------------+------------------------------------+------------+------------+
| 1 | Default Hive database | hdfs://127.0.0.1:9000/user/hive/warehouse | default | public | ROLE |
| 6 | NULL | hdfs://127.0.0.1:9000/user/hive/warehouse/test.db | test | user1 | USER |
+-------+-----------------------+---------------------------------------------------------------------------------+------------------------------------+------------+------------+
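After applying the two UPDATE statements, a quick sanity check (a sketch that simply searches for the old URI prefix) confirms no stale locations remain:

```sql
-- Both queries should return an empty set once the rewrite is complete.
SELECT DB_LOCATION_URI FROM DBS WHERE DB_LOCATION_URI LIKE 'hdfs://nameservice1:9000%';
SELECT LOCATION FROM SDS WHERE LOCATION LIKE 'hdfs://nameservice1:9000%';
```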
3. Test Results
Restart the hive shell and insert data into the Hive table again (which runs a MapReduce job); it now succeeds.
hive> use test;
hive> describe tbl1;
OK
id int
name string
gender string
age int
Time taken: 0.154 seconds, Fetched: 4 row(s)
hive> insert into tbl1 values(1, 'a', 'm', 1);
Query ID = ......
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Job running in-process (local Hadoop)
2019-11-02 16:09:44,943 Stage-1 map = 100%, reduce = 0%
Ended Job = job_local32255317_0001
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to: hdfs://.../test.db/tbl1/.hive-staging_hive_2019-11-02_16-09-41_685_8562247153682161957-1/-ext-10000
......
MapReduce Jobs Launched:
Stage-Stage-1: HDFS Read: 8 HDFS Write: 101 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
Time taken: 3.609 seconds
My knowledge is limited; corrections are welcome!