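Note:
CDH 6.3.1
1. Problem description
As the screenshot below shows, a freshly installed CDH cluster reports missing blocks as well as under-replicated blocks.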
![](https://img.haomeiwen.com/i2638478/1311d28eb516d854.png)
2. Solution
2.1 Locating the missing blocks
sudo -u hdfs hadoop fsck / -files -blocks
Test record:
FSCK started by hdfs (auth:SIMPLE) from /10.31.1.123 for path / at Fri Nov 27 09:55:15 CST 2020
/ <dir>
/tmp <dir>
/tmp/.cloudera_health_monitoring_canary_files <dir>
/tmp/.cloudera_health_monitoring_canary_files/.canary_file_2020_11_19-14_47_30.2b73c248cbe340c5 0 bytes, replicated: replication=3, 0 block(s): OK
/tmp/.cloudera_health_monitoring_canary_files/.canary_file_2020_11_19-14_48_30.f8637315419b1c8e 0 bytes, replicated: replication=3, 0 block(s): OK
/tmp/hive <dir>
/tmp/hive/anonymous <dir>
/tmp/hive/anonymous/5245f213-70eb-45d2-970f-1db033fbafad <dir>
/tmp/hive/anonymous/5245f213-70eb-45d2-970f-1db033fbafad/_tmp_space.db <dir>
/tmp/hive/anonymous/83ac2b81-64c8-4a4e-832f-c1f933cea7eb <dir>
/tmp/hive/anonymous/83ac2b81-64c8-4a4e-832f-c1f933cea7eb/_tmp_space.db <dir>
/tmp/hive/anonymous/83ac2b81-64c8-4a4e-832f-c1f933cea7eb/_tmp_space.db/Values__Tmp__Table__1 <dir>
/tmp/hive/anonymous/83ac2b81-64c8-4a4e-832f-c1f933cea7eb/_tmp_space.db/Values__Tmp__Table__1/data_file 22 bytes, replicated: replication=3, 1 block(s): OK
0. BP-157751563-10.31.1.123-1605413961809:blk_1073749568_8744 len=22 Live_repl=3
/tmp/hive/hive <dir>
/tmp/hive/hive/0988ee1b-25eb-43a4-b8cd-d0d7e43a457e <dir>
/tmp/hive/hive/0988ee1b-25eb-43a4-b8cd-d0d7e43a457e/_tmp_space.db <dir>
/tmp/hive/hive/11709f3e-a78b-4f6c-9d85-214474b0a566 <dir>
/tmp/hive/hive/11709f3e-a78b-4f6c-9d85-214474b0a566/_tmp_space.db <dir>
/tmp/hive/hive/190fef6f-a879-4fe3-8861-510701ce0c62 <dir>
/tmp/hive/hive/190fef6f-a879-4fe3-8861-510701ce0c62/_tmp_space.db <dir>
/tmp/hive/hive/1d83563c-8b79-4fea-9dec-22a3d1e0454b <dir>
/tmp/hive/hive/1d83563c-8b79-4fea-9dec-22a3d1e0454b/_tmp_space.db <dir>
/tmp/hive/hive/4f781ef4-b9da-4515-84f4-ff8037dd6e23 <dir>
/tmp/hive/hive/4f781ef4-b9da-4515-84f4-ff8037dd6e23/_tmp_space.db <dir>
**(a large amount of output omitted here)**
/user/yarn <dir>
/user/yarn/mapreduce <dir>
/user/yarn/mapreduce/mr-framework <dir>
/user/yarn/mapreduce/mr-framework/3.0.0-cdh6.3.1-mr-framework.tar.gz 235053931 bytes, replicated: replication=3, 2 block(s): OK
0. BP-157751563-10.31.1.123-1605413961809:blk_1073766370_25546 len=134217728 Live_repl=3
1. BP-157751563-10.31.1.123-1605413961809:blk_1073766371_25547 len=100836203 Live_repl=3
Status: CORRUPT
Number of data-nodes: 4
Number of racks: 1
Total dirs: 2108
Total symlinks: 0
Replicated Blocks:
Total size: 128560763438 B
Total files: 2187
Total blocks (validated): 3112 (avg. block size 41311299 B)
********************************
UNDER MIN REPL'D BLOCKS: 1814 (58.29049 %)
dfs.namenode.replication.min: 1
CORRUPT FILES: 1814
MISSING BLOCKS: 1814
MISSING SIZE: 1519384904 B
********************************
Minimally replicated blocks: 1298 (41.70951 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 0 (0.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 3
Average block replication: 1.2512853
Missing blocks: 1814
Corrupt blocks: 0
Missing replicas: 0 (0.0 %)
Blocks queued for replication: 0
Erasure Coded Block Groups:
Total size: 0 B
Total files: 0
Total block groups (validated): 0
Minimally erasure-coded block groups: 0
Over-erasure-coded block groups: 0
Under-erasure-coded block groups: 0
Unsatisfactory placement block groups: 0
Average block group size: 0.0
Missing block groups: 0
Corrupt block groups: 0
Missing internal blocks: 0
Blocks queued for replication: 0
FSCK ended at Fri Nov 27 09:55:15 CST 2020 in 47 milliseconds
The filesystem under path '/' is CORRUPT
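The summary counters in the test record above are the quickest way to gauge the damage. As a minimal sketch, the helper below reads fsck output on standard input and reports the missing blocks as a share of all validated blocks. The function name `fsck_missing_ratio` is an illustrative assumption, not part of Hadoop, and the parsing assumes the summary format shown above.

```shell
# Hypothetical helper: reads `hdfs fsck` output on stdin and prints
# "<missing> <total> <percent>" taken from the summary lines.
# Assumes the summary format shown in the test record above.
fsck_missing_ratio() {
  awk '
    # "Total blocks (validated): 3112 (avg. block size ...)"
    $1 == "Total" && $2 == "blocks"    { total = $4 }
    # "MISSING BLOCKS: 1814" (upper-case line only appears when blocks are missing)
    $1 == "MISSING" && $2 == "BLOCKS:" { missing = $3 }
    END {
      if (total > 0) printf "%d %d %.2f\n", missing, total, 100 * missing / total
    }
  '
}

# Usage against a live cluster (not run here):
#   sudo -u hdfs hdfs fsck / | fsck_missing_ratio
```

With the figures from this run (1814 missing out of 3112 validated blocks) the ratio works out to about 58.29 %, matching the `UNDER MIN REPL'D BLOCKS` line.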
Filter the output for the MISSING entries.
All of the missing blocks turn out to belong to files under the oozie directory:
/user/oozie/share/lib/lib_20201115122055/distcp/hadoop-distcp.jar 4038448 bytes, replicated: replication=3, 1 block(s): MISSING 1 blocks of total size 4038448 B
0. BP-157751563-10.31.1.123-1605413961809:blk_1073741831_1007 len=4038448 MISSING!
/user/oozie/share/lib/lib_20201115122055/distcp/netty-all-4.1.17.Final.jar 3780056 bytes, replicated: replication=3, 1 block(s): MISSING 1 blocks of total size 3780056 B
0. BP-157751563-10.31.1.123-1605413961809:blk_1073741829_1005 len=3780056 MISSING!
/user/oozie/share/lib/lib_20201115122055/distcp/oozie-sharelib-distcp-5.1.0-cdh6.3.1.jar 12759 bytes, replicated: replication=3, 1 block(s): MISSING 1 blocks of total size 12759 B
0. BP-157751563-10.31.1.123-1605413961809:blk_1073741826_1002 len=12759 MISSING!
/user/oozie/share/lib/lib_20201115122055/distcp/oozie-sharelib-distcp.jar 12759 bytes, replicated: replication=3, 1 block(s): MISSING 1 blocks of total size 12759 B
0. BP-157751563-10.31.1.123-1605413961809:blk_1073741830_1006 len=12759 MISSING!
/user/oozie/share/lib/lib_20201115122055/git/HikariCP-2.6.1.jar 133942 bytes, replicated: replication=3, 1 block(s): MISSING 1 blocks of total size 133942 B
0. BP-157751563-10.31.1.123-1605413961809:blk_1073741832_1008 len=133942 MISSING!
**(some output omitted here)**
0. BP-157751563-10.31.1.123-1605413961809:blk_1073743627_2803 len=18161 MISSING!
/user/oozie/share/lib/lib_20201115122055/sqoop/xz-1.6.jar 103131 bytes, replicated: replication=3, 1 block(s): MISSING 1 blocks of total size 103131 B
0. BP-157751563-10.31.1.123-1605413961809:blk_1073743633_2809 len=103131 MISSING!
/user/oozie/share/lib/lib_20201115122055/sqoop/zookeeper.jar 1543701 bytes, replicated: replication=3, 1 block(s): MISSING 1 blocks of total size 1543701 B
0. BP-157751563-10.31.1.123-1605413961809:blk_1073743638_2814 len=1543701 MISSING!
MISSING BLOCKS: 1814
MISSING SIZE: 1519384904 B
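One way to produce a filtered view like the one above, and to confirm that every affected file sits under the same directory, is sketched below. The function names `missing_files` and `missing_by_prefix` are hypothetical, and the parsing assumes the `-files -blocks` line format shown in this post, with paths that contain no whitespace.

```shell
# Hypothetical helpers; both read `hdfs fsck <path> -files -blocks` output on stdin.

# Print the path of every file that fsck flags as MISSING.
# File lines start with "/"; block lines ("0. BP-...") do not and are skipped.
# Assumes paths contain no whitespace (the path is taken as the first field).
missing_files() {
  awk 'substr($0, 1, 1) == "/" && index($0, "MISSING") { print $1 }'
}

# Count affected files per directory prefix (at most two levels, e.g. /user/oozie).
missing_by_prefix() {
  missing_files | awk -F/ '
    {
      p = "/" $2
      if (NF > 3) p = p "/" $3
      n[p]++
    }
    END { for (p in n) print n[p], p }
  ' | sort -k2
}

# Usage against a live cluster (not run here):
#   sudo -u hdfs hdfs fsck / -files -blocks | missing_by_prefix
```

If the only line printed is for `/user/oozie`, the damage is confined to that directory, as observed here.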
2.2 Fixing the under-replicated oozie blocks
Inspect one of the files with missing blocks:
[root@hp1 ~]# hadoop dfs -ls /user/oozie/share/lib/lib_20201115122055/hive2/jcodings-1.0.18.jar
WARNING: Use of this script to execute dfs is deprecated.
WARNING: Attempting to execute replacement "hdfs dfs" instead.
-rwxrwxr-x 3 oo
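The root cause: right after the cluster was initialized it had only two DataNodes, so the replication factor was lowered to 2 (dfs.replication=2). After that switch the HDFS health check was suppressed, which is how the alert came to appear. A DataNode has now been added back, so the cluster can return to three DataNodes with three replicas.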
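To see which files still carry a replication factor other than the intended one, the second column of an `hdfs dfs -ls` listing can be checked. The helper below is a sketch under assumptions: the name `files_not_at_replication` is hypothetical, it expects the usual eight-column listing format, and it takes the path from the eighth field, so paths with whitespace are not handled.

```shell
# Hypothetical helper: reads `hdfs dfs -ls -R <path>` output on stdin and prints
# "<replication> <path>" for every file whose replication factor differs from $1.
# Directories show "-" in the replication column and are skipped, as are
# header lines such as "Found N items" (fewer than eight fields).
files_not_at_replication() {
  awk -v want="$1" 'NF >= 8 && $2 != "-" && $2 != want { print $2, $8 }'
}

# Usage against a live cluster (not run here):
#   hdfs getconf -confKey dfs.replication        # configured default
#   sudo -u hdfs hdfs dfs -ls -R /user/oozie | files_not_at_replication 3
```

An empty result means every listed file already requests the target replication factor. It says nothing about whether the replicas actually exist, which is what fsck reports.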
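2.2.1 Set the replication factor to 3
sudo -u hdfs hadoop fs -setrep -R 3 /
2.2.2 Delete the bad blocks:
-- Delete the corrupted blocks (note: fsck -delete removes the affected files themselves, so run it only when that data can be regenerated or is no longer needed)
sudo -u hdfs hadoop fsck / -delete
-- List the corrupted blocks
sudo -u hdfs hadoop fsck -list-corruptfileblocks
Test record: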
[root@hp1 ~]# sudo -u hdfs hadoop fsck -list-corruptfileblocks
WARNING: Use of this script to execute fsck is deprecated.
WARNING: Attempting to execute replacement "hdfs fsck" instead.
Connecting to namenode via http://hp1:9870/fsck?ugi=hdfs&listcorruptfileblocks=1&path=%2F
The filesystem under path '/' has 0 CORRUPT files
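For scripted checks, the final status line in the test record above can be turned into an exit code. This is a minimal sketch: the function name `fsck_is_clean` is an assumption, and it relies on the exact wording `has 0 CORRUPT files` printed in this run.

```shell
# Hypothetical helper: reads `hdfs fsck / -list-corruptfileblocks` output on stdin
# and returns 0 only if fsck reports zero corrupt files.
# Relies on the status-line wording shown in the test record above.
fsck_is_clean() {
  grep -q "has 0 CORRUPT files"
}

# Usage against a live cluster (not run here):
#   if sudo -u hdfs hdfs fsck / -list-corruptfileblocks | fsck_is_clean; then
#     echo "no corrupt files"
#   fi
```

Matching on the literal zero keeps the check strict: any other count, or a missing status line, is treated as not clean.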
The missing-block problem is now resolved:
![](https://img.haomeiwen.com/i2638478/41a058456ead0cce.png)