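I. Symptom
After a power outage, the HDFS service misbehaves or reports corrupt blocks.
II. Delete all three replicas of one block directly on the DataNodes (replication factor 3)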
[hadoop@hadoop001 subdir0]$ rm -rf blk_1073741827 blk_1073741827_1003.meta
[hadoop@hadoop002 subdir0]$ rm -rf blk_1073741827 blk_1073741827_1003.meta
[hadoop@hadoop003 subdir0]$ rm -rf blk_1073741827 blk_1073741827_1003.meta
Then restart HDFS to simulate the corruption.
III. Check the health of the HDFS file system
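Before removing replicas by hand, the block file and its .meta file have to be found under each DataNode's data directory. A minimal sketch, assuming the caller passes in the data directory (the value of dfs.datanode.data.dir); the function name find_block_files and the example path are hypothetical:

```shell
# Hypothetical helper: list the data file and the .meta file of one block
# under a DataNode data directory (assumed to be dfs.datanode.data.dir).
find_block_files() {
  local data_dir="$1" block_id="$2"
  # Matches blk_<id> and blk_<id>_<generation stamp>.meta
  find "$data_dir" -type f \( -name "${block_id}" -o -name "${block_id}_*.meta" \) | sort
}
# Usage (the path is an assumption):
#   find_block_files /data/dfs/dn blk_1073741827
```

Listing the files first makes it easy to confirm that only the intended block is touched on each node.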
hdfs fsck /path
[hadoop@hadoop001 ~]$ hdfs fsck /
Connecting to namenode via http://hadoop002:50070/fsck?ugi=hadoop&path=%2F
FSCK started by hadoop (auth:SIMPLE) from /192.168.174.121 for path / at Thu Aug 22 17:07:58 CST 2019
.
/blockrecover/genome-scores.csv: CORRUPT blockpool BP-1685056456-192.168.174.121-1566207286072 block blk_1073741827
/blockrecover/genome-scores.csv: MISSING 1 blocks of total size 55108925 B.Status: CORRUPT
Total size: 323544381 B
Total dirs: 10
Total files: 1
Total symlinks: 0
Total blocks (validated): 3 (avg. block size 107848127 B)
********************************
UNDER MIN REPL'D BLOCKS: 1 (33.333332 %)
dfs.namenode.replication.min: 1
CORRUPT FILES: 1
MISSING BLOCKS: 1
MISSING SIZE: 55108925 B
CORRUPT BLOCKS: 1
********************************
Minimally replicated blocks: 2 (66.666664 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 0 (0.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 3
Average block replication: 2.0
Corrupt blocks: 1
Missing replicas: 0 (0.0 %)
Number of data-nodes: 3
Number of racks: 1
FSCK ended at Thu Aug 22 17:07:58 CST 2019 in 5 milliseconds
The filesystem under path '/' is CORRUPT
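For scripted checks, the final summary line of the fsck output is enough to tell a healthy file system from a corrupt one. A minimal sketch, assuming the output format shown above; the function name fsck_status is hypothetical:

```shell
# Hypothetical helper: read `hdfs fsck` output on stdin and print
# HEALTHY or CORRUPT, taken from the final summary line.
# Exits with status 2 if no summary line is found.
fsck_status() {
  awk '/^The filesystem under path .* is (HEALTHY|CORRUPT)$/ { status = $NF }
       END { if (status == "") exit 2; print status }'
}
# Usage:
#   hdfs fsck / | fsck_status
```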
IV. List the corrupt blocks and the files they belong to
hdfs fsck /path -list-corruptfileblocks
[hadoop@hadoop001 ~]$ hdfs fsck /blockrecover/genome-scores.csv -list-corruptfileblocks
Connecting to namenode via http://hadoop002:50070/fsck?ugi=hadoop&listcorruptfileblocks=1&path=%2Fblockrecover%2Fgenome-scores.csv
The list of corrupt files under path '/blockrecover/genome-scores.csv' are:
blk_1073741827 /blockrecover/genome-scores.csv
The filesystem under path '/blockrecover/genome-scores.csv' has 1 CORRUPT files
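When many files are affected, it helps to reduce this listing to a plain list of file paths, for example to hand to whoever has to reload the data. A minimal sketch, assuming each corrupt block is printed as "blk_<id>" followed by whitespace and the file path, as in the output above; the function name corrupt_files is hypothetical:

```shell
# Hypothetical helper: read `hdfs fsck <path> -list-corruptfileblocks` output
# on stdin and print each affected file path once.
corrupt_files() {
  # Keep only lines that start with a block ID, strip the ID, de-duplicate.
  sed -n 's/^blk_[0-9]*[[:space:]][[:space:]]*//p' | sort -u
}
# Usage:
#   hdfs fsck / -list-corruptfileblocks | corrupt_files
```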
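V. Find out which blocks of a file are stored on which machines
-files: prints each file being checked together with its size and block count
-blocks: prints the block list; only takes effect together with -files
-locations: prints the IP address of every DataNode holding each block; only takes effect together with -blocks
-racks: prints the rack of each DataNode location; only takes effect together with -blocks (and therefore -files)
Corrupt case: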
[hadoop@hadoop001 ~]$ hdfs fsck /blockrecover/genome-scores.csv -files -blocks -locations -racks
Connecting to namenode via http://hadoop002:50070/fsck?ugi=hadoop&files=1&blocks=1&locations=1&racks=1&path=%2Fblockrecover%2Fgenome-scores.csv
FSCK started by hadoop (auth:SIMPLE) from /192.168.174.121 for path /blockrecover/genome-scores.csv at Thu Aug 22 17:16:36 CST 2019
/blockrecover/genome-scores.csv 323544381 bytes, 3 block(s):
/blockrecover/genome-scores.csv: CORRUPT blockpool BP-1685056456-192.168.174.121-1566207286072 block blk_1073741827
MISSING 1 blocks of total size 55108925 B
0. BP-1685056456-192.168.174.121-1566207286072:blk_1073741825_1001 len=134217728 Live_repl=3 [/default-rack/192.168.174.122:50010, /default-rack/192.168.174.123:50010, /default-rack/192.168.174.121:50010]
1. BP-1685056456-192.168.174.121-1566207286072:blk_1073741826_1002 len=134217728 Live_repl=3 [/default-rack/192.168.174.122:50010, /default-rack/192.168.174.123:50010, /default-rack/192.168.174.121:50010]
2. BP-1685056456-192.168.174.121-1566207286072:blk_1073741827_1003 len=55108925 MISSING!
Status: CORRUPT
Total size: 323544381 B
Total dirs: 0
Total files: 1
Total symlinks: 0
Total blocks (validated): 3 (avg. block size 107848127 B)
********************************
UNDER MIN REPL'D BLOCKS: 1 (33.333332 %)
dfs.namenode.replication.min: 1
CORRUPT FILES: 1
MISSING BLOCKS: 1
MISSING SIZE: 55108925 B
CORRUPT BLOCKS: 1
********************************
Minimally replicated blocks: 2 (66.666664 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 0 (0.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 3
Average block replication: 2.0
Corrupt blocks: 1
Missing replicas: 0 (0.0 %)
Number of data-nodes: 3
Number of racks: 1
FSCK ended at Thu Aug 22 17:16:36 CST 2019 in 1 milliseconds
The filesystem under path '/blockrecover/genome-scores.csv' is CORRUPT
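The block that has lost all its replicas is the one marked MISSING! in the per-block listing. A minimal sketch for pulling out those block IDs, assuming the line layout shown above (index, pool:block, length, status); the function name missing_blocks is hypothetical:

```shell
# Hypothetical helper: read `hdfs fsck <path> -files -blocks` output on stdin
# and print the ID of every block reported as MISSING!.
missing_blocks() {
  # Field 2 looks like BP-...:blk_<id>_<genstamp>; keep the part after the last colon.
  awk '/ MISSING!$/ { n = split($2, part, ":"); print part[n] }'
}
# Usage:
#   hdfs fsck /blockrecover/genome-scores.csv -files -blocks | missing_blocks
```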
Healthy case:
[hadoop@hadoop001 data]$ hdfs fsck /blockrecover/genome-scores.csv -files -blocks -locations -racks
Connecting to namenode via http://hadoop002:50070/fsck?ugi=hadoop&files=1&blocks=1&locations=1&racks=1&path=%2Fblockrecover%2Fgenome-scores.csv
FSCK started by hadoop (auth:SIMPLE) from /192.168.174.121 for path /blockrecover/genome-scores.csv at Thu Aug 22 17:36:21 CST 2019
/blockrecover/genome-scores.csv 323544381 bytes, 3 block(s): OK
0. BP-1685056456-192.168.174.121-1566207286072:blk_1073741828_1004 len=134217728 Live_repl=3 [/default-rack/192.168.174.121:50010, /default-rack/192.168.174.123:50010, /default-rack/192.168.174.122:50010]
1. BP-1685056456-192.168.174.121-1566207286072:blk_1073741829_1005 len=134217728 Live_repl=3 [/default-rack/192.168.174.121:50010, /default-rack/192.168.174.122:50010, /default-rack/192.168.174.123:50010]
2. BP-1685056456-192.168.174.121-1566207286072:blk_1073741830_1006 len=55108925 Live_repl=3 [/default-rack/192.168.174.121:50010, /default-rack/192.168.174.123:50010, /default-rack/192.168.174.122:50010]
Status: HEALTHY
Total size: 323544381 B
Total dirs: 0
Total files: 1
Total symlinks: 0
Total blocks (validated): 3 (avg. block size 107848127 B)
Minimally replicated blocks: 3 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 0 (0.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 3
Average block replication: 3.0
Corrupt blocks: 0
Missing replicas: 0 (0.0 %)
Number of data-nodes: 3
Number of racks: 1
FSCK ended at Thu Aug 22 17:36:21 CST 2019 in 1 milliseconds
The filesystem under path '/blockrecover/genome-scores.csv' is HEALTHY
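The same listing can be condensed into one line per block with the DataNode addresses that hold it. A minimal sketch, assuming the "Live_repl=" line layout shown above, with locations starting at the fifth field; the function name block_locations is hypothetical:

```shell
# Hypothetical helper: read `hdfs fsck <path> -files -blocks -locations -racks`
# output on stdin and print "<block> <datanode> <datanode> ..." per live block.
block_locations() {
  awk '/Live_repl=/ {
    n = split($2, part, ":"); line = part[n]   # block ID after the pool ID
    for (i = 5; i <= NF; i++) {
      loc = $i
      gsub(/[][,]/, "", loc)    # drop brackets and commas
      sub(/^.*\//, "", loc)     # drop the /rack/ prefix if present
      line = line " " loc
    }
    print line
  }'
}
# Usage:
#   hdfs fsck /blockrecover/genome-scores.csv -files -blocks -locations -racks | block_locations
```

Blocks reported as MISSING! carry no Live_repl field, so they do not appear in this output.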
VI. Delete the corrupt files, then reload the data from the business system
[hadoop@hadoop001 ~]$ hdfs fsck / -delete
Connecting to namenode via http://hadoop002:50070/fsck?ugi=hadoop&delete=1&path=%2F
FSCK started by hadoop (auth:SIMPLE) from /192.168.174.121 for path / at Thu Aug 22 17:32:00 CST 2019
.
/blockrecover/genome-scores.csv: CORRUPT blockpool BP-1685056456-192.168.174.121-1566207286072 block blk_1073741827
/blockrecover/genome-scores.csv: MISSING 1 blocks of total size 55108925 B.Status: CORRUPT
Total size: 323544381 B
Total dirs: 10
Total files: 1
Total symlinks: 0
Total blocks (validated): 3 (avg. block size 107848127 B)
********************************
UNDER MIN REPL'D BLOCKS: 1 (33.333332 %)
dfs.namenode.replication.min: 1
CORRUPT FILES: 1
MISSING BLOCKS: 1
MISSING SIZE: 55108925 B
CORRUPT BLOCKS: 1
********************************
Minimally replicated blocks: 2 (66.666664 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 0 (0.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 3
Average block replication: 2.0
Corrupt blocks: 1
Missing replicas: 0 (0.0 %)
Number of data-nodes: 3
Number of racks: 1
FSCK ended at Thu Aug 22 17:32:00 CST 2019 in 15 milliseconds
The filesystem under path '/' is CORRUPT
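If the file is a log file, losing a little data does not matter.
If the file holds business data such as orders, the loss has to be reported and the data reloaded.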
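Because lost business data has to be reported, it is worth saving the list of affected files before deleting anything. A minimal sketch, assuming the -list-corruptfileblocks output format shown earlier; the function name report_and_delete and the report path are hypothetical:

```shell
# Hypothetical wrapper: write the corrupt files under a path to a report file,
# then delete them with fsck -delete only if the report is non-empty.
report_and_delete() {
  local path="$1" report="$2"
  hdfs fsck "$path" -list-corruptfileblocks \
    | sed -n 's/^blk_[0-9]*[[:space:]][[:space:]]*//p' | sort -u > "$report"
  if [ -s "$report" ]; then
    hdfs fsck "$path" -delete > /dev/null
  fi
}
# Usage (the report path is an assumption):
#   report_and_delete /blockrecover /tmp/corrupt-files.txt
```

Passing a specific directory instead of / keeps the deletion scoped to the data that is known to be reloadable.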
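VII. Summary
1. hdfs fsck / -delete deletes the corrupt files outright (the whole file, not just the missing block).
For HBase, there is no need to delete all of the table's files; just reload all the data. A put updates a row that already exists and inserts one that does not.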