Repairing HDFS Block Corruption Caused by a Power Outage

Author: 吃货大米饭 | Published: 2019-08-22 17:33

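    I. Symptom

    After a power outage, the HDFS service misbehaves or reports corrupt blocks.

    II. Delete all three replicas of one block of a file directly on the DataNodes (replication factor 3)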

    [hadoop@hadoop001 subdir0]$ rm -rf blk_1073741827 blk_1073741827_1003.meta
    [hadoop@hadoop002 subdir0]$ rm -rf blk_1073741827 blk_1073741827_1003.meta
    [hadoop@hadoop003 subdir0]$ rm -rf blk_1073741827 blk_1073741827_1003.meta
    

    Then restart HDFS. This reproduces the corruption directly.
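
Before removing replicas by hand, the block file and its checksum file have to be found under the DataNode storage directory. The following is a minimal sketch, assuming the storage directory is whatever `dfs.datanode.data.dir` points to on that node; the function name `find_block_files` and the path in the usage comment are hypothetical and not part of the original walkthrough.

```shell
#!/bin/sh
# find_block_files DATA_DIR BLOCK_ID
# Prints the data file and the .meta checksum file of one block.
# DATA_DIR is assumed to be the directory configured in dfs.datanode.data.dir.
find_block_files() {
    data_dir=$1
    block_id=$2
    # Match "blk_<id>" (data) and "blk_<id>_<genstamp>.meta" (checksums) only,
    # so that other blocks sharing the same prefix are not picked up.
    find "$data_dir" -type f \
        \( -name "$block_id" -o -name "${block_id}_*.meta" \) | LC_ALL=C sort
}

# Usage (hypothetical path):
#   find_block_files /data/dfs/dn blk_1073741827
```

Running it on each DataNode shows exactly which two files the `rm -rf` commands above would remove.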

    III. Check the health of the HDFS file system

    hdfs fsck /path

    [hadoop@hadoop001 ~]$ hdfs fsck /
    Connecting to namenode via http://hadoop002:50070/fsck?ugi=hadoop&path=%2F
    FSCK started by hadoop (auth:SIMPLE) from /192.168.174.121 for path / at Thu Aug 22 17:07:58 CST 2019
    .
    /blockrecover/genome-scores.csv: CORRUPT blockpool BP-1685056456-192.168.174.121-1566207286072 block blk_1073741827
    
    /blockrecover/genome-scores.csv: MISSING 1 blocks of total size 55108925 B.Status: CORRUPT
     Total size:    323544381 B
     Total dirs:    10
     Total files:   1
     Total symlinks:                0
     Total blocks (validated):      3 (avg. block size 107848127 B)
      ********************************
      UNDER MIN REPL'D BLOCKS:      1 (33.333332 %)
      dfs.namenode.replication.min: 1
      CORRUPT FILES:        1
      MISSING BLOCKS:       1
      MISSING SIZE:         55108925 B
      CORRUPT BLOCKS:       1
      ********************************
     Minimally replicated blocks:   2 (66.666664 %)
     Over-replicated blocks:        0 (0.0 %)
     Under-replicated blocks:       0 (0.0 %)
     Mis-replicated blocks:         0 (0.0 %)
     Default replication factor:    3
     Average block replication:     2.0
     Corrupt blocks:                1
     Missing replicas:              0 (0.0 %)
     Number of data-nodes:          3
     Number of racks:               1
    FSCK ended at Thu Aug 22 17:07:58 CST 2019 in 5 milliseconds
    
    
    The filesystem under path '/' is CORRUPT
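
For monitoring, only the final verdict line usually matters. The sketch below reads fsck output from standard input and reduces it to a single word; it assumes the closing line has the form shown above, and the function name `fsck_verdict` is hypothetical.

```shell
#!/bin/sh
# fsck_verdict: reads `hdfs fsck` output on stdin, prints HEALTHY or CORRUPT
# (UNKNOWN if no verdict line is found), and returns 0 only for HEALTHY.
fsck_verdict() {
    # The verdict is the last line of the form:
    #   The filesystem under path '<path>' is <STATUS>
    verdict=$(sed -n "s/^ *The filesystem under path '.*' is \([A-Z]*\)\$/\1/p" | tail -n 1)
    printf '%s\n' "${verdict:-UNKNOWN}"
    [ "$verdict" = "HEALTHY" ]
}

# Usage:
#   hdfs fsck / | fsck_verdict || echo "HDFS needs attention"
```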
    
    

    IV. List the corrupt blocks and the files they belong to

    hdfs fsck /path -list-corruptfileblocks

    [hadoop@hadoop001 ~]$ hdfs fsck /blockrecover/genome-scores.csv -list-corruptfileblocks
    Connecting to namenode via http://hadoop002:50070/fsck?ugi=hadoop&listcorruptfileblocks=1&path=%2Fblockrecover%2Fgenome-scores.csv
    The list of corrupt files under path '/blockrecover/genome-scores.csv' are:
    blk_1073741827  /blockrecover/genome-scores.csv
    The filesystem under path '/blockrecover/genome-scores.csv' has 1 CORRUPT files
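
When many blocks are affected, the same file appears once per corrupt block. The sketch below collapses the listing to one line per file; it assumes each data line starts with a block ID followed by whitespace and the path, as in the output above, and the function name `corrupt_files` is hypothetical.

```shell
#!/bin/sh
# corrupt_files: reads `hdfs fsck <path> -list-corruptfileblocks` output on
# stdin and prints each affected file path once.
corrupt_files() {
    # Data lines look like "blk_1073741827  /blockrecover/genome-scores.csv".
    # Only the leading block ID is stripped, so paths with spaces survive.
    sed -n 's/^[[:space:]]*blk_[-0-9]*[[:space:]][[:space:]]*//p' | LC_ALL=C sort -u
}

# Usage:
#   hdfs fsck / -list-corruptfileblocks | corrupt_files
```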
    

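    V. Locate which blocks of a file reside on which machines

    -files       print the files being checked
    -blocks      print the block report; only takes effect together with -files
    -locations   print the DataNode address of every block replica; only takes effect together with -blocks
    -racks       print the rack (network topology) of each DataNode location; only takes effect together with -files -blocks

    Corrupt case: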

    [hadoop@hadoop001 ~]$ hdfs fsck /blockrecover/genome-scores.csv -files -blocks -locations -racks
    Connecting to namenode via http://hadoop002:50070/fsck?ugi=hadoop&files=1&blocks=1&locations=1&racks=1&path=%2Fblockrecover%2Fgenome-scores.csv
    FSCK started by hadoop (auth:SIMPLE) from /192.168.174.121 for path /blockrecover/genome-scores.csv at Thu Aug 22 17:16:36 CST 2019
    /blockrecover/genome-scores.csv 323544381 bytes, 3 block(s): 
    /blockrecover/genome-scores.csv: CORRUPT blockpool BP-1685056456-192.168.174.121-1566207286072 block blk_1073741827
     MISSING 1 blocks of total size 55108925 B
    0. BP-1685056456-192.168.174.121-1566207286072:blk_1073741825_1001 len=134217728 Live_repl=3 [/default-rack/192.168.174.122:50010, /default-rack/192.168.174.123:50010, /default-rack/192.168.174.121:50010]
    1. BP-1685056456-192.168.174.121-1566207286072:blk_1073741826_1002 len=134217728 Live_repl=3 [/default-rack/192.168.174.122:50010, /default-rack/192.168.174.123:50010, /default-rack/192.168.174.121:50010]
    2. BP-1685056456-192.168.174.121-1566207286072:blk_1073741827_1003 len=55108925 MISSING!
    
    Status: CORRUPT
     Total size:    323544381 B
     Total dirs:    0
     Total files:   1
     Total symlinks:                0
     Total blocks (validated):      3 (avg. block size 107848127 B)
      ********************************
      UNDER MIN REPL'D BLOCKS:      1 (33.333332 %)
      dfs.namenode.replication.min: 1
      CORRUPT FILES:        1
      MISSING BLOCKS:       1
      MISSING SIZE:         55108925 B
      CORRUPT BLOCKS:       1
      ********************************
     Minimally replicated blocks:   2 (66.666664 %)
     Over-replicated blocks:        0 (0.0 %)
     Under-replicated blocks:       0 (0.0 %)
     Mis-replicated blocks:         0 (0.0 %)
     Default replication factor:    3
     Average block replication:     2.0
     Corrupt blocks:                1
     Missing replicas:              0 (0.0 %)
     Number of data-nodes:          3
     Number of racks:               1
    FSCK ended at Thu Aug 22 17:16:36 CST 2019 in 1 milliseconds
    
    
    The filesystem under path '/blockrecover/genome-scores.csv' is CORRUPT
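
In the per-block listing, a lost block carries `MISSING!` where the replica list would normally be. The sketch below extracts just the IDs of those blocks; it assumes the line format shown above (which may differ in other Hadoop versions), and the function name `missing_blocks` is hypothetical.

```shell
#!/bin/sh
# missing_blocks: reads `hdfs fsck <path> -files -blocks -locations` output
# on stdin and prints the ID of every block flagged MISSING!.
missing_blocks() {
    # Example input line:
    #   2. BP-...:blk_1073741827_1003 len=55108925 MISSING!
    # The generation stamp (_1003) is dropped so the ID matches the file name
    # of the block on the DataNode.
    sed -n 's/^.*:\(blk_[-0-9]*\)_[0-9]* len=[0-9]* MISSING!.*$/\1/p'
}

# Usage:
#   hdfs fsck /blockrecover/genome-scores.csv -files -blocks -locations | missing_blocks
```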
    

    Healthy case:

    [hadoop@hadoop001 data]$ hdfs fsck /blockrecover/genome-scores.csv -files -blocks -locations -racks
    Connecting to namenode via http://hadoop002:50070/fsck?ugi=hadoop&files=1&blocks=1&locations=1&racks=1&path=%2Fblockrecover%2Fgenome-scores.csv
    FSCK started by hadoop (auth:SIMPLE) from /192.168.174.121 for path /blockrecover/genome-scores.csv at Thu Aug 22 17:36:21 CST 2019
    /blockrecover/genome-scores.csv 323544381 bytes, 3 block(s):  OK
    0. BP-1685056456-192.168.174.121-1566207286072:blk_1073741828_1004 len=134217728 Live_repl=3 [/default-rack/192.168.174.121:50010, /default-rack/192.168.174.123:50010, /default-rack/192.168.174.122:50010]
    1. BP-1685056456-192.168.174.121-1566207286072:blk_1073741829_1005 len=134217728 Live_repl=3 [/default-rack/192.168.174.121:50010, /default-rack/192.168.174.122:50010, /default-rack/192.168.174.123:50010]
    2. BP-1685056456-192.168.174.121-1566207286072:blk_1073741830_1006 len=55108925 Live_repl=3 [/default-rack/192.168.174.121:50010, /default-rack/192.168.174.123:50010, /default-rack/192.168.174.122:50010]
    
    Status: HEALTHY
     Total size:    323544381 B
     Total dirs:    0
     Total files:   1
     Total symlinks:                0
     Total blocks (validated):      3 (avg. block size 107848127 B)
     Minimally replicated blocks:   3 (100.0 %)
     Over-replicated blocks:        0 (0.0 %)
     Under-replicated blocks:       0 (0.0 %)
     Mis-replicated blocks:         0 (0.0 %)
     Default replication factor:    3
     Average block replication:     3.0
     Corrupt blocks:                0
     Missing replicas:              0 (0.0 %)
     Number of data-nodes:          3
     Number of racks:               1
    FSCK ended at Thu Aug 22 17:36:21 CST 2019 in 1 milliseconds
    
    
    The filesystem under path '/blockrecover/genome-scores.csv' is HEALTHY
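
To answer "which blocks live on which machine", the bracketed replica lists can be flattened into one line per replica. This is a sketch that assumes the `Live_repl=... [...]` line format shown above; the function name `block_locations` is hypothetical.

```shell
#!/bin/sh
# block_locations: reads `hdfs fsck <path> -files -blocks -locations -racks`
# output on stdin and prints "<block> <rack/datanode address>", one line per
# replica.
block_locations() {
    awk '/Live_repl=/ {
        # Field 2 is "BP-<pool>:blk_<id>_<genstamp>"; keep the part after ":".
        n = split($2, parts, ":")
        block = parts[n]
        # The replica list sits between the first "[" and the first "]".
        lb = index($0, "[")
        rb = index($0, "]")
        if (lb == 0 || rb <= lb) next
        m = split(substr($0, lb + 1, rb - lb - 1), replica, ", ")
        for (i = 1; i <= m; i++) print block, replica[i]
    }'
}

# Usage: list every block that has a replica on 192.168.174.121
#   hdfs fsck /blockrecover/genome-scores.csv -files -blocks -locations -racks \
#       | block_locations | grep '192.168.174.121:'
```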
    

    VI. Delete the files that contain corrupt blocks, then reload the data from the business system

    [hadoop@hadoop001 ~]$ hdfs fsck / -delete
    Connecting to namenode via http://hadoop002:50070/fsck?ugi=hadoop&delete=1&path=%2F
    FSCK started by hadoop (auth:SIMPLE) from /192.168.174.121 for path / at Thu Aug 22 17:32:00 CST 2019
    .
    /blockrecover/genome-scores.csv: CORRUPT blockpool BP-1685056456-192.168.174.121-1566207286072 block blk_1073741827
    
    /blockrecover/genome-scores.csv: MISSING 1 blocks of total size 55108925 B.Status: CORRUPT
     Total size:    323544381 B
     Total dirs:    10
     Total files:   1
     Total symlinks:                0
     Total blocks (validated):      3 (avg. block size 107848127 B)
      ********************************
      UNDER MIN REPL'D BLOCKS:      1 (33.333332 %)
      dfs.namenode.replication.min: 1
      CORRUPT FILES:        1
      MISSING BLOCKS:       1
      MISSING SIZE:         55108925 B
      CORRUPT BLOCKS:       1
      ********************************
     Minimally replicated blocks:   2 (66.666664 %)
     Over-replicated blocks:        0 (0.0 %)
     Under-replicated blocks:       0 (0.0 %)
     Mis-replicated blocks:         0 (0.0 %)
     Default replication factor:    3
     Average block replication:     2.0
     Corrupt blocks:                1
     Missing replicas:              0 (0.0 %)
     Number of data-nodes:          3
     Number of racks:               1
    FSCK ended at Thu Aug 22 17:32:00 CST 2019 in 15 milliseconds
    
    
    The filesystem under path '/' is CORRUPT
    

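    If the lost file is a log file, losing a small amount does not matter.
    If the file holds business data such as order data, the loss must be reported and the data reloaded.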
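
Because `-delete` removes the affected files, it helps to record beforehand which of them need a reload. The sketch below labels a list of HDFS paths accordingly. The rule for what counts as a log file (`LOG_PATTERN`) and the function name `triage_lost_files` are hypothetical and must be adapted to the actual directory layout.

```shell
#!/bin/sh
# Hypothetical convention: anything under /logs/ or ending in .log is log data.
LOG_PATTERN='^/logs/|\.log$'

# triage_lost_files: reads HDFS paths on stdin, one per line, and labels each
# as IGNORE (log data, a small loss is acceptable) or RELOAD (business data
# that must be reported and loaded again).
triage_lost_files() {
    while IFS= read -r file; do
        [ -n "$file" ] || continue
        if printf '%s\n' "$file" | grep -Eq "$LOG_PATTERN"; then
            printf 'IGNORE\t%s\n' "$file"
        else
            printf 'RELOAD\t%s\n' "$file"
        fi
    done
}

# Usage: save the report before running `hdfs fsck / -delete`
#   triage_lost_files < corrupt_paths.txt > reload_report.tsv
```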

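    VII. Summary

    1. hdfs fsck / -delete deletes the corrupt files outright.

    For HBase there is no need to delete all files of the table. Reloading all the data is enough, because put updates a row that already exists and inserts one that does not.

    2. With -files -locations -blocks -racks, replica locations are shown for healthy blocks but not for damaged ones, which are marked MISSING! instead.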

    Link to this article: https://www.haomeiwen.com/subject/vitosctx.html