Details of the output: hdfs dfsa

Author: 你的努力时光不会辜负 | Published: 2021-02-25 10:02

    Source: https://community.cloudera.com/t5/Community-Articles/Details-of-the-output-hdfs-dfsadmin-report/ta-p/245505

    hdfs dfsadmin -report prints a brief report on the overall HDFS filesystem. It is a quick way to see how much disk space is available, how many DataNodes are running, how many blocks are corrupt or missing, and so on.

    Note: This article explains the disk space calculations as seen by HDFS.

    Command: Run the command as the hdfs user (prefix it with sudo -u hdfs) to avoid a permission-denied error.

    sudo -u hdfs hdfs dfsadmin -report

    You will see an output similar to:

    Configured Capacity: 270082531328 (251.53 GB)

    Present Capacity: 190246318080 (177.18 GB)

    DFS Remaining: 143504465920 (133.65 GB)

    DFS Used: 46741852160 (43.53 GB)

    DFS Used%: 24.57%

    Under replicated blocks: 0

    Blocks with corrupt replicas: 0

    Missing blocks: 0

    Missing blocks (with replication factor 1): 0

    -------------------------------------------------

    Live datanodes (4):

    Name: 123.45.678.910:50010 (kharearpit4.local)

    Hostname: kharearpit4.local

    Rack: /rack4

    Decommission Status : Normal

    Configured Capacity: 20063055872 (18.69 GB)

    DFS Used: 40960 (40 KB)

    Non DFS Used: 5971144704 (5.56 GB)

    DFS Remaining: 14091870208 (13.12 GB)

    DFS Used%: 0.00%

    DFS Remaining%: 70.24%

    Configured Cache Capacity: 0 (0 B)

    Cache Used: 0 (0 B)

    Cache Remaining: 0 (0 B)

    Cache Used%: 100.00%

    Cache Remaining%: 0.00%

    Xceivers: 2

    Last contact: Sun Apr 23 19:57:56 UTC 2017

    Name: 123.45.678.909:50010 (kharearpit3.local)

    Hostname: kharearpit3.local

    Rack: /rack3

    Decommission Status : Normal

    Configured Capacity: 83339825152 (77.62 GB)

    DFS Used: 15580618752 (14.51 GB)

    Non DFS Used: 22774845440 (21.21 GB)

    DFS Remaining: 44984360960 (41.89 GB)

    DFS Used%: 18.70%

    DFS Remaining%: 53.98%

    Configured Cache Capacity: 0 (0 B)

    Cache Used: 0 (0 B)

    Cache Remaining: 0 (0 B)

    Cache Used%: 100.00%

    Cache Remaining%: 0.00%

    Xceivers: 2

    Last contact: Sun Apr 23 19:57:58 UTC 2017

    Name: 123.45.678.908:50010 (kharearpit1.local)

    Hostname: kharearpit1.local

    Rack: /rack1

    Decommission Status : Normal

    Configured Capacity: 83339825152 (77.62 GB)

    DFS Used: 15580672000 (14.51 GB)

    Non DFS Used: 31497687040 (29.33 GB)

    DFS Remaining: 36261466112 (33.77 GB)

    DFS Used%: 18.70%

    DFS Remaining%: 43.51%

    Configured Cache Capacity: 0 (0 B)

    Cache Used: 0 (0 B)

    Cache Remaining: 0 (0 B)

    Cache Used%: 100.00%

    Cache Remaining%: 0.00%

    Xceivers: 2

    Last contact: Sun Apr 23 19:57:58 UTC 2017

    Name: 123.45.678.907:50010 (kharearpit2.local)

    Hostname: kharearpit2.local

    Rack: /rack2

    Decommission Status : Normal

    Configured Capacity: 83339825152 (77.62 GB)

    DFS Used: 15580520448 (14.51 GB)

    Non DFS Used: 19592536064 (18.25 GB)

    DFS Remaining: 48166768640 (44.86 GB)

    DFS Used%: 18.70%

    DFS Remaining%: 57.80%

    Configured Cache Capacity: 0 (0 B)

    Cache Used: 0 (0 B)

    Cache Remaining: 0 (0 B)

    Cache Used%: 100.00%

    Cache Remaining%: 0.00%

    Xceivers: 2

    Last contact: Sun Apr 23 19:57:58 UTC 2017

    This article aims to explain the concepts of Configured Capacity, Present Capacity, DFS Used, DFS Remaining, and Non DFS Used in HDFS. The diagram below explains these space parameters, treating HDFS as a single disk.

    A detailed explanation of these parameters is as follows:

    1. Configured Capacity

    It is the total capacity available to HDFS for storage. It is calculated as follows:

    Configured Capacity = Total Disk Space - Reserved Space

    Reserved space is the space set aside for non-HDFS use, such as OS-level operations. It is configured with the dfs.datanode.du.reserved parameter, which can be added or updated in hdfs-site.xml. The replication factor does not affect Configured Capacity.
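
    The formula above can be sketched in a few lines of Python. This is only an illustration: the function name, the volume sizes, and the reserved amount are hypothetical, and it assumes the reserved space is subtracted once per volume on a DataNode.

```python
# Sketch: Configured Capacity = Total Disk Space - Reserved Space.
# Assumption: the reserved amount (dfs.datanode.du.reserved) applies to each volume.

GIB = 1024 ** 3


def configured_capacity(volume_sizes_bytes, reserved_per_volume_bytes):
    """Return the space HDFS may use on one DataNode, in bytes."""
    total = 0
    for size in volume_sizes_bytes:
        # A volume smaller than the reserved amount contributes nothing.
        total += max(size - reserved_per_volume_bytes, 0)
    return total


# Hypothetical DataNode: two 100 GiB volumes with 10 GiB reserved on each.
capacity = configured_capacity([100 * GIB, 100 * GIB], 10 * GIB)
print(capacity // GIB)  # 180
```

    Note that the replication factor does not appear anywhere in this calculation.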

    2. Present Capacity

    It is the storage space actually available for storing files once the space taken by metadata and open blocks (Non DFS Used space) has been set aside. The difference between Configured Capacity and Present Capacity is therefore the space used for file system metadata and other non-HDFS data. Each DataNode includes its Present Capacity in the reports it sends to the NameNode; the NameNode tracks these values and aggregates them across all DataNodes, and the total is what hdfs dfsadmin -report displays. Present Capacity therefore varies with the usage of non-HDFS directories, whereas Configured Capacity stays the same until you add or remove volumes or disks.

    3. DFS Used

    It is the storage space that has been used up by HDFS. To estimate the actual size of the files stored in HDFS, divide DFS Used by the replication factor. The default replication factor is set by the dfs.replication parameter in the hdfs-site.xml config file. So if DFS Used is 90 GB and your replication factor is 3, the actual size of your files in HDFS is about 90/3 = 30 GB.
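
    The 90 GB example can be written as a short Python sketch. The function name is hypothetical, and the result is only an estimate: it assumes every file is stored with the same replication factor.

```python
# Sketch: estimate the logical size of the data stored in HDFS.
# Assumption: all files share one replication factor.

GB = 1024 ** 3


def estimated_logical_size(dfs_used_bytes, replication_factor):
    """Return the approximate size of the files before replication, in bytes."""
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    return dfs_used_bytes / replication_factor


print(estimated_logical_size(90 * GB, 3) / GB)  # 30.0
```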

    4. DFS Remaining

    It is the amount of storage space still available to HDFS for storing more files. If you have 90 GB of storage space remaining and the replication factor is 3, you can still store up to 90/3 = 30 GB of files without exceeding your Configured Capacity. Having covered DFS Used and DFS Remaining, we can say that:

    Present Capacity = DFS Used + DFS Remaining
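
    As a quick check, the cluster-wide byte counts from the report shown earlier satisfy this relation. The variable names below are illustrative; the percentage line reflects an observation about this particular report, where the printed DFS Used% agrees with DFS Used divided by Present Capacity.

```python
# Cluster-wide values copied from the summary section of the report.
dfs_used = 46741852160
dfs_remaining = 143504465920

# Present Capacity = DFS Used + DFS Remaining
present_capacity = dfs_used + dfs_remaining
print(present_capacity)  # 190246318080

# In this report, DFS Used% matches DFS Used / Present Capacity.
used_pct = 100 * dfs_used / present_capacity
print(f"{used_pct:.2f}%")  # 24.57%
```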

    5. Non DFS Used

    Non DFS Used is any data on the DataNodes' filesystems that is not under dfs.datanode.data.dir. In other words, it is how much of the Configured Capacity is occupied by non-DFS data.

    Non DFS Used = Configured Capacity - DFS Remaining - DFS Used
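
    The same formula can be checked for each DataNode with the byte counts from the report shown earlier. The following Python sketch uses hypothetical names for the helper and the data structure; the numbers themselves are copied from the report.

```python
# Per-node byte counts copied from the report:
# (Configured Capacity, DFS Used, DFS Remaining, reported Non DFS Used)
NODES = {
    "kharearpit4.local": (20063055872, 40960, 14091870208, 5971144704),
    "kharearpit3.local": (83339825152, 15580618752, 44984360960, 22774845440),
    "kharearpit1.local": (83339825152, 15580672000, 36261466112, 31497687040),
    "kharearpit2.local": (83339825152, 15580520448, 48166768640, 19592536064),
}


def non_dfs_used(configured, dfs_used, dfs_remaining):
    """Non DFS Used = Configured Capacity - DFS Remaining - DFS Used."""
    return configured - dfs_remaining - dfs_used


for host, (configured, used, remaining, reported) in NODES.items():
    computed = non_dfs_used(configured, used, remaining)
    # Prints the host, the computed value, and whether it matches the report.
    print(host, computed, computed == reported)
```

    For every node in this report the computed value equals the reported Non DFS Used figure.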

    VALIDATING THE OUTPUT

    Present Capacity = Sum of [ DFS Used + DFS Remaining ] for all the Data Nodes

    The output shared above lists 4 DataNodes:

    Present Capacity = [ 40 KB + 13.12 GB ] + [ 14.51 GB + 41.89 GB ] + [ 14.51 GB + 33.77 GB ] + [ 14.51 GB + 44.86 GB ]

    = 177.18 GB (the rounded figures sum to 177.17 GB; the exact byte counts sum to 190246318080 bytes, which is 177.18 GB)

    This matches the Present Capacity reported by the command.

    Configured Capacity = Sum of Configured Capacity for all the Data Nodes

    = 18.69 GB + 77.62 GB + 77.62 GB + 77.62 GB

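    = 251.55 GB (the report shows 251.53 GB because it is computed from the exact byte counts; the 0.02 GB gap comes from summing rounded per-node figures)

    Another way to check the Configured Capacity is: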

    Configured Capacity = Present Capacity + Non DFS Used on all the Data Nodes

    = 177.18 GB + [ 5.56 GB + 21.21 GB + 29.33 GB + 18.25 GB ]

    = 251.53 GB
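
    The rounding differences above disappear when the check is done in bytes. The Python sketch below parses a trimmed copy of the report (only the capacity fields are kept) and repeats the validation. The parser, its regular expression, and the function names are illustrative assumptions, not part of any HDFS tooling.

```python
import re

# Trimmed copy of the report: summary first, then one section per DataNode.
REPORT = """\
Configured Capacity: 270082531328 (251.53 GB)
Present Capacity: 190246318080 (177.18 GB)
DFS Remaining: 143504465920 (133.65 GB)
DFS Used: 46741852160 (43.53 GB)
-------------------------------------------------
Live datanodes (4):
Name: 123.45.678.910:50010 (kharearpit4.local)
Configured Capacity: 20063055872 (18.69 GB)
DFS Used: 40960 (40 KB)
Non DFS Used: 5971144704 (5.56 GB)
DFS Remaining: 14091870208 (13.12 GB)
Name: 123.45.678.909:50010 (kharearpit3.local)
Configured Capacity: 83339825152 (77.62 GB)
DFS Used: 15580618752 (14.51 GB)
Non DFS Used: 22774845440 (21.21 GB)
DFS Remaining: 44984360960 (41.89 GB)
Name: 123.45.678.908:50010 (kharearpit1.local)
Configured Capacity: 83339825152 (77.62 GB)
DFS Used: 15580672000 (14.51 GB)
Non DFS Used: 31497687040 (29.33 GB)
DFS Remaining: 36261466112 (33.77 GB)
Name: 123.45.678.907:50010 (kharearpit2.local)
Configured Capacity: 83339825152 (77.62 GB)
DFS Used: 15580520448 (14.51 GB)
Non DFS Used: 19592536064 (18.25 GB)
DFS Remaining: 48166768640 (44.86 GB)
"""

# Matches lines such as "DFS Used: 40960 (40 KB)" and captures key and bytes.
FIELD = re.compile(r"^\s*([A-Za-z ]+?)\s*:\s*(\d+) \(")


def parse_report(text):
    """Return (summary dict, list of per-node dicts) with byte values."""
    summary, nodes, current = {}, [], None
    for line in text.splitlines():
        if line.strip().startswith("Name:"):
            current = {}  # a "Name:" line starts a new DataNode section
            nodes.append(current)
            continue
        match = FIELD.match(line)
        if match:
            target = summary if current is None else current
            target[match.group(1)] = int(match.group(2))
    return summary, nodes


summary, nodes = parse_report(REPORT)


def total(field):
    """Sum one field over all DataNodes."""
    return sum(node[field] for node in nodes)


present = total("DFS Used") + total("DFS Remaining")
configured = present + total("Non DFS Used")

print(present == summary["Present Capacity"])  # True
print(configured == summary["Configured Capacity"])  # True
print(total("Configured Capacity") == summary["Configured Capacity"])  # True
```

    Working in bytes avoids the small discrepancies that appear when the rounded GB figures are added by hand.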
