Source: https://community.cloudera.com/t5/Community-Articles/Details-of-the-output-hdfs-dfsadmin-report/ta-p/245505
hdfs dfsadmin -report prints a brief report on the overall HDFS filesystem. It is a quick way to see how much disk space is available, how many DataNodes are running, whether any blocks are corrupt or missing, and so on.
Note: This article explains the disk space calculations as seen by HDFS.
Command: Run the command as the hdfs user (prefix it with sudo -u hdfs) to avoid a permission-denied error.
sudo -u hdfs hdfs dfsadmin -report
You will see output similar to:
Configured Capacity: 270082531328 (251.53 GB)
Present Capacity: 190246318080 (177.18 GB)
DFS Remaining: 143504465920 (133.65 GB)
DFS Used: 46741852160 (43.53 GB)
DFS Used%: 24.57%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0
-------------------------------------------------
Live datanodes (4):
Name: 123.45.678.910:50010 (kharearpit4.local)
Hostname: kharearpit4.local
Rack: /rack4
Decommission Status : Normal
Configured Capacity: 20063055872 (18.69 GB)
DFS Used: 40960 (40 KB)
Non DFS Used: 5971144704 (5.56 GB)
DFS Remaining: 14091870208 (13.12 GB)
DFS Used%: 0.00%
DFS Remaining%: 70.24%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 2
Last contact: Sun Apr 23 19:57:56 UTC 2017
Name: 123.45.678.909:50010 (kharearpit3.local)
Hostname: kharearpit3.local
Rack: /rack3
Decommission Status : Normal
Configured Capacity: 83339825152 (77.62 GB)
DFS Used: 15580618752 (14.51 GB)
Non DFS Used: 22774845440 (21.21 GB)
DFS Remaining: 44984360960 (41.89 GB)
DFS Used%: 18.70%
DFS Remaining%: 53.98%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 2
Last contact: Sun Apr 23 19:57:58 UTC 2017
Name: 123.45.678.908:50010 (kharearpit1.local)
Hostname: kharearpit1.local
Rack: /rack1
Decommission Status : Normal
Configured Capacity: 83339825152 (77.62 GB)
DFS Used: 15580672000 (14.51 GB)
Non DFS Used: 31497687040 (29.33 GB)
DFS Remaining: 36261466112 (33.77 GB)
DFS Used%: 18.70%
DFS Remaining%: 43.51%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 2
Last contact: Sun Apr 23 19:57:58 UTC 2017
Name: 123.45.678.907:50010 (kharearpit2.local)
Hostname: kharearpit2.local
Rack: /rack2
Decommission Status : Normal
Configured Capacity: 83339825152 (77.62 GB)
DFS Used: 15580520448 (14.51 GB)
Non DFS Used: 19592536064 (18.25 GB)
DFS Remaining: 48166768640 (44.86 GB)
DFS Used%: 18.70%
DFS Remaining%: 57.80%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 2
Last contact: Sun Apr 23 19:57:58 UTC 2017
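The cluster-wide figures at the top of the report are easy to pull out programmatically. The following is a minimal Python sketch, not part of the original article; the function name parse_cluster_summary and the SAMPLE excerpt are illustrative assumptions, and the parser only understands the "Name: bytes (human-readable)" line shape shown above.

```python
import re

# Matches lines such as "Configured Capacity: 270082531328 (251.53 GB)".
# Percentage lines and plain counters do not match and are skipped.
_BYTES_LINE = re.compile(r"^\s*([^:]+?)\s*:\s*(\d+)\s+\(.*\)\s*$")


def parse_cluster_summary(report_text):
    """Return {field name: bytes} for the cluster-wide part of the report.

    Parsing stops at the first "Live datanodes" line so that per-node
    values do not overwrite the cluster totals.
    """
    summary = {}
    for line in report_text.splitlines():
        if line.startswith("Live datanodes"):
            break
        match = _BYTES_LINE.match(line)
        if match:
            summary[match.group(1)] = int(match.group(2))
    return summary


# Excerpt of the sample output shown above.
SAMPLE = """\
Configured Capacity: 270082531328 (251.53 GB)
Present Capacity: 190246318080 (177.18 GB)
DFS Remaining: 143504465920 (133.65 GB)
DFS Used: 46741852160 (43.53 GB)
DFS Used%: 24.57%
Under replicated blocks: 0
Live datanodes (4):
Name: 123.45.678.910:50010 (kharearpit4.local)
Configured Capacity: 20063055872 (18.69 GB)
"""

summary = parse_cluster_summary(SAMPLE)
for name, value in summary.items():
    print(f"{name}: {value} bytes")
```

On a live cluster the text could come from subprocess.run(["sudo", "-u", "hdfs", "hdfs", "dfsadmin", "-report"], capture_output=True, text=True).stdout instead of SAMPLE, assuming the account running the script is allowed to use sudo in that way.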
This article explains the concepts of Configured Capacity, Present Capacity, DFS Used, DFS Remaining and Non DFS Used in HDFS. The diagram below illustrates these space parameters, treating HDFS as a single disk.
A detailed explanation of these parameters is as follows:
1. Configured Capacity
It is the total capacity available to HDFS for storage. It is calculated as follows:
Configured Capacity = Total Disk Space - Reserved Space
Reserved space is space set aside for non-HDFS use, such as OS-level operations. It is configured with the dfs.datanode.du.reserved parameter, which can be added or updated in hdfs-site.xml. The replication factor is irrelevant to Configured Capacity.
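As a rough illustration of the formula, here is a small Python sketch. It assumes that the reserved space applies to each data volume separately (dfs.datanode.du.reserved is documented as a per-volume value in bytes); the function name and the volume sizes are hypothetical and do not come from the sample cluster.

```python
GIB = 1024 ** 3


def configured_capacity(volume_sizes, reserved_per_volume):
    """Sum of (volume size - reserved space) over all data volumes, in bytes.

    A volume smaller than the reservation contributes nothing rather than
    a negative amount.
    """
    return sum(max(size - reserved_per_volume, 0) for size in volume_sizes)


# Hypothetical DataNode: two 100 GiB volumes with 10 GiB reserved on each.
capacity = configured_capacity([100 * GIB, 100 * GIB], 10 * GIB)
print(capacity // GIB)  # 180
```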
2. Present Capacity
It is the amount of storage space actually available for storing files after setting aside the space taken by metadata and open blocks (Non DFS Used space). The difference between Configured Capacity and Present Capacity is therefore the space used for file system metadata and other non-HDFS data. Each DataNode includes its Present Capacity in the reports it sends to the NameNode. The NameNode tracks these values and aggregates them across all DataNodes, and the total is what hdfs dfsadmin -report displays. Present Capacity therefore varies with the usage of non-HDFS directories, whereas Configured Capacity stays the same until you add or remove volumes or disks from HDFS.
3. DFS Used
It is the storage space used by HDFS, counting every replica. To get the actual size of the files stored in HDFS, divide DFS Used by the replication factor. The default replication factor is set by the dfs.replication parameter in hdfs-site.xml, and the division is exact only if every file uses that same factor. For example, if DFS Used is 90 GB and the replication factor is 3, the actual size of your files in HDFS is 90/3 = 30 GB.
4. DFS Remaining
It is the amount of storage space still available to HDFS for storing more files. If 90 GB of storage space remains and the replication factor is 3, you can still store up to 90/3 = 30 GB of files without exceeding your Configured Capacity. Having defined DFS Used and DFS Remaining, we can say that:
Present Capacity = DFS Used + DFS Remaining
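The two divisions described for DFS Used and DFS Remaining, together with the identity above, can be tried on the cluster totals from the sample output. This Python sketch assumes a uniform replication factor of 3, which the report itself does not state; the helper name logical_size is illustrative.

```python
def logical_size(raw_bytes, replication_factor):
    """Bytes of file data represented by raw_bytes of replicated storage."""
    return raw_bytes // replication_factor


# Cluster totals from the sample report, in bytes.
dfs_used = 46741852160
dfs_remaining = 143504465920
replication = 3  # assumption: every file uses this replication factor

present_capacity = dfs_used + dfs_remaining
stored = logical_size(dfs_used, replication)         # file data already stored
headroom = logical_size(dfs_remaining, replication)  # file data that still fits

print(present_capacity)  # 190246318080, the reported Present Capacity
print(stored, headroom)
```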
5. Non DFS Used
Non DFS Used is any data on the DataNodes' filesystems that is not under dfs.datanode.data.dir. In other words, it answers the question "How much of Configured Capacity is occupied by non-DFS use?"
Non DFS Used = Configured Capacity - DFS Remaining - DFS Used
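The formula can be checked against individual DataNodes in the sample output. The following Python sketch uses the byte counts reported for kharearpit4.local and kharearpit3.local; the function name is an illustrative choice.

```python
def non_dfs_used(configured, dfs_used, dfs_remaining):
    """Non DFS Used = Configured Capacity - DFS Remaining - DFS Used (bytes)."""
    return configured - dfs_remaining - dfs_used


# (Configured Capacity, DFS Used, DFS Remaining, reported Non DFS Used)
kharearpit4 = (20063055872, 40960, 14091870208, 5971144704)
kharearpit3 = (83339825152, 15580618752, 44984360960, 22774845440)

for configured, used, remaining, reported in (kharearpit4, kharearpit3):
    computed = non_dfs_used(configured, used, remaining)
    print(computed, computed == reported)
```

For both nodes the computed value equals the Non DFS Used line in the report.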
VALIDATING THE OUTPUT
Present Capacity = Sum of [ DFS Used + DFS Remaining ] for all the Data Nodes
The output shown above lists 4 DataNodes:
Present Capacity = [ 40 KB + 13.12 GB ] + [ 14.51 GB + 41.89 GB ] + [ 14.51 GB + 33.77 GB ] + [ 14.51 GB + 44.86 GB ]
≈ 177.18 GB
This matches the Present Capacity reported by the command. The rounded per-node figures add up to 177.17 GB; the exact byte counts give 177.18 GB.
Configured Capacity = Sum of Configured Capacity for all the Data Nodes
= 18.69 GB + 77.62 GB + 77.62 GB + 77.62 GB
= 251.55 GB (the reported value is 251.53 GB; the small difference comes from rounding the per-node figures)
Another way to check the Configured Capacity is:
Configured Capacity = Present Capacity + Non DFS Used on all the Data Nodes
= 177.18 GB + [ 5.56 GB + 21.21 GB + 29.33 GB + 18.25 GB ]
= 251.53 GB
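Because the GB figures are rounded, the same validation is cleaner in bytes. The Python sketch below repeats the checks above with the exact byte counts from the sample output; the variable names are illustrative, and the GB conversion assumes 1 GB = 2**30 bytes, which is what reproduces the figures in this report.

```python
# (hostname, Configured Capacity, DFS Used, Non DFS Used, DFS Remaining) in bytes
NODES = [
    ("kharearpit4.local", 20063055872, 40960, 5971144704, 14091870208),
    ("kharearpit3.local", 83339825152, 15580618752, 22774845440, 44984360960),
    ("kharearpit1.local", 83339825152, 15580672000, 31497687040, 36261466112),
    ("kharearpit2.local", 83339825152, 15580520448, 19592536064, 48166768640),
]

# Cluster-wide values from the top of the report.
REPORTED_CONFIGURED = 270082531328
REPORTED_PRESENT = 190246318080


def to_gb(num_bytes):
    """Convert bytes to GB as printed by the report (2**30 bytes per GB)."""
    return round(num_bytes / 2 ** 30, 2)


configured = sum(node[1] for node in NODES)
present = sum(node[2] + node[4] for node in NODES)
non_dfs = sum(node[3] for node in NODES)

print(present == REPORTED_PRESENT)                 # True
print(configured == REPORTED_CONFIGURED)           # True
print(present + non_dfs == REPORTED_CONFIGURED)    # True
print(to_gb(present), to_gb(configured))           # 177.18 251.53
```

In bytes all three identities hold exactly, which confirms that the 251.55 GB figure above is only an artifact of adding rounded values.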