
Big Data Cluster HDFS IO Testing

Author: 润土1030 | Published 2018-12-13 17:22
In day-to-day development you often need to benchmark the write performance of an HDFS cluster, either to decide from the results whether the cluster needs more nodes, or simply to estimate how long it would take to write, say, 1 TB of data.
Hadoop ships with a tool for exactly this. I am using the CDH distribution; the command is as follows:

hadoop jar /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-tests.jar TestDFSIO -write -nrFiles 10 -fileSize 100
This command writes 10 files of 100 MB each into HDFS. The parameters can be adjusted to suit your needs; since this is only a test, we keep the data volume small here.
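A full benchmark run usually pairs the write test with a read test and a cleanup step. A sketch, assuming the same CDH jar path as above (the read test must run after the write test so the files under /benchmarks/TestDFSIO already exist):

```shell
# Read back the same 10 x 100 MB files produced by the write test,
# to measure read throughput
hadoop jar /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-tests.jar \
  TestDFSIO -read -nrFiles 10 -fileSize 100

# Delete the /benchmarks/TestDFSIO directory once results are collected
hadoop jar /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-tests.jar \
  TestDFSIO -clean
```

Run -clean only after you have saved the results, since it removes the benchmark directory and its data.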

After the command finishes, the output looks like the following; the summary statistics appear at the end:
[hdfs@dlbdn3 ~]$ hadoop jar /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-tests.jar  TestDFSIO -write -nrFiles 10 -fileSize 100
18/12/13 17:11:07 INFO fs.TestDFSIO: TestDFSIO.1.7
18/12/13 17:11:07 INFO fs.TestDFSIO: nrFiles = 10
18/12/13 17:11:07 INFO fs.TestDFSIO: nrBytes (MB) = 100.0
18/12/13 17:11:07 INFO fs.TestDFSIO: bufferSize = 1000000
18/12/13 17:11:07 INFO fs.TestDFSIO: baseDir = /benchmarks/TestDFSIO
18/12/13 17:11:08 INFO fs.TestDFSIO: creating control file: 104857600 bytes, 10 files
18/12/13 17:11:09 INFO fs.TestDFSIO: created control files for: 10 files
18/12/13 17:11:09 INFO client.RMProxy: Connecting to ResourceManager at dlbdn3/192.168.123.4:8032
18/12/13 17:11:10 INFO client.RMProxy: Connecting to ResourceManager at dlbdn3/192.168.123.4:8032
18/12/13 17:11:10 INFO mapred.FileInputFormat: Total input paths to process : 10
18/12/13 17:11:10 INFO mapreduce.JobSubmitter: number of splits:10
18/12/13 17:11:10 INFO Configuration.deprecation: dfs.https.address is deprecated. Instead, use dfs.namenode.https-address
18/12/13 17:11:10 INFO Configuration.deprecation: io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum
18/12/13 17:11:10 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1543800485319_1066
18/12/13 17:11:11 INFO impl.YarnClientImpl: Submitted application application_1543800485319_1066
18/12/13 17:11:11 INFO mapreduce.Job: The url to track the job: http://dlbdn3:8088/proxy/application_1543800485319_1066/
18/12/13 17:11:11 INFO mapreduce.Job: Running job: job_1543800485319_1066
18/12/13 17:11:18 INFO mapreduce.Job: Job job_1543800485319_1066 running in uber mode : false
18/12/13 17:11:18 INFO mapreduce.Job:  map 0% reduce 0%
18/12/13 17:11:38 INFO mapreduce.Job:  map 67% reduce 0%
18/12/13 17:11:40 INFO mapreduce.Job:  map 73% reduce 0%
18/12/13 17:11:45 INFO mapreduce.Job:  map 83% reduce 0%
18/12/13 17:11:46 INFO mapreduce.Job:  map 87% reduce 0%
18/12/13 17:11:47 INFO mapreduce.Job:  map 90% reduce 0%
18/12/13 17:11:48 INFO mapreduce.Job:  map 100% reduce 0%
18/12/13 17:11:55 INFO mapreduce.Job:  map 100% reduce 100%
18/12/13 17:11:55 INFO mapreduce.Job: Job job_1543800485319_1066 completed successfully
18/12/13 17:11:55 INFO mapreduce.Job: Counters: 53
    File System Counters
        FILE: Number of bytes read=405
        FILE: Number of bytes written=1411227
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=2320
        HDFS: Number of bytes written=1048576079
        HDFS: Number of read operations=43
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=12
    Job Counters 
        Launched map tasks=10
        Launched reduce tasks=1
        Data-local map tasks=10
        Total time spent by all maps in occupied slots (ms)=244225
        Total time spent by all reduces in occupied slots (ms)=4729
        Total time spent by all map tasks (ms)=244225
        Total time spent by all reduce tasks (ms)=4729
        Total vcore-milliseconds taken by all map tasks=244225
        Total vcore-milliseconds taken by all reduce tasks=4729
        Total megabyte-milliseconds taken by all map tasks=250086400
        Total megabyte-milliseconds taken by all reduce tasks=4842496
    Map-Reduce Framework
        Map input records=10
        Map output records=50
        Map output bytes=752
        Map output materialized bytes=996
        Input split bytes=1200
        Combine input records=0
        Combine output records=0
        Reduce input groups=5
        Reduce shuffle bytes=996
        Reduce input records=50
        Reduce output records=5
        Spilled Records=100
        Shuffled Maps =10
        Failed Shuffles=0
        Merged Map outputs=10
        GC time elapsed (ms)=1236
        CPU time spent (ms)=77520
        Physical memory (bytes) snapshot=6496624640
        Virtual memory (bytes) snapshot=18602020864
        Total committed heap usage (bytes)=9065988096
        Peak Map Physical memory (bytes)=626966528
        Peak Map Virtual memory (bytes)=1701945344
        Peak Reduce Physical memory (bytes)=334086144
        Peak Reduce Virtual memory (bytes)=1710272512
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters 
        Bytes Read=1120
    File Output Format Counters 
        Bytes Written=79
18/12/13 17:11:55 INFO fs.TestDFSIO: ----- TestDFSIO ----- : write
18/12/13 17:11:55 INFO fs.TestDFSIO:            Date & time: Thu Dec 13 17:11:55 CST 2018
18/12/13 17:11:55 INFO fs.TestDFSIO:        Number of files: 10
18/12/13 17:11:55 INFO fs.TestDFSIO: Total MBytes processed: 1000.0
18/12/13 17:11:55 INFO fs.TestDFSIO:      Throughput mb/sec: 5.411636099942095
18/12/13 17:11:55 INFO fs.TestDFSIO: Average IO rate mb/sec: 5.548243522644043
18/12/13 17:11:55 INFO fs.TestDFSIO:  IO rate std deviation: 0.9476594426012499
18/12/13 17:11:55 INFO fs.TestDFSIO:     Test exec time sec: 45.924
18/12/13 17:11:55 INFO fs.TestDFSIO: 
[hdfs@dlbdn3 ~]$ 
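The intro mentioned estimating the time to write 1 TB, and the summary above gives enough to do that. One caveat: the reported "Throughput mb/sec" is measured per map task, so with 10 concurrent writers the aggregate cluster bandwidth is roughly 10 × 5.41 ≈ 54 MB/s. A rough back-of-the-envelope sketch (the numbers come from this run and will differ on your cluster):

```shell
# Per-map throughput reported by TestDFSIO (MB/s) and number of concurrent writers
THROUGHPUT=5.41
NRFILES=10

# Aggregate write bandwidth, then seconds/hours to write 1 TB (1048576 MB);
# on these numbers, roughly 54 MB/s aggregate and a bit over 5 hours for 1 TB
awk -v t="$THROUGHPUT" -v n="$NRFILES" 'BEGIN {
  agg  = t * n
  secs = 1048576 / agg
  printf "aggregate: %.1f MB/s, 1 TB in %.0f s (%.1f h)\n", agg, secs, secs / 3600
}'
```

This is only a first-order estimate: writing far more data, or with a different number of concurrent writers, can shift the per-writer throughput considerably.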

Hadoop ships with many other useful tools as well; I will cover them in a future post.
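As a starting point for exploring: in my experience, running the tests jar with no arguments prints the list of bundled benchmark programs (TestDFSIO among them, alongside tools such as mrbench and nnbench), though the exact list depends on your Hadoop version:

```shell
# Invoked with no program name, the jar prints its valid program names
hadoop jar /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-tests.jar
```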

