Running a Simple WordCount

Author: 广西年轻人 | Published: 2018-02-04 16:26

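    1. Create the working directory in HDFS

    /user    
      /tuxianchao    -- user
        /mapreduce  -- MapReduce applications
          /wordcount  -- application name
            /input         -- job input
            /output       -- job output (created by the job itself in step 3)
    # Create the input directory; -p also creates any missing parent directories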
    hdfs dfs -mkdir -p /user/tuxianchao/mapreduce/wordcount/input
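Only the input directory is created by hand. The example job writes the output directory itself and refuses to start if that path already exists, so it can be worth checking the layout before going further. The following is a minimal sketch, assuming the `hdfs` client is on the PATH of the node you are working on; the variable name `base` is only a convenience introduced here.

```shell
# Base path of the application; "base" is a helper name used only in this sketch.
base=/user/tuxianchao/mapreduce/wordcount

if command -v hdfs >/dev/null 2>&1; then
  # List the tree recursively; at this point only .../input should be present.
  hdfs dfs -ls -R /user/tuxianchao
  # "hdfs dfs -test -d" exits with status 0 when the path is an existing directory.
  if hdfs dfs -test -d "$base/output"; then
    echo "warning: $base/output already exists; the job will not overwrite it"
  fi
else
  echo "hdfs is not on PATH; run this on a node with the Hadoop client installed"
fi
```

If the warning appears, for example after an earlier run, the old directory can be removed with `hdfs dfs -rm -r` before the job is submitted again.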
    

    2. Create the test data and upload it to HDFS

    touch /opt/wc.input
    vim /opt/wc.input
    
    # Test data
    tuxianchao supergroup hello 
    hadoop hello 
    hdfs hadoop
    mapreduce hadoop
    hellowworld yarn
    tuxianchao apache
    
    # Upload to HDFS
    hdfs dfs -put /opt/wc.input  /user/tuxianchao/mapreduce/wordcount/input
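Before submitting the job, the expected counts can be worked out locally with standard Unix tools, which gives something to compare the job result in step 4 against. This is only a sketch: it rebuilds the sample data in a temporary file instead of reading /opt/wc.input, and it assumes the file ends with one blank line, which is consistent with the job later reporting 7 input records for 6 lines of text. The names `sample` and `expected` are introduced here for illustration.

```shell
# Recreate the sample data locally: the six lines above plus one trailing blank line.
sample=$(mktemp)
printf '%s\n' \
  'tuxianchao supergroup hello ' \
  'hadoop hello ' \
  'hdfs hadoop' \
  'mapreduce hadoop' \
  'hellowworld yarn' \
  'tuxianchao apache' \
  '' > "$sample"

# Put one word per line, drop empty lines, then count each distinct word.
# The awk step reorders "count word" into "word<TAB>count".
expected=$(tr -s '[:space:]' '\n' < "$sample" \
  | grep -v '^$' \
  | LC_ALL=C sort \
  | uniq -c \
  | awk '{print $2 "\t" $1}')

printf '%s\n' "$expected"
rm -f "$sample"
```

The pipeline is a local stand-in for the map (split into words), shuffle (sort) and reduce (count) phases, not the code the Hadoop example actually runs.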
    

    3. Run the wordcount program from the bundled examples jar

    yarn jar /opt/hadoop/hadoop-2.7.4/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.4.jar wordcount  /user/tuxianchao/mapreduce/wordcount/input/ /user/tuxianchao/mapreduce/wordcount/output
    
    

    output:

    [root@tuxianchao bin]# yarn jar /opt/hadoop/hadoop-2.7.4/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.4.jar wordcount  /user/tuxianchao/mapreduce/wordcount/input/ /user/tuxianchao/mapreduce/wordcount/output
    18/02/04 16:20:06 INFO client.RMProxy: Connecting to ResourceManager at /172.18.243.39:8032
    18/02/04 16:20:08 INFO input.FileInputFormat: Total input paths to process : 1
    18/02/04 16:20:08 INFO mapreduce.JobSubmitter: number of splits:1
    18/02/04 16:20:08 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1517665716891_0002
    18/02/04 16:20:08 INFO impl.YarnClientImpl: Submitted application application_1517665716891_0002
    18/02/04 16:20:08 INFO mapreduce.Job: The url to track the job: http://tuxianchao:8088/proxy/application_1517665716891_0002/
    18/02/04 16:20:08 INFO mapreduce.Job: Running job: job_1517665716891_0002
    18/02/04 16:20:19 INFO mapreduce.Job: Job job_1517665716891_0002 running in uber mode : false
    18/02/04 16:20:19 INFO mapreduce.Job:  map 0% reduce 0%
    18/02/04 16:20:25 INFO mapreduce.Job:  map 100% reduce 0%
    18/02/04 16:20:32 INFO mapreduce.Job:  map 100% reduce 100%
    18/02/04 16:20:32 INFO mapreduce.Job: Job job_1517665716891_0002 completed successfully
    18/02/04 16:20:32 INFO mapreduce.Job: Counters: 49
        File System Counters
            FILE: Number of bytes read=134
            FILE: Number of bytes written=283885
            FILE: Number of read operations=0
            FILE: Number of large read operations=0
            FILE: Number of write operations=0
            HDFS: Number of bytes read=249
            HDFS: Number of bytes written=92
            HDFS: Number of read operations=6
            HDFS: Number of large read operations=0
            HDFS: Number of write operations=2
        Job Counters 
            Launched map tasks=1
            Launched reduce tasks=1
            Data-local map tasks=1
            Total time spent by all maps in occupied slots (ms)=4142
            Total time spent by all reduces in occupied slots (ms)=4682
            Total time spent by all map tasks (ms)=4142
            Total time spent by all reduce tasks (ms)=4682
            Total vcore-milliseconds taken by all map tasks=4142
            Total vcore-milliseconds taken by all reduce tasks=4682
            Total megabyte-milliseconds taken by all map tasks=4241408
            Total megabyte-milliseconds taken by all reduce tasks=4794368
        Map-Reduce Framework
            Map input records=7
            Map output records=13
            Map output bytes=157
            Map output materialized bytes=134
            Input split bytes=141
            Combine input records=13
            Combine output records=9
            Reduce input groups=9
            Reduce shuffle bytes=134
            Reduce input records=9
            Reduce output records=9
            Spilled Records=18
            Shuffled Maps =1
            Failed Shuffles=0
            Merged Map outputs=1
            GC time elapsed (ms)=171
            CPU time spent (ms)=1310
            Physical memory (bytes) snapshot=309096448
            Virtual memory (bytes) snapshot=4165795840
            Total committed heap usage (bytes)=165810176
        Shuffle Errors
            BAD_ID=0
            CONNECTION=0
            IO_ERROR=0
            WRONG_LENGTH=0
            WRONG_MAP=0
            WRONG_REDUCE=0
        File Input Format Counters 
            Bytes Read=108
        File Output Format Counters 
            Bytes Written=92
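A few of the counters above can be cross-checked with simple arithmetic. The sketch below copies the values from the log; the variable names are made up for this example. Reading the megabyte-milliseconds figure as "allocated container memory times task time" is my interpretation of the counter, not something the log states.

```shell
# Values copied from the counters above.
map_ms=4142;    map_mb_ms=4241408
reduce_ms=4682; reduce_mb_ms=4794368

# Dividing megabyte-milliseconds by task milliseconds gives the memory (in MB)
# that was allocated to each task's container, under the interpretation above.
map_container_mb=$(( map_mb_ms / map_ms ))
reduce_container_mb=$(( reduce_mb_ms / reduce_ms ))
echo "map container: ${map_container_mb} MB, reduce container: ${reduce_container_mb} MB"

# The combiner received 13 (word, 1) pairs and emitted 9, one per distinct word.
combine_in=13; combine_out=9
combiner_saved=$(( combine_in - combine_out ))
echo "records removed by the combiner: ${combiner_saved}"
```

Both divisions come out to 1024 with no remainder, and the combiner figure matches the test data: "hadoop" occurs three times and "hello" and "tuxianchao" twice each, so four duplicate records are merged before the shuffle.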
    
    

    4. View the result

    [root@tuxianchao bin]# hdfs dfs -text /user/tuxianchao/mapreduce/wordcount/output/par*
    apache  1
    hadoop  3
    hdfs    1
    hello   2
    hellowworld 1
    mapreduce   1
    supergroup  1
    tuxianchao  2
    yarn    1
    [root@tuxianchao bin]# 
    [root@tuxianchao bin]# 
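The job writes its result sorted by word. To rank the words by frequency instead, the output can be piped through `sort`. The sketch below works on a local copy of the nine lines shown above rather than on HDFS, and assumes the key and count are separated by a tab, which is the default for the text output format. `result` and `ranked` are names introduced here.

```shell
# The nine result lines shown above, rebuilt as tab-separated "word<TAB>count" pairs.
result=$(printf '%s\t%s\n' \
  apache 1 hadoop 3 hdfs 1 hello 2 hellowworld 1 \
  mapreduce 1 supergroup 1 tuxianchao 2 yarn 1)

# Sort by the count column (numeric, descending), then by word to break ties.
ranked=$(printf '%s\n' "$result" | LC_ALL=C sort -t "$(printf '\t')" -k2,2nr -k1,1)
printf '%s\n' "$ranked"
```

On a live cluster the same `sort` stage could be appended to the `hdfs dfs -text` command used above.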
    
    
