MapReduce -- Hello World

Author: 华阳_3bcf | Published: 2018-10-15 16:46

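    When learning a programming language, the first program we write is usually Hello World. When learning Hadoop, the first program is the word-frequency counter WordCount, one of the official examples. It exercises HDFS, MapReduce, and YARN, so if it runs to completion, the basic Hadoop components are working.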

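    The core component here is MapReduce. Its name combines two words, each naming one of its tasks:

    • Map: split the job into sub-tasks
    • Reduce: aggregate the results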
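
    The two tasks can be illustrated with a minimal, single-process sketch of word counting. It is not Hadoop code: the function names are hypothetical, the grouping step stands in for the shuffle that the framework performs between map and reduce, and splitting on whitespace is an assumption about how words are separated.

```python
from collections import defaultdict


def map_phase(line):
    """Map: split one input line into (word, 1) pairs."""
    return [(word, 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key, standing in for the framework's shuffle step."""
    groups = defaultdict(list)
    for word, count in pairs:
        groups[word].append(count)
    return groups


def reduce_phase(word, counts):
    """Reduce: sum the counts collected for one word."""
    return word, sum(counts)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over an iterable of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(w, c) for w, c in shuffle(pairs).items())


if __name__ == "__main__":
    sample = ["hello world", "hello hadoop"]  # hypothetical input
    print(word_count(sample))
```

    On a real cluster the map calls run in parallel on different input splits and the reduce calls run on different nodes; the sketch only shows the data flow.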

    Running the WordCount program

    Environment: HDP 2.6

    Switch to the hdfs user to avoid path and permission problems:

    $ sudo su - hdfs
    

    1. Create a test file, test.txt

    Write any text in it (the job will count how many times each word occurs).
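
    If you would rather script this step, the following sketch writes a small test.txt. The sample sentences and the helper name are made up for illustration; any text works.

```python
from pathlib import Path

# Hypothetical sample text; replace it with anything you like.
SAMPLE_LINES = [
    "hello world",
    "hello hadoop",
    "hadoop runs mapreduce on yarn",
]


def write_test_file(path="test.txt", lines=SAMPLE_LINES):
    """Write the given lines to `path`, one per line, and return the line count."""
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return len(lines)


if __name__ == "__main__":
    print(write_test_file(), "lines written to test.txt")
```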

    2. Upload the file to HDFS

    $ hadoop fs -mkdir /input
    $ hadoop fs -put ./test.txt /input/test.txt
    

    3. Find the path of the examples jar

    $ find /usr/hdp -name "hadoop-mapreduce-example*"
    /usr/hdp/2.6.4.0-91/hadoop-mapreduce/hadoop-mapreduce-examples.jar
    /usr/hdp/2.6.4.0-91/hadoop-mapreduce/hadoop-mapreduce-examples-2.7.3.2.6.4.0-91.jar
    

    4. Run the Hadoop example

    $ hadoop jar /usr/hdp/2.6.4.0-91/hadoop-mapreduce/hadoop-mapreduce-examples.jar wordcount /input /out
    18/10/15 07:46:18 INFO client.AHSProxy: Connecting to Application History server at my-vm-hdp-1/192.168.0.68:10200
    18/10/15 07:46:18 INFO client.RequestHedgingRMFailoverProxyProvider: Looking for the active RM in [rm1, rm2]...
    18/10/15 07:46:18 INFO client.RequestHedgingRMFailoverProxyProvider: Found active RM [rm2]
    18/10/15 07:46:18 INFO input.FileInputFormat: Total input paths to process : 1
    18/10/15 07:46:18 INFO mapreduce.JobSubmitter: number of splits:1
    18/10/15 07:46:18 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1539582562189_0004
    18/10/15 07:46:19 INFO impl.YarnClientImpl: Submitted application application_1539582562189_0004
    18/10/15 07:46:19 INFO mapreduce.Job: The url to track the job: http://my-vm-hdp-2:8088/proxy/application_1539582562189_0004/
    18/10/15 07:46:19 INFO mapreduce.Job: Running job: job_1539582562189_0004
    18/10/15 07:46:24 INFO mapreduce.Job: Job job_1539582562189_0004 running in uber mode : false
    18/10/15 07:46:24 INFO mapreduce.Job:  map 0% reduce 0%
    18/10/15 07:46:28 INFO mapreduce.Job:  map 100% reduce 0%
    18/10/15 07:46:33 INFO mapreduce.Job:  map 100% reduce 100%
    18/10/15 07:46:33 INFO mapreduce.Job: Job job_1539582562189_0004 completed successfully
    18/10/15 07:46:33 INFO mapreduce.Job: Counters: 49
        File System Counters
            FILE: Number of bytes read=2578
            FILE: Number of bytes written=310487
            FILE: Number of read operations=0
            FILE: Number of large read operations=0
            FILE: Number of write operations=0
            HDFS: Number of bytes read=3636
            HDFS: Number of bytes written=1933
            HDFS: Number of read operations=6
            HDFS: Number of large read operations=0
            HDFS: Number of write operations=2
        Job Counters
            Launched map tasks=1
            Launched reduce tasks=1
            Data-local map tasks=1
            Total time spent by all maps in occupied slots (ms)=10048
            Total time spent by all reduces in occupied slots (ms)=4134
            Total time spent by all map tasks (ms)=2512
            Total time spent by all reduce tasks (ms)=2067
            Total vcore-milliseconds taken by all map tasks=2512
            Total vcore-milliseconds taken by all reduce tasks=2067
            Total megabyte-milliseconds taken by all map tasks=5144576
            Total megabyte-milliseconds taken by all reduce tasks=2116608
        Map-Reduce Framework
            Map input records=59
            Map output records=383
            Map output bytes=4821
            Map output materialized bytes=2578
            Input split bytes=97
            Combine input records=383
            Combine output records=161
            Reduce input groups=161
            Reduce shuffle bytes=2578
            Reduce input records=161
            Reduce output records=161
            Spilled Records=322
            Shuffled Maps =1
            Failed Shuffles=0
            Merged Map outputs=1
            GC time elapsed (ms)=88
            CPU time spent (ms)=1850
            Physical memory (bytes) snapshot=1617342464
            Virtual memory (bytes) snapshot=6485979136
            Total committed heap usage (bytes)=1510998016
        Shuffle Errors
            BAD_ID=0
            CONNECTION=0
            IO_ERROR=0
            WRONG_LENGTH=0
            WRONG_MAP=0
            WRONG_REDUCE=0
        File Input Format Counters
            Bytes Read=3539
        File Output Format Counters
            Bytes Written=1933
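
    A few of these counters are easier to read after some arithmetic. The sketch below parses "name=value" lines copied from the output above and derives two figures. It assumes that the megabyte-milliseconds counters are container memory (MB) multiplied by task time (ms), so dividing one by the other gives the memory per container. The helper name is hypothetical.

```python
import re

# Counter lines copied from the job output above.
LOG = """\
        Total time spent by all map tasks (ms)=2512
        Total time spent by all reduce tasks (ms)=2067
        Total megabyte-milliseconds taken by all map tasks=5144576
        Total megabyte-milliseconds taken by all reduce tasks=2116608
        Combine input records=383
        Combine output records=161
"""


def parse_counters(text):
    """Return a dict of counter name -> integer value for every 'name=value' line."""
    counters = {}
    for line in text.splitlines():
        match = re.match(r"\s*(.+?)\s*=\s*(\d+)\s*$", line)
        if match:
            counters[match.group(1)] = int(match.group(2))
    return counters


c = parse_counters(LOG)

# Assumption: megabyte-milliseconds = container memory (MB) x task time (ms).
map_mb = c["Total megabyte-milliseconds taken by all map tasks"] // c["Total time spent by all map tasks (ms)"]
reduce_mb = c["Total megabyte-milliseconds taken by all reduce tasks"] // c["Total time spent by all reduce tasks (ms)"]

# Fraction of map output records that the combiner removed before the shuffle.
combine_saving = 1 - c["Combine output records"] / c["Combine input records"]

if __name__ == "__main__":
    print("map container MB:", map_mb)        # 2048
    print("reduce container MB:", reduce_mb)  # 1024
    print("combiner saving: {:.0%}".format(combine_saving))
```

    Under that assumption the map container had 2048 MB and the reduce container 1024 MB. The combiner collapsed the 383 map output records into 161, which matches the 161 distinct words the reducer received.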
    

    Another example computes the value of pi:

    $ hadoop jar /usr/hdp/2.6.4.0-91/hadoop-mapreduce/hadoop-mapreduce-examples.jar pi 5 5
    

    For the other parameters, refer to the documentation of the Hadoop MapReduce examples.
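
    The two arguments of the pi example are the number of map tasks and the number of samples per map. As a rough, single-process sketch of the idea (not the example's actual code), the following estimates pi by quasi-Monte Carlo sampling: it places points in the unit square with a Halton sequence and counts how many fall inside the inscribed circle. The function names, and the choice of bases 2 and 3, are assumptions made for illustration.

```python
def halton(index, base):
    """Return element `index` (1-based) of the van der Corput sequence in `base`."""
    result, fraction = 0.0, 1.0 / base
    while index > 0:
        result += (index % base) * fraction
        index //= base
        fraction /= base
    return result


def estimate_pi(n_maps, n_samples):
    """Estimate pi from n_maps * n_samples quasi-random points in the unit square."""
    total = n_maps * n_samples
    inside = 0
    for i in range(1, total + 1):
        # Shift the point so the circle of radius 0.5 is centred on the origin.
        x = halton(i, 2) - 0.5
        y = halton(i, 3) - 0.5
        if x * x + y * y <= 0.25:
            inside += 1
    # Circle area / square area = pi / 4.
    return 4.0 * inside / total


if __name__ == "__main__":
    print(estimate_pi(5, 5))      # only 25 points, so a coarse estimate
    print(estimate_pi(10, 1000))  # more points, closer to pi
```

    With 5 maps and 5 samples each, only 25 points are used, so the result is coarse; larger arguments give a closer estimate at the cost of a longer job.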

    5. While the job is running, open another terminal and check the application with the yarn command

    $ yarn application -list
    18/10/15 07:54:13 INFO client.AHSProxy: Connecting to Application History server at my-vm-hdp-1/192.168.0.68:10200
    18/10/15 07:54:13 INFO client.RequestHedgingRMFailoverProxyProvider: Looking for the active RM in [rm1, rm2]...
    18/10/15 07:54:13 INFO client.RequestHedgingRMFailoverProxyProvider: Found active RM [rm2]
    Total number of applications (application-types: [] and states: [SUBMITTED, ACCEPTED, RUNNING]):1
                    Application-Id        Application-Name        Application-Type          User         Queue                 State           Final-State           Progress                           Tracking-URL
    application_1539582562189_0005              word count               MAPREDUCE          hdfs       default               RUNNING             UNDEFINED                50%    http://my-vm-hdp-3:43645
    

    6. After the job finishes, inspect the generated files

    $ hadoop fs -ls /out
    Found 2 items
    -rw-r--r--   3 hdfs hadoop          0 2018-10-15 07:46 /out/_SUCCESS
    -rw-r--r--   3 hdfs hadoop       1933 2018-10-15 07:46 /out/part-r-00000
    
    $ hadoop fs -cat /out/part-r-00000
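
    The part-r-00000 file holds one word per line, followed by a tab and its count. A small sketch for loading that output and listing the most frequent words is shown below. The sample text is hypothetical, not the real output of the run above, and the helper names are made up for illustration.

```python
def parse_wordcount_output(text):
    """Parse 'word<TAB>count' lines into a dict of word -> int."""
    counts = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        word, _, count = line.rpartition("\t")
        counts[word] = int(count)
    return counts


def top_words(counts, n=3):
    """Return the n most frequent words, ties broken alphabetically."""
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


if __name__ == "__main__":
    sample = "hadoop\t1\nhello\t2\nworld\t1\n"  # hypothetical output
    print(top_words(parse_wordcount_output(sample), n=2))
```

    To use it on the real result, redirect the output of the cat command to a local file and pass that file's contents to parse_wordcount_output.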
    
