1. Create a working directory in the HDFS file system
/user
/tuxianchao --user
/mapreduce --MapReduce application
/wordcount --application name
/input --input
/output --output
# Create the directory
hdfs dfs -mkdir -p /user/tuxianchao/mapreduce/wordcount/input
2. Create test data and upload it to HDFS
touch /opt/wc.input
vim /opt/wc.input
# Test data
tuxianchao supergroup hello
hadoop hello
hdfs hadoop
mapreduce hadoop
hellowworld yarn
tuxianchao apache
# Upload to HDFS
hdfs dfs -put /opt/wc.input /user/tuxianchao/mapreduce/wordcount/input
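Before submitting the job, the expected result can be previewed locally with plain coreutils. This is only a sketch of what wordcount computes, not Hadoop itself; `/tmp/wc.input` is a hypothetical local copy of the same test data:

```shell
# Recreate the test data locally (hypothetical path, same content as /opt/wc.input)
cat > /tmp/wc.input <<'EOF'
tuxianchao supergroup hello
hadoop hello
hdfs hadoop
mapreduce hadoop
hellowworld yarn
tuxianchao apache
EOF
# Split on whitespace, then count occurrences of each word --
# the same result the wordcount job will produce
tr -s ' \t' '\n' < /tmp/wc.input | sort | uniq -c | awk '{print $2, $1}' | sort
```

The counts printed here should match what the job writes to the output directory in step 4.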
3. Run the bundled wordcount example
yarn jar /opt/hadoop/hadoop-2.7.4/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.4.jar wordcount /user/tuxianchao/mapreduce/wordcount/input/ /user/tuxianchao/mapreduce/wordcount/output
Output:
[root@tuxianchao bin]# yarn jar /opt/hadoop/hadoop-2.7.4/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.4.jar wordcount /user/tuxianchao/mapreduce/wordcount/input/ /user/tuxianchao/mapreduce/wordcount/output
18/02/04 16:20:06 INFO client.RMProxy: Connecting to ResourceManager at /172.18.243.39:8032
18/02/04 16:20:08 INFO input.FileInputFormat: Total input paths to process : 1
18/02/04 16:20:08 INFO mapreduce.JobSubmitter: number of splits:1
18/02/04 16:20:08 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1517665716891_0002
18/02/04 16:20:08 INFO impl.YarnClientImpl: Submitted application application_1517665716891_0002
18/02/04 16:20:08 INFO mapreduce.Job: The url to track the job: http://tuxianchao:8088/proxy/application_1517665716891_0002/
18/02/04 16:20:08 INFO mapreduce.Job: Running job: job_1517665716891_0002
18/02/04 16:20:19 INFO mapreduce.Job: Job job_1517665716891_0002 running in uber mode : false
18/02/04 16:20:19 INFO mapreduce.Job: map 0% reduce 0%
18/02/04 16:20:25 INFO mapreduce.Job: map 100% reduce 0%
18/02/04 16:20:32 INFO mapreduce.Job: map 100% reduce 100%
18/02/04 16:20:32 INFO mapreduce.Job: Job job_1517665716891_0002 completed successfully
18/02/04 16:20:32 INFO mapreduce.Job: Counters: 49
    File System Counters
        FILE: Number of bytes read=134
        FILE: Number of bytes written=283885
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=249
        HDFS: Number of bytes written=92
        HDFS: Number of read operations=6
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters
        Launched map tasks=1
        Launched reduce tasks=1
        Data-local map tasks=1
        Total time spent by all maps in occupied slots (ms)=4142
        Total time spent by all reduces in occupied slots (ms)=4682
        Total time spent by all map tasks (ms)=4142
        Total time spent by all reduce tasks (ms)=4682
        Total vcore-milliseconds taken by all map tasks=4142
        Total vcore-milliseconds taken by all reduce tasks=4682
        Total megabyte-milliseconds taken by all map tasks=4241408
        Total megabyte-milliseconds taken by all reduce tasks=4794368
    Map-Reduce Framework
        Map input records=7
        Map output records=13
        Map output bytes=157
        Map output materialized bytes=134
        Input split bytes=141
        Combine input records=13
        Combine output records=9
        Reduce input groups=9
        Reduce shuffle bytes=134
        Reduce input records=9
        Reduce output records=9
        Spilled Records=18
        Shuffled Maps =1
        Failed Shuffles=0
        Merged Map outputs=1
        GC time elapsed (ms)=171
        CPU time spent (ms)=1310
        Physical memory (bytes) snapshot=309096448
        Virtual memory (bytes) snapshot=4165795840
        Total committed heap usage (bytes)=165810176
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=108
    File Output Format Counters
        Bytes Written=92
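The counter values line up with the word-count logic itself: the mapper emits one (word, 1) pair per token (Map output records=13), the shuffle groups pairs by word, and the reducer sums each group (Reduce input groups=9, one per distinct word). A minimal streaming-style sketch of those phases in plain shell (local file path and function names are hypothetical, not part of Hadoop):

```shell
# Same test data as /opt/wc.input, recreated locally (hypothetical path)
cat > /tmp/wc.input <<'EOF'
tuxianchao supergroup hello
hadoop hello
hdfs hadoop
mapreduce hadoop
hellowworld yarn
tuxianchao apache
EOF
# Map phase: emit one "word<TAB>1" pair per token (13 records here)
wc_map() { tr -s ' \t' '\n' | awk 'NF {print $1 "\t1"}'; }
# Reduce phase: sum the 1s per word (9 distinct groups here)
wc_reduce() { awk -F'\t' '{c[$1] += $2} END {for (w in c) print w, c[w]}' | sort; }
# sort stands in for the shuffle, which groups the pairs by key
wc_map < /tmp/wc.input | sort | wc_reduce
```

The same map and reduce scripts could in principle be fed to Hadoop Streaming, which runs the mapper and reducer as external processes.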
4. View the results
[root@tuxianchao bin]# hdfs dfs -text /user/tuxianchao/mapreduce/wordcount/output/par*
apache 1
hadoop 3
hdfs 1
hello 2
hellowworld 1
mapreduce 1
supergroup 1
tuxianchao 2
yarn 1
[root@tuxianchao bin]#