1. Create a working directory in the HDFS file system
/user
/tuxianchao --user
/mapreduce --MapReduce application
/wordcount --application name
/input --input
/output --output
# Create the directory
hdfs dfs -mkdir -p /user/tuxianchao/mapreduce/wordcount/input
2. Create test data and upload it to HDFS
touch /opt/wc.input
vim /opt/wc.input
# Test data
tuxianchao supergroup hello
hadoop hello
hdfs hadoop
mapreduce hadoop
hellowworld yarn
tuxianchao apache
# Upload to HDFS
hdfs dfs -put /opt/wc.input /user/tuxianchao/mapreduce/wordcount/input
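Before submitting the job, the expected result can be previewed locally with plain coreutils. This is only a sketch of what wordcount computes, not Hadoop itself; `/tmp/wc.input` is a hypothetical local copy of the same test data:

```shell
# Recreate the test data locally (hypothetical path, same content as /opt/wc.input)
cat > /tmp/wc.input <<'EOF'
tuxianchao supergroup hello
hadoop hello
hdfs hadoop
mapreduce hadoop
hellowworld yarn
tuxianchao apache
EOF
# Split on whitespace, then count occurrences of each word --
# the same result the wordcount job will produce
tr -s ' \t' '\n' < /tmp/wc.input | sort | uniq -c | awk '{print $2, $1}' | sort
```

The counts printed here should match what the job writes to the output directory in step 4.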
3. Run the bundled wordcount example
yarn jar /opt/hadoop/hadoop-2.7.4/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.4.jar wordcount /user/tuxianchao/mapreduce/wordcount/input/ /user/tuxianchao/mapreduce/wordcount/output
Output:
[root@tuxianchao bin]# yarn jar /opt/hadoop/hadoop-2.7.4/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.4.jar wordcount /user/tuxianchao/mapreduce/wordcount/input/ /user/tuxianchao/mapreduce/wordcount/output
18/02/04 16:20:06 INFO client.RMProxy: Connecting to ResourceManager at /172.18.243.39:8032
18/02/04 16:20:08 INFO input.FileInputFormat: Total input paths to process : 1
18/02/04 16:20:08 INFO mapreduce.JobSubmitter: number of splits:1
18/02/04 16:20:08 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1517665716891_0002
18/02/04 16:20:08 INFO impl.YarnClientImpl: Submitted application application_1517665716891_0002
18/02/04 16:20:08 INFO mapreduce.Job: The url to track the job: http://tuxianchao:8088/proxy/application_1517665716891_0002/
18/02/04 16:20:08 INFO mapreduce.Job: Running job: job_1517665716891_0002
18/02/04 16:20:19 INFO mapreduce.Job: Job job_1517665716891_0002 running in uber mode : false
18/02/04 16:20:19 INFO mapreduce.Job: map 0% reduce 0%
18/02/04 16:20:25 INFO mapreduce.Job: map 100% reduce 0%
18/02/04 16:20:32 INFO mapreduce.Job: map 100% reduce 100%
18/02/04 16:20:32 INFO mapreduce.Job: Job job_1517665716891_0002 completed successfully
18/02/04 16:20:32 INFO mapreduce.Job: Counters: 49
    File System Counters
        FILE: Number of bytes read=134
        FILE: Number of bytes written=283885
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=249
        HDFS: Number of bytes written=92
        HDFS: Number of read operations=6
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters
        Launched map tasks=1
        Launched reduce tasks=1
        Data-local map tasks=1
        Total time spent by all maps in occupied slots (ms)=4142
        Total time spent by all reduces in occupied slots (ms)=4682
        Total time spent by all map tasks (ms)=4142
        Total time spent by all reduce tasks (ms)=4682
        Total vcore-milliseconds taken by all map tasks=4142
        Total vcore-milliseconds taken by all reduce tasks=4682
        Total megabyte-milliseconds taken by all map tasks=4241408
        Total megabyte-milliseconds taken by all reduce tasks=4794368
    Map-Reduce Framework
        Map input records=7
        Map output records=13
        Map output bytes=157
        Map output materialized bytes=134
        Input split bytes=141
        Combine input records=13
        Combine output records=9
        Reduce input groups=9
        Reduce shuffle bytes=134
        Reduce input records=9
        Reduce output records=9
        Spilled Records=18
        Shuffled Maps =1
        Failed Shuffles=0
        Merged Map outputs=1
        GC time elapsed (ms)=171
        CPU time spent (ms)=1310
        Physical memory (bytes) snapshot=309096448
        Virtual memory (bytes) snapshot=4165795840
        Total committed heap usage (bytes)=165810176
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=108
    File Output Format Counters
        Bytes Written=92
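The counter values line up with the word-count logic itself: the mapper emits one (word, 1) pair per token (Map output records=13), the shuffle groups pairs by word, and the reducer sums each group (Reduce input groups=9, one per distinct word). A minimal streaming-style sketch of those phases in plain shell (local file path and function names are hypothetical, not part of Hadoop):

```shell
# Same test data as /opt/wc.input, recreated locally (hypothetical path)
cat > /tmp/wc.input <<'EOF'
tuxianchao supergroup hello
hadoop hello
hdfs hadoop
mapreduce hadoop
hellowworld yarn
tuxianchao apache
EOF
# Map phase: emit one "word<TAB>1" pair per token (13 records here)
wc_map() { tr -s ' \t' '\n' | awk 'NF {print $1 "\t1"}'; }
# Reduce phase: sum the 1s per word (9 distinct groups here)
wc_reduce() { awk -F'\t' '{c[$1] += $2} END {for (w in c) print w, c[w]}' | sort; }
# sort stands in for the shuffle, which groups the pairs by key
wc_map < /tmp/wc.input | sort | wc_reduce
```

The same map and reduce scripts could in principle be fed to Hadoop Streaming, which runs the mapper and reducer as external processes.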
4. View the results
[root@tuxianchao bin]# hdfs dfs -text /user/tuxianchao/mapreduce/wordcount/output/par*
apache 1
hadoop 3
hdfs 1
hello 2
hellowworld 1
mapreduce 1
supergroup 1
tuxianchao 2
yarn 1
[root@tuxianchao bin]#