As everyone knows, the core of Hadoop is HDFS and MapReduce. The previous eight posts were all about HDFS, so starting with this one let's talk about MapReduce.
"A MapReduce job is just a Java program." As a Java programmer, that one sentence made it instantly approachable for me. Sound familiar?
But hold on: isn't the first program for anything supposed to be Hello World? Why WordCount? Well, WordCount is the Hello World of MapReduce. Let's look at the WordCount code first.
The Mapper class
package com.xmf.mr.wordCount;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.util.StringUtils;

import java.io.IOException;

/**
 * Created by Administrator on 2018/4/16.
 */
public class WCMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
    // The framework calls this method once for every line of input.
    // key:   the byte offset at which this line starts
    // value: the text content of the line
    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        // convert the line to a String
        String line = value.toString();
        // split on spaces
        String[] words = StringUtils.split(line, ' ');
        // emit a <word, 1> pair for each word
        for (String word : words) {
            context.write(new Text(word), new LongWritable(1));
        }
    }
}
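To make the data flow concrete: given an input line like "hello world hello", this map method emits one pair per word occurrence:

<hello,1> <world,1> <hello,1>

No counting happens here; summing everything up is the reducer's job.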
The Reducer class
package com.xmf.mr.wordCount;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

import java.io.IOException;

/**
 * Created by Administrator on 2018/4/16.
 */
public class WCReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
    // After the mappers finish, the framework caches all the k-v pairs, groups
    // them by key, and calls reduce once per group, e.g. <hello,{1,1,1,1,1}>.
    @Override
    protected void reduce(Text key, Iterable<LongWritable> values, Context context) throws IOException, InterruptedException {
        long count = 0;
        for (LongWritable value : values) {
            count += value.get();
        }
        // write out the total for this word
        context.write(key, new LongWritable(count));
    }
}
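A side note: because this reduce logic just sums values, it is associative and commutative, so the same class can double as a combiner that pre-aggregates on the map side and shrinks the shuffle. If you want to try it (optional; the job runs fine without it), it is one extra line in the driver below:

job.setCombinerClass(WCReducer.class);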
The driver class
package com.xmf.mr.wordCount;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import java.io.IOException;

/**
 * Describes a particular job:
 * which class serves as the mapper, which as the reducer,
 * the input data path, and the output file path.
 * Created by Administrator on 2018/4/18.
 */
public class WCRunner {
    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        Configuration conf = new Configuration();
        //System.setProperty("hadoop.home.dir", "D:\\hadoop-2.4.1\\hadoop-2.4.1");
        Job job = Job.getInstance(conf);
        // tell the job which classes it uses and where they live
        job.setJarByClass(WCRunner.class);
        job.setMapperClass(WCMapper.class);
        job.setReducerClass(WCReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(LongWritable.class);
        // where the raw input data lives
        FileInputFormat.setInputPaths(job, new Path("hdfs://my01:9000/wc/srcdata"));
        // where the output files go
        FileOutputFormat.setOutputPath(job, new Path("hdfs://my01:9000/wc/output"));
        // submit the job to the cluster and wait for it to finish
        job.waitForCompletion(true);
    }
}
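One gotcha worth knowing: the job will fail at submission if the output directory already exists, so delete it before rerunning. A minimal sketch of doing that in code (my own addition, not part of the original runner; it needs import org.apache.hadoop.fs.FileSystem):

Path output = new Path("hdfs://my01:9000/wc/output");
FileSystem fs = output.getFileSystem(conf);
if (fs.exists(output)) {
    fs.delete(output, true); // true = delete recursively
}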
This is the WordCount I wrote. Making it run locally on Windows takes quite a few changes; I've already made them, so if anything is unclear, leave a comment, I'll get a notification and reply promptly, and I won't go over it again here. Instead, let's first run it on Linux with the hadoop command. That approach is awkward to debug, but this is an introduction, so never mind debugging; our goal is simply to get an intuitive feel for MR.
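For reference, the key change for Windows is pointing hadoop.home.dir at a local Hadoop directory containing bin\winutils.exe, which is exactly what the commented-out line in WCRunner is for (the D:\ path is just my local layout):

System.setProperty("hadoop.home.dir", "D:\\hadoop-2.4.1\\hadoop-2.4.1");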
A quick record of how to build a jar in IntelliJ IDEA:
Step 1: (screenshot)
Step 2: (screenshot)
Step 3: (screenshot)
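(If your project happens to be Maven-based, which is an assumption on my part since the steps above use IDEA's GUI, the command-line equivalent is simply mvn clean package, which leaves the jar under target/.)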
===== the fancy divider =====
Moving on: package the code above into a jar and upload it to the server.
(screenshot)
Data preparation:
(screenshot)
The data:
(screenshot)
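Since those screenshots are gone, here is roughly what the preparation looks like from the shell (words.txt is a made-up file name; the HDFS path matches the one hard-coded in WCRunner):

hadoop fs -mkdir -p /wc/srcdata
hadoop fs -put words.txt /wc/srcdata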
Run it:
hadoop jar wordCount.jar com.xmf.mr.wordCount.WCRunner
(screenshots of the console output while the job runs)
The job has finished, so let's look at the result:
(screenshot)
As the output shows, the counts for each word have indeed been computed.
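To inspect the result from the shell as well, the reducer writes its output as a part file under the output directory (part-r-00000 is the standard name for the first reducer's file):

hadoop fs -cat /wc/output/part-r-00000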
Corrections and suggestions are very welcome!