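Normally, after writing a MapReduce program we have to package it as a JAR, upload it to the Hadoop cluster, and run it from the command line, which is inconvenient. To speed up development, it is well worth setting up a local Hadoop development environment. The steps are briefly described below: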
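1. Copy the entire Hadoop installation folder from the cluster to the local machine.
2. Set the Hadoop environment variables locally. My local Hadoop directory is D:\hadoop-2.6.0-cdh5.14.0, and the variables are set as follows: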
# Create new system variables
HADOOP_HOME=D:\hadoop-2.6.0-cdh5.14.0
HADOOP_PREFIX=D:\hadoop-2.6.0-cdh5.14.0
HADOOP_BIN_PATH=%HADOOP_HOME%\bin
# Append to the Path environment variable
%HADOOP_HOME%\bin
%HADOOP_HOME%\sbin
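3. Running Hadoop on Windows also requires winutils.exe and hadoop.dll. Download winutils.exe and the hadoop.dll of the matching version, copy hadoop.dll to C:\Windows\System32 on the system drive, and copy both hadoop.dll and winutils.exe into the bin directory of the local Hadoop installation.
The winutils.exe and hadoop.dll for hadoop-2.6.0 are available here: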
https://pan.baidu.com/s/1VDD8k-9RBl1E5mSZXJO37w
4. A reboot may be needed at this point; on my machine the settings only took effect after restarting.
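Before submitting anything, it can help to confirm that steps 2 and 3 actually took effect. The following is a minimal sketch using only the Java standard library; the class name HadoopLocalEnvCheck and the method missingNativeFiles are made up for illustration and are not part of Hadoop. It reads HADOOP_HOME and reports which of winutils.exe and hadoop.dll are missing from the bin directory.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Hadoop): checks the local setup from steps 2 and 3.
public class HadoopLocalEnvCheck {

    /** Returns the names of the required native files that are missing under hadoopHome/bin. */
    static List<String> missingNativeFiles(File hadoopHome) {
        File bin = new File(hadoopHome, "bin");
        List<String> missing = new ArrayList<>();
        for (String name : new String[] {"winutils.exe", "hadoop.dll"}) {
            if (!new File(bin, name).isFile()) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // A newly created system variable is only visible to processes started after it was set.
        String home = System.getenv("HADOOP_HOME");
        if (home == null || home.isEmpty()) {
            System.out.println("HADOOP_HOME is not set; a new session or a reboot may be needed");
            return;
        }
        List<String> missing = missingNativeFiles(new File(home));
        if (missing.isEmpty()) {
            System.out.println("winutils.exe and hadoop.dll found under " + home + File.separator + "bin");
        } else {
            System.out.println("Missing under " + home + File.separator + "bin: " + missing);
        }
    }
}
```

If the check reports HADOOP_HOME as unset right after step 2, that matches the reboot remark in step 4: the IDE process was probably started before the variable existed.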
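After that, MapReduce jobs can be submitted to the cluster directly from the local machine. The job submission code is configured as follows: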
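import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

Configuration conf = new Configuration();
// Cluster configuration files, loaded from the classpath
conf.addResource("hadoop/core-site.xml");
conf.addResource("hadoop/hdfs-site.xml");
conf.addResource("hadoop/mapred-site.xml");
conf.addResource("hadoop/yarn-site.xml");
// Addresses of the cluster's HDFS and YARN services
conf.set("fs.defaultFS", "hdfs://192.168.199.100:9000");
conf.set("mapreduce.framework.name", "yarn");
conf.set("yarn.resourcemanager.address", "192.168.199.100:8032");
conf.set("yarn.resourcemanager.scheduler.address", "192.168.199.100:8030");
conf.set("yarn.resourcemanager.hostname", "192.168.199.100");
// Needed when submitting from a Windows client to a Linux cluster
conf.set("mapreduce.app-submission.cross-platform", "true");
Job job = Job.getInstance(conf, "MRJob_1");
// Location of the JAR exported from the MR project
job.setJar("G:\\idea-workplace\\movie_hadoop.jar");
job.setJarByClass(MRJob_1.class);
job.setMapperClass(MRJob_1_Map.class);
job.setReducerClass(MRJob_1_Reduce.class);
job.setOutputKeyClass(IntWritable.class);
job.setOutputValueClass(Text.class);
job.setPartitionerClass(UserIdPartition.class);
// map holds the input and output paths; its definition is not shown here
FileInputFormat.addInputPath(job, new Path(map.get("MR1_input")));
Path outputPath = new Path(map.get("MR1_output"));
FileOutputFormat.setOutputPath(job, outputPath);
// 0 if the job succeeded, 1 otherwise
int flag = job.waitForCompletion(true) ? 0 : 1;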
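The above is only an example. Note: before submitting, the MR project must be exported as a JAR, because it is not packaged automatically. Then point the job at that JAR with the job's setJar method.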
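Since the JAR has to be built by hand before every submission, one option is to create it from the compiled classes directory in code. The sketch below uses only java.util.jar from the standard library; the class name JobJarPacker, the method pack, and the default paths target/classes and target/job.jar are assumptions for illustration, not something the original setup prescribes. Exporting the JAR from the IDE works just as well.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.jar.Attributes;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper (not part of Hadoop): packs compiled classes into a JAR for job.setJar.
public class JobJarPacker {

    /** Packs every regular file under classesDir into jarFile and returns the number of entries written. */
    static int pack(Path classesDir, Path jarFile) {
        Manifest manifest = new Manifest();
        manifest.getMainAttributes().put(Attributes.Name.MANIFEST_VERSION, "1.0");
        try (Stream<Path> walk = Files.walk(classesDir)) {
            // Collect the file list first so the output file is never picked up by the walk.
            List<Path> files = walk.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
            int written = 0;
            try (OutputStream out = Files.newOutputStream(jarFile);
                 JarOutputStream jar = new JarOutputStream(out, manifest)) {
                for (Path file : files) {
                    // JAR entry names always use forward slashes, even on Windows.
                    String name = classesDir.relativize(file).toString().replace('\\', '/');
                    if (name.equals(JarFile.MANIFEST_NAME)) {
                        continue; // the manifest was already written by the JarOutputStream constructor
                    }
                    jar.putNextEntry(new JarEntry(name));
                    Files.copy(file, jar);
                    jar.closeEntry();
                    written++;
                }
            }
            return written;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Default paths are assumptions; pass the real classes directory and JAR path as arguments.
        Path classesDir = Paths.get(args.length > 0 ? args[0] : "target/classes");
        Path jarFile = Paths.get(args.length > 1 ? args[1] : "target/job.jar");
        System.out.println("Packed " + pack(classesDir, jarFile) + " files into " + jarFile.toAbsolutePath());
    }
}
```

The resulting path can then be handed to the submission code, for example job.setJar(jarFile.toString()). Note that this only packs the project's own classes; any third-party libraries the job needs would still have to reach the cluster some other way.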