Hadoop: Understanding and Application (4)


Author: Blieveinmyself | Published 2017-10-09 20:39

    Writing MapReduce Programs in Java

    1. Develop the MapReduce program in Java.
    2. Set the HADOOP_HOME environment variable to point at the Hadoop installation directory (if you want to avoid unnecessary trouble, keep spaces and Chinese characters out of that path). Add HADOOP_HOME/bin to PATH (not required, just convenient).
    3. If you are developing on Windows, you also need the Windows native library files:
    1) Overwrite HADOOP_HOME/bin with the bin directory shared on the disk;
    2) If that still fails, copy the hadoop.dll inside it to c:\windows\system32; a reboot may be required. The sanity check below can confirm the setup.
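
    As a quick way to verify the environment before running anything, here is a minimal sketch in Java. It only checks for the two files that most often cause trouble on Windows; the exact file names and locations are assumptions based on the steps above.

    import java.nio.file.Files;
    import java.nio.file.Paths;

    public class EnvCheck {
        public static void main(String[] args) {
            // HADOOP_HOME must be set, with no spaces or non-ASCII characters in it.
            String home = System.getenv("HADOOP_HOME");
            System.out.println("HADOOP_HOME = " + home);
            if (home != null) {
                // winutils.exe is the Windows helper Hadoop invokes (assumed to sit in bin).
                System.out.println("winutils.exe present: "
                        + Files.exists(Paths.get(home, "bin", "winutils.exe")));
            }
            // hadoop.dll may also need to live in system32 (see step 3 above).
            System.out.println("hadoop.dll in system32: "
                    + Files.exists(Paths.get("c:/windows/system32/hadoop.dll")));
        }
    }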

    4. Create a new project and add the jar files Hadoop needs.
    Code for WordMapper:

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class WordMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        @Override
        protected void map(LongWritable key, Text value,
                Mapper<LongWritable, Text, Text, IntWritable>.Context context)
                throws IOException, InterruptedException {
            // Split each input line on spaces and emit (word, 1) for every word.
            String line = value.toString();
            String[] words = line.split(" ");
            for (String word : words) {
                context.write(new Text(word), new IntWritable(1));
            }
        }
    }
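
    To make the mapper's contract concrete, here is a hand trace on one hypothetical input line (the offset and text are invented for illustration):

    // key/value passed to map():   (0, "hello world hello")
    // pairs written to context:    ("hello", 1), ("world", 1), ("hello", 1)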
    

    5. Code for WordReducer:

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    public class WordReducer extends Reducer<Text, IntWritable, Text, LongWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values,
                Reducer<Text, IntWritable, Text, LongWritable>.Context context)
                throws IOException, InterruptedException {
            // Sum up all the 1s the mappers emitted for this word.
            long count = 0;
            for (IntWritable v : values) {
                count += v.get();
            }
            context.write(key, new LongWritable(count));
        }
    }
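
    Continuing the hypothetical trace from the mapper: after the shuffle groups the map output by key, the reducer receives each word with all of its 1s and writes the totals:

    // input to reduce():        output written:
    // ("hello", [1, 1])    ->   ("hello", 2)
    // ("world", [1])       ->   ("world", 1)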
    

    6. Code for the driver class, Test:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class Test {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Job job = Job.getInstance(conf);

            job.setMapperClass(WordMapper.class);
            job.setReducerClass(WordReducer.class);

            // The map output types (Text, IntWritable) differ from the final
            // output types (Text, LongWritable), so both pairs must be declared.
            job.setMapOutputKeyClass(Text.class);
            job.setMapOutputValueClass(IntWritable.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(LongWritable.class);

            FileInputFormat.setInputPaths(job, "c:/bigdata/hadoop/test/test.txt");
            FileOutputFormat.setOutputPath(job, new Path("c:/bigdata/hadoop/test/out/"));

            job.waitForCompletion(true);
        }
    }
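
    One practical note: the output directory must not already exist, or submission fails (FileOutputFormat refuses to overwrite it). Assuming test.txt contains the hypothetical line traced above, a successful run leaves the totals in out/part-r-00000, with key and count separated by a tab by default:

    hello   2
    world   1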
    

    7. Pulling the files from HDFS to run locally

    FileInputFormat.setInputPaths(job, "hdfs://master:9000/wcinput/");
    FileOutputFormat.setOutputPath(job, new Path("hdfs://master:9000/wcoutput2/"));
    

    Note that this pulls the files out of HDFS and runs the job locally: if you watch the output, you will see a job ID containing the word "local", and this mode of running does not need YARN at all (stop the YARN service yourself to verify).
    8. Executing on the remote cluster

    conf.set("fs.defaultFS", "hdfs://master:9000/");
    conf.set("mapreduce.job.jar", "target/wc.jar");
    conf.set("mapreduce.framework.name", "yarn");
    conf.set("yarn.resourcemanager.hostname", "master");
    conf.set("mapreduce.app-submission.cross-platform", "true");
    FileInputFormat.setInputPaths(job, "/wcinput/");
    FileOutputFormat.setOutputPath(job, new Path("/wcoutput3/"));
    

    If you run into permission problems, pass the JVM argument -DHADOOP_USER_NAME=root when launching; a programmatic alternative is sketched below.
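
    As a minimal sketch of that alternative (assuming HDFS on your cluster really runs as root; substitute your own user), the same effect can be achieved in code, since Hadoop consults the HADOOP_USER_NAME system property when resolving the caller's identity:

    // Must run before the first FileSystem/Job call touches the cluster.
    System.setProperty("HADOOP_USER_NAME", "root");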
    9. Alternatively, copy Hadoop's four configuration files (core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml) into the src root; then no manual configuration is needed, because they are looked up on the classpath by default.
    10. Or put the configuration files somewhere else and add them with conf.addResource() on a classloader stream, as sketched below; using absolute paths is not recommended.
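
    A minimal sketch of that call (the resource path "hadoop/core-site.xml" is a hypothetical location inside your jar):

    // Load a configuration file from the classpath instead of an absolute path.
    conf.addResource(Test.class.getClassLoader()
            .getResourceAsStream("hadoop/core-site.xml"));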
    11. Create a Maven Hadoop project:

    <project xmlns="http://maven.apache.org/POM/4.0.0"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
        <modelVersion>4.0.0</modelVersion>
        <groupId>mashibing.com</groupId>
        <artifactId>maven</artifactId>
        <version>0.0.1-SNAPSHOT</version>
        <name>wc</name>
        <description>hello mp</description>
        <properties>
            <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
            <hadoop.version>2.7.3</hadoop.version>
        </properties>
        <dependencies>
            <dependency>
                <groupId>junit</groupId>
                <artifactId>junit</artifactId>
                <version>4.12</version>
            </dependency>
            <dependency>
                <groupId>org.apache.hadoop</groupId>
                <artifactId>hadoop-client</artifactId>
                <version>${hadoop.version}</version>
            </dependency>
            <dependency>
                <groupId>org.apache.hadoop</groupId>
                <artifactId>hadoop-common</artifactId>
                <version>${hadoop.version}</version>
            </dependency>
            <dependency>
                <groupId>org.apache.hadoop</groupId>
                <artifactId>hadoop-hdfs</artifactId>
                <version>${hadoop.version}</version>
            </dependency>
        </dependencies>
    </project>
    

    12. Configure log4j.properties and put it in src/main/resources:

    log4j.rootCategory=INFO, stdout
    log4j.appender.stdout=org.apache.log4j.ConsoleAppender
    log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
    log4j.appender.stdout.layout.ConversionPattern=[QC] %p [%t] %C.%M(%L) | %m%n
    
