Entry point of a Tool

A basic implementation of a Hadoop Tool:

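    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    public class MyApp extends Configured implements Tool {

        public int run(String[] args) throws Exception {
            // Configuration processed by ToolRunner
            Configuration conf = getConf();

            // Create a JobConf using the processed conf
            JobConf job = new JobConf(conf, MyApp.class);

            // Process custom command-line options. Generic options have
            // already been stripped, so the tool's own arguments start at 0.
            Path in = new Path(args[0]);
            Path out = new Path(args[1]);

            // Specify various job-specific parameters
            job.setJobName("my-app");
            FileInputFormat.setInputPaths(job, in);
            FileOutputFormat.setOutputPath(job, out);
            job.setMapperClass(MyMapper.class);
            job.setReducerClass(MyReducer.class);

            // Submit the job, then poll for progress until the job is complete
            JobClient.runJob(job);
            return 0;
        }

        public static void main(String[] args) throws Exception {
            // Let ToolRunner handle generic command-line options
            int res = ToolRunner.run(new Configuration(), new MyApp(), args);

            System.exit(res);
        }
    }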

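In this code:

// This is the first entry point of a tool: the main function, which runs the tool through ToolRunner.
public static void main(String[] args) throws Exception {
    int res = ToolRunner.run(new Configuration(), new MyApp(), args);
    System.exit(res);
}
// ToolRunner then calls this method of the tool. The tool implements its functional logic here.
// The method is declared in the org.apache.hadoop.util.Tool interface.
public int run(String[] args) throws Exception {
    // the tool's main functionality goes here
    return 0;
}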

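Main functions of ToolRunner

Create a Configuration (if null was passed in) and set it on the current tool.
Process the command-line arguments.

Processing command-line arguments

While a tool runs, command-line arguments can be read in two places: the args of main and the args of run. The args of main are the raw command-line arguments, for example the args below.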

Usage: hadoop jar <jar> [mainClass] args...

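We usually also pass in some parameters that concern the Hadoop runtime. Such parameters have nothing to do with the business logic of a particular tool, and they are generally passed in the form -D key=val, for example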

Usage: hadoop jar <jar> [mainClass] -D key=value... toolargs...

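When ToolRunner parses the arguments, it extracts these parameters and stores them in the Configuration, where the job can read them, and it passes the remaining toolargs to the run method. The run method therefore receives only the tool-specific args.

The run method of ToolRunner is implemented as follows (hadoop-common-2.4.0.jar)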

  public static int run(Configuration conf, Tool tool, String[] args) 
    throws Exception{
    if(conf == null) {
      conf = new Configuration();
    }
    GenericOptionsParser parser = new GenericOptionsParser(conf, args);
    //set the configuration back, so that Tool can configure itself
    tool.setConf(conf);
    
    //get the args w/o generic hadoop args
    String[] toolArgs = parser.getRemainingArgs();
    return tool.run(toolArgs);
  }

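On the command line, the tool-specific arguments must come last, after all generic options. Parsing of generic options stops at the first argument that is not an option, so any -D parameter placed after a tool argument is not parsed.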
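
The splitting and the ordering rule described above can be illustrated with a small stand-alone sketch. It does not use Hadoop and is not the GenericOptionsParser implementation: the class GenericArgSplitSketch, its split method, and the key my.key are hypothetical names made up for illustration. A plain Map stands in for the Configuration, and only -D is handled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified stand-in for the -D handling described above.
// NOT Hadoop's GenericOptionsParser; all names here are hypothetical.
class GenericArgSplitSketch {

    // Holds the parsed properties and the arguments left over for the tool.
    static final class Result {
        final Map<String, String> props = new LinkedHashMap<>();
        final List<String> remaining = new ArrayList<>();
    }

    static Result split(String[] args) {
        Result r = new Result();
        int i = 0;
        while (i < args.length) {
            String a = args[i];
            String kv;
            if (a.equals("-D") && i + 1 < args.length) {
                kv = args[i + 1];        // form: -D key=value
                i += 2;
            } else if (a.startsWith("-D") && a.length() > 2) {
                kv = a.substring(2);     // form: -Dkey=value
                i += 1;
            } else {
                break;                   // first non-option token: stop parsing
            }
            int eq = kv.indexOf('=');
            if (eq > 0) {                // tokens without key=value are ignored here
                r.props.put(kv.substring(0, eq), kv.substring(eq + 1));
            }
        }
        // Everything from the first non-option token on is passed through untouched.
        r.remaining.addAll(Arrays.asList(args).subList(i, args.length));
        return r;
    }

    public static void main(String[] args) {
        // Generic option first, tool arguments last: the property is extracted.
        Result ok = split(new String[] {"-D", "my.key=42", "in", "out"});
        System.out.println(ok.props + " " + ok.remaining);

        // Tool arguments first: the -D tokens are left among the tool arguments.
        Result late = split(new String[] {"in", "out", "-D", "my.key=42"});
        System.out.println(late.props + " " + late.remaining);
    }
}
```

In the second call the property map stays empty and the run method of a tool would see four arguments instead of two, which is the failure mode the ordering rule guards against.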