Multi-Path Output in MapReduce

Author: yannhuang | Published: 2017-11-24 11:15

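In real projects it is common to split the contents of a single input file into categories and write each category to its own output, so that the next round of processing can tell them apart. This is the problem that multi-path output in Hadoop MapReduce addresses.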

Multi-path output in Hadoop MapReduce is provided by the MultipleOutputs class. Its most commonly used method is:

public void write(KEYOUT key, VALUEOUT value, String baseOutputPath) throws IOException, InterruptedException

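The first two parameters are the same as those of Context.write. The third, baseOutputPath, is the prefix of the output file for that category. For example, the calls: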

multipleOutput.write(key, value, "ONE");
multipleOutput.write(key, value, "TWO");

produce the following output files (shown for reduce task 0):

ONE-r-00000
TWO-r-00000

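The method also accepts subdirectories in baseOutputPath, which keeps each category of output in its own folder. For example, the calls: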

multipleOutput.write(key, value, "folder1/ONE");
multipleOutput.write(key, value, "folder2/TWO");

produce the following output files:

folder1/ONE-r-00000
folder2/TWO-r-00000
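
The naming pattern in the listings above can be sketched in plain Java. This is only an illustration and not Hadoop code: the class OutputNameSketch and its fileName method are hypothetical names, and the sketch assumes the name is the base path, then the task type letter (r for reduce, m for map), then a five-digit partition number. Some output formats may also append an extension, which is left out here.

```java
import java.util.Locale;

// Illustration only: mimics the file naming shown above, without using Hadoop.
final class OutputNameSketch {

    // Builds "<baseOutputPath>-<m|r>-<5-digit partition>", e.g. "ONE-r-00000".
    static String fileName(String baseOutputPath, boolean isReduce, int partition) {
        char taskType = isReduce ? 'r' : 'm';
        return String.format(Locale.ROOT, "%s-%c-%05d", baseOutputPath, taskType, partition);
    }

    public static void main(String[] args) {
        System.out.println(fileName("ONE", true, 0));          // ONE-r-00000
        System.out.println(fileName("folder2/TWO", true, 0));  // folder2/TWO-r-00000
        System.out.println(fileName("ONE", true, 3));          // ONE-r-00003
    }
}
```

With more than one reduce task, each task writes its own file per category, so a job with four reducers can produce up to four files for each prefix.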

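The method is mostly used for reduce output (when called from a mapper, the file names carry -m- instead of -r-). An example reducer follows: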

import java.io.IOException;

import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

import com.dataeye.mr.util.OutFieldsBaseModel;

public class MultiReducer extends Reducer<OutFieldsBaseModel, OutFieldsBaseModel, NullWritable, OutFieldsBaseModel> {

    // Reused output value, so that no new object is allocated per key.
    private OutFieldsBaseModel mapValueObj = new OutFieldsBaseModel();

    private MultipleOutputs<NullWritable, OutFieldsBaseModel> multipleOutput;

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        // Create the MultipleOutputs instance once per task.
        multipleOutput = new MultipleOutputs<NullWritable, OutFieldsBaseModel>(context);
    }

    @Override
    protected void reduce(OutFieldsBaseModel key, Iterable<OutFieldsBaseModel> values, Context context) throws IOException, InterruptedException {
        // The first key field is the device id; the values are not used here.
        String[] keyArray = key.getOutFields();
        String deviceId = keyArray[0];
        mapValueObj.setOutFields(keyArray);
        // hashCode() % 2 is 0 for even hash codes and 1 or -1 for odd ones.
        int code = deviceId.hashCode() % 2;
        if (code == 0) {
            multipleOutput.write(NullWritable.get(), mapValueObj, "ZERO/ZERO");
        } else {
            // Odd hash codes (code is 1 or -1) end up here.
            multipleOutput.write(NullWritable.get(), mapValueObj, "ONE/ONE");
        }
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        // Close the writers opened by MultipleOutputs when the task ends.
        multipleOutput.close();
    }
}
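
The routing rule inside reduce can be tried without a cluster. The following is a minimal sketch in plain Java: the class RoutingSketch, its methods, and the device ids "a" and "b" are hypothetical and only serve as illustration. The route method repeats the reducer's rule. The bucket method is a possible variant, not part of the original reducer, that uses Math.floorMod so the result is never negative, which matters once there are more than two categories.

```java
// Illustration only: reproduces the reducer's routing decision outside Hadoop.
final class RoutingSketch {

    // Same rule as the reducer: even hash code -> "ZERO/ZERO", otherwise "ONE/ONE".
    static String route(String deviceId) {
        int code = deviceId.hashCode() % 2;  // -1, 0 or 1 in Java
        return code == 0 ? "ZERO/ZERO" : "ONE/ONE";
    }

    // Variant for N categories: floorMod always returns a value in [0, buckets).
    static int bucket(String deviceId, int buckets) {
        return Math.floorMod(deviceId.hashCode(), buckets);
    }

    public static void main(String[] args) {
        System.out.println(route("a"));  // "a".hashCode() is 97, odd  -> ONE/ONE
        System.out.println(route("b"));  // "b".hashCode() is 98, even -> ZERO/ZERO
        System.out.println(bucket("a", 2));  // 1
    }
}
```

Because the same device id always hashes to the same value, every record for a given device lands under the same prefix.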


Article link: https://www.haomeiwen.com/subject/pkpxbxtx.html