Hadoop Map/Reduce in Practice

Author: Ms柠檬 | Published: 2015-04-10 05:46 | Views: 161

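I have been interviewing like crazy lately, and it seems worth writing up every step along the way.

One recent question was how to write a Mapper and a Reducer that compute an average. It is only a slightly more involved case than the usual examples.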

Four columns are given:
merchant name, category, number of transactions in 2014, transaction dollars in 2014
The task is to compute the average dollar amount per transaction for each category.

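How to design the Mapper and the Reducer
Everything splits into two steps, because:
average dollar transaction per category = total transaction dollars in 2014 / total number of transactions in 2014

So the division has to be the last step of the Reducer:
key (category) -> SumCount(all 2014 transaction dollars) / SumCount(all 2014 transaction counts)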

Mapper function:
key (category) -> [SumCount(all 2014 transaction dollars), SumCount(all 2014 transaction counts)]
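
The value the mapper emits can be pictured as a small type that carries a running sum and a running count. Below is a minimal sketch of such a type in plain Java. The class name SumCount and the getSum/getCount accessors follow the names used in this post; the add and average methods and the sample numbers are my own assumptions for illustration. In a real Hadoop job the value type would also have to implement Hadoop's Writable interface, which is left out here to keep the sketch self-contained.

```java
// Hypothetical value type: a partial result made of a sum and a count.
class SumCount {
    private double sum;   // total transaction dollars seen so far
    private double count; // total number of transactions seen so far

    SumCount(double sum, double count) {
        this.sum = sum;
        this.count = count;
    }

    double getSum() { return sum; }

    double getCount() { return count; }

    // Merge another partial result into this one.
    void add(SumCount other) {
        sum += other.sum;
        count += other.count;
    }

    // Divide only once, after all partial results have been merged.
    double average() { return sum / count; }

    public static void main(String[] args) {
        SumCount a = new SumCount(1000, 10); // illustrative numbers
        a.add(new SumCount(500, 5));
        System.out.println(a.average()); // prints 100.0
    }
}
```

Because merging only adds sums and counts, partial results can be combined in any order before the final division.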

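So the Mapper and the Reducer could be implemented roughly as follows (a Java-style sketch, not real production code).
First, store the data in a HashMap:
// Needs java.util.Map, HashMap, List and ArrayList.
// Rows grouped by category; each row is {transaction count, transaction dollars}.
Map<String, List<double[]>> map = new HashMap<>();

void map(Object key, String value) {
    // Assumption: columns are comma-separated (merchant, category, count, dollars).
    String[] values = value.split(",");
    String category = values[1].trim();
    double transactionNumber = Double.parseDouble(values[2].trim());
    double transactionValue = Double.parseDouble(values[3].trim());

    // Create the category's list the first time it is seen, then append this row.
    map.computeIfAbsent(category, k -> new ArrayList<>())
       .add(new double[] {transactionNumber, transactionValue});
}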

Then use a loop to add up the data for each category:
// Needs java.util.Arrays in addition to Map and List.
void sumCount(Map<String, List<double[]>> map) {
    for (String category : map.keySet()) {
        double sum = 0;   // total number of transactions
        double sum2 = 0;  // total transaction dollars
        for (double[] row : map.get(category)) {
            sum += row[0];
            sum2 += row[1];
        }

        // Emit the category as the key and [count, dollars] as the value.
        // output(...) stands in for the framework's emit call.
        output(category, Arrays.asList(sum, sum2));
    }
}

So what gets emitted here should look like this, with each value being [transaction count, transaction dollars]:
A <- [15, 2222]
B <- [10, 12999]
C <- [25, 1390]

That is all the Mapper has to do, and the reducer is then simple. The reason for emitting sums and counts rather than per-mapper averages is that an average of averages is generally not the overall average:
If instead of emitting the mean we emit the sum of the values and the number of values, we can overcome the problem. For example, if the first mapper emits the pair (30.0, 2) and the second (9.0, 3), summing the values and dividing by the sum of the counts gives the right result: 39.0 / 5 = 7.8.
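
To make the difference concrete, here is a small sketch in plain Java that compares the two approaches on the pairs (30.0, 2) and (9.0, 3) from the paragraph above. The class and method names are hypothetical, chosen only for this illustration.

```java
class MeanOfMeansDemo {
    // Misleading: averages the per-mapper means and ignores
    // how many values each mapper actually saw.
    static double meanOfMeans(double[] sums, long[] counts) {
        double total = 0;
        for (int i = 0; i < sums.length; i++) {
            total += sums[i] / counts[i];
        }
        return total / sums.length;
    }

    // Correct: add sums and counts separately, then divide once.
    static double combinedMean(double[] sums, long[] counts) {
        double sum = 0;
        long count = 0;
        for (int i = 0; i < sums.length; i++) {
            sum += sums[i];
            count += counts[i];
        }
        return sum / count;
    }

    public static void main(String[] args) {
        double[] sums = {30.0, 9.0};
        long[] counts = {2, 3};
        System.out.println(meanOfMeans(sums, counts));  // prints 9.0
        System.out.println(combinedMean(sums, counts)); // prints 7.8
    }
}
```

The first mapper's mean is 15.0 and the second's is 3.0, so averaging them gives 9.0, while the true mean over all five values is 7.8.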

Reducer:
Collect all the SumCount data from the Mappers into a HashMap keyed by category.

Then iterate over all the categories and compute the average:

for (String category : map.keySet()) {
    double sum = map.get(category).getSum();
    double count = map.get(category).getCount();

    // Emit the average for this category.
    write(category, sum / count);
}
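
Putting the pieces together, the whole flow can be simulated in plain Java without a Hadoop cluster. This is only a sketch under stated assumptions: the class name, the comma-separated row format, and the sample rows are all made up for illustration, and an in-memory TreeMap stands in for the shuffle step that Hadoop would perform between map and reduce.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class CategoryAverageJob {

    // Map phase: one (category, {dollars, count}) pair per input row.
    // Assumed row format: merchant, category, count, dollars.
    static Map<String, List<double[]>> mapPhase(List<String> rows) {
        Map<String, List<double[]>> grouped = new TreeMap<>();
        for (String row : rows) {
            String[] cols = row.split(",");
            String category = cols[1].trim();
            double count = Double.parseDouble(cols[2].trim());
            double dollars = Double.parseDouble(cols[3].trim());
            // Grouping by key stands in for the shuffle step.
            grouped.computeIfAbsent(category, k -> new ArrayList<>())
                   .add(new double[] {dollars, count});
        }
        return grouped;
    }

    // Reduce phase: add up dollars and counts per category, then divide once.
    static Map<String, Double> reducePhase(Map<String, List<double[]>> grouped) {
        Map<String, Double> averages = new TreeMap<>();
        for (Map.Entry<String, List<double[]>> e : grouped.entrySet()) {
            double sum = 0;
            double count = 0;
            for (double[] pair : e.getValue()) {
                sum += pair[0];
                count += pair[1];
            }
            averages.put(e.getKey(), sum / count);
        }
        return averages;
    }

    public static void main(String[] args) {
        // Illustrative rows, not real data.
        List<String> rows = Arrays.asList(
            "Shop1, A, 10, 1000",
            "Shop2, A, 5, 500",
            "Shop3, B, 4, 100");
        System.out.println(reducePhase(mapPhase(rows))); // prints {A=100.0, B=25.0}
    }
}
```

Category A has 1500 dollars over 15 transactions and category B has 100 dollars over 4, which gives the averages 100.0 and 25.0.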

Reference: http://andreaiacono.blogspot.com/search?updated-max=2014-05-15T23:37:00%2B02:00&max-results=1&start=2&by-date=false
