Hadoop Map/Reduce in Practice


By Ms柠檬 | Published 2015-04-10 05:46

    I have been interviewing like crazy lately, and it seems every step is worth summarizing:

    I was recently asked how to write a Mapper and a Reducer to compute an average, which is only a slightly more involved case.

    You are given four columns:
    merchant name, category, transactions in 2014, transaction dollars in 2014
    and asked to compute the average dollar transaction per category.

    How do we design the Mapper and Reducer?
    In short, split everything into two steps:
    Average dollar transaction per category = total 2014 transaction dollars / total 2014 transaction count

    So this division has to be the last step of the Reducer:
    key (category) -> SumCount(total 2014 transaction dollars) / SumCount(total 2014 transaction count)

    The Mapper side produces:
    key (category) -> [SumCount(total 2014 transaction dollars), SumCount(total 2014 transaction count)]

    So the Mapper and Reducer would look like this (pseudocode, not real code):
    First, group the records in a HashMap:

    map(key, value) {
        String[] fields = value.split(",");
        String category = fields[1];
        double transactionNumber = Double.parseDouble(fields[2]);
        double transactionValue = Double.parseDouble(fields[3]);

        // group each record's [number, value] pair under its category
        map.computeIfAbsent(category, k -> new ArrayList<>())
           .add(new double[] { transactionNumber, transactionValue });
    }
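    The grouping step described above can be simulated in plain Java, with no Hadoop dependency. The sample input lines and the column order (merchant, category, 2014 transaction count, 2014 transaction dollars) are illustrative assumptions matching the four columns described earlier:

    ```java
    import java.util.*;

    public class MapperSketch {
        // Hypothetical record format: merchant,category,txnCount2014,txnDollars2014
        static Map<String, List<double[]>> mapPhase(List<String> lines) {
            Map<String, List<double[]>> grouped = new HashMap<>();
            for (String line : lines) {
                String[] f = line.split(",");
                String category = f[1];
                double number = Double.parseDouble(f[2]);
                double dollars = Double.parseDouble(f[3]);
                // append this record's [number, dollars] pair under its category
                grouped.computeIfAbsent(category, k -> new ArrayList<>())
                       .add(new double[] { number, dollars });
            }
            return grouped;
        }

        public static void main(String[] args) {
            List<String> lines = Arrays.asList(
                "Acme,A,10,1200",
                "Bolt,A,5,1022",
                "Cafe,B,10,12999");
            Map<String, List<double[]>> grouped = mapPhase(lines);
            System.out.println(grouped.get("A").size()); // two records grouped under A
        }
    }
    ```

    In a real Hadoop job each map() call sees one record and emits it directly; the HashMap here just stands in for the framework's shuffle-and-group step.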

    Then loop over the map to compute the per-category sums:

    sumCount(map) {
        for (String category : map.keySet()) {
            double sumNumber = 0;
            double sumValue = 0;
            for (double[] pair : map.get(category)) {
                sumNumber += pair[0];   // transaction count
                sumValue += pair[1];    // transaction dollars
            }
            // emits category as a key and a [sumNumber, sumValue] list as value
            output(category, new double[] { sumNumber, sumValue });
        }
    }

    So the output at this point should look like:
    A -> [15, 2222]
    B -> [10, 12999]
    C -> [25, 1390]

    That is all the Mapper has to do; the Reducer is then simple:
    If instead of emitting the mean we emit the sum of the values and the number of values, we can overcome the problem. For example, if the first mapper emits the pair (30.0, 2) and the second (9.0, 3), then summing the values and dividing by the sum of the counts gives the right result.
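    A small plain-Java check of this point. The pairs (30.0, 2) and (9.0, 3) come from the paragraph above; the helper names are made up for illustration:

    ```java
    public class SumCountDemo {
        // naive merge: average the two per-mapper means (wrong in general)
        static double meanOfMeans(double[] a, double[] b) {
            return (a[0] / a[1] + b[0] / b[1]) / 2;
        }

        // correct merge: sum the sums and the counts, then divide once
        static double pooledMean(double[] a, double[] b) {
            return (a[0] + b[0]) / (a[1] + b[1]);
        }

        public static void main(String[] args) {
            double[] m1 = { 30.0, 2 };  // first mapper emits (sum, count)
            double[] m2 = { 9.0, 3 };   // second mapper
            System.out.println(meanOfMeans(m1, m2)); // 9.0, not the true mean
            System.out.println(pooledMean(m1, m2));  // 7.8, the right result
        }
    }
    ```

    The naive version weights both mappers equally even though they saw different numbers of records, which is exactly why the (sum, count) pair has to travel to the Reducer instead of a pre-computed mean.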

    Reducer:
    Collect all the SumCount data from the Mappers into a HashMap,

    then iterate over every category and compute its average:

    for (String category : map.keySet()) {
        double sum = map.get(category).getSum();
        double count = map.get(category).getCount();

        // emit value
        write(category, sum / count);
    }
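    The Reducer loop can likewise be simulated in plain Java. The per-category totals below reuse the sample numbers shown earlier (A: 15 transactions and $2222, and so on); the array layout [count, dollars] is an assumption carried over from the Mapper sketch:

    ```java
    import java.util.*;

    public class ReducerSketch {
        // input per category: [total transaction count, total transaction dollars]
        static Map<String, Double> reducePhase(Map<String, double[]> sumCounts) {
            Map<String, Double> averages = new HashMap<>();
            for (String category : sumCounts.keySet()) {
                double[] sc = sumCounts.get(category);
                averages.put(category, sc[1] / sc[0]); // dollars / count
            }
            return averages;
        }

        public static void main(String[] args) {
            Map<String, double[]> totals = new HashMap<>();
            totals.put("A", new double[] { 15, 2222 });
            totals.put("B", new double[] { 10, 12999 });
            totals.put("C", new double[] { 25, 1390 });
            // average dollars per transaction, per category
            System.out.println(reducePhase(totals));
        }
    }
    ```

    In a real Hadoop job, reduce() is called once per key with an iterator over that key's values, so the outer loop over categories is handled by the framework rather than by your code.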

    Reference: http://andreaiacono.blogspot.com/search?updated-max=2014-05-15T23:37:00%2B02:00&max-results=1&start=2&by-date=false
