GROUP BY vs. DISTINCT for deduplication

Author: scottzcw | Published 2018-08-23 11:36 | 337 views

    sql1:
    select count(distinct sellno) from xxx;

    sql2:
    select count(sellno) from
    (select sellno from xxx
    group by sellno) t;

    sql1 execution:

    Stage-Stage-1: Map: 396 Reduce: 1 Cumulative CPU: 7915.67 sec HDFS Read: 119072894175 HDFS Write: 10 SUCCESS

    Total MapReduce CPU Time Spent: 0 days 2 hours 11 minutes 55 seconds 670 msec

    sql2 execution:

    Stage-Stage-1: Map: 396 Reduce: 457 Cumulative CPU: 10056.7 sec HDFS Read: 119074266583 HDFS Write: 53469 SUCCESS

    Stage-Stage-2: Map: 177 Reduce: 1 Cumulative CPU: 280.22 sec HDFS Read: 472596 HDFS Write: 10 SUCCESS

    Total MapReduce CPU Time Spent: 0 days 2 hours 52 minutes 16 seconds 920 msec
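The "Total MapReduce CPU Time Spent" lines are the sum of the per-stage "Cumulative CPU" values, so they measure CPU consumed across all tasks, not elapsed time. A minimal sketch that re-derives both totals from the stage figures in the logs above (the helper name `format_cpu` is made up for illustration; it is not part of Hive):

```python
def format_cpu(seconds):
    """Format CPU seconds in the same shape as Hive's summary line."""
    total_ms = round(seconds * 1000)
    total_sec, msec = divmod(total_ms, 1000)
    minutes, sec = divmod(total_sec, 60)
    hours, minutes = divmod(minutes, 60)
    days, hours = divmod(hours, 24)
    return f"{days} days {hours} hours {minutes} minutes {sec} seconds {msec} msec"

# Per-stage "Cumulative CPU" values copied from the logs above.
sql1_stages = [7915.67]            # one stage
sql2_stages = [10056.7, 280.22]    # two stages

sql1_total = format_cpu(sum(sql1_stages))
sql2_total = format_cpu(sum(sql2_stages))

print("sql1:", sql1_total)
print("sql2:", sql2_total)
```

Both results match the totals Hive printed, which confirms that sql2 consumed more CPU overall even though its deduplication ran on many more reducers.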

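    Summary: with count(distinct sellno), all sellno values are shuffled to a single reducer (Reduce: 1 in sql1), so one task does the entire deduplication. With group by, the deduplication is spread across many reducers on multiple machines (Reduce: 457 in Stage-1 of sql2), and the second stage only has to combine their small outputs (HDFS Write: 53469) into the final count. sql2 consumes more cumulative CPU (2 h 52 min vs. 2 h 11 min) because of the extra stage, and the logs report CPU time rather than elapsed time; the efficiency gain of group by comes from running the deduplication in parallel instead of on one reducer.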
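The difference between the two query shapes can be illustrated with a small simulation. This is only a sketch under stated assumptions: `reducer_for` uses a CRC32 hash as a stand-in for Hive's real partitioner, the sellno values are made up, and 8 reducers are used instead of the 457 seen in the log.

```python
import zlib

def reducer_for(key, num_reducers):
    # Deterministic hash partitioning; a stand-in for Hive's partitioner.
    return zlib.crc32(key.encode("utf-8")) % num_reducers

def count_distinct_single_reducer(rows):
    """sql1 shape: every key lands on the one reducer, which dedups them all."""
    seen = set(rows)
    return len(seen), {0: len(seen)}

def count_distinct_group_by(rows, num_reducers):
    """sql2 shape: stage 1 dedups per reducer, stage 2 sums the partial counts."""
    partitions = {r: set() for r in range(num_reducers)}
    for key in rows:
        partitions[reducer_for(key, num_reducers)].add(key)
    # Equal keys always hash to the same reducer, so the per-reducer sets are
    # disjoint and their sizes add up to the global distinct count.
    load = {r: len(keys) for r, keys in partitions.items()}
    return sum(load.values()), load

# Hypothetical data: 10,000 rows containing 1,000 distinct sellno values.
rows = [f"S{i % 1000:04d}" for i in range(10_000)]

total1, load1 = count_distinct_single_reducer(rows)
total2, load2 = count_distinct_group_by(rows, num_reducers=8)

print("sql1 shape:", total1, "distinct; keys held by busiest reducer:", max(load1.values()))
print("sql2 shape:", total2, "distinct; keys held by busiest reducer:", max(load2.values()))
```

Both shapes return the same count, but in the second one no single reducer has to hold every distinct key, which matters more as the number of distinct values grows.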

    Article link: https://www.haomeiwen.com/subject/fpsgmftx.html