美文网首页
group by前后 数据不一致

group by前后 数据不一致

作者: mysimplebook | 来源:发表于2021-05-26 10:38 被阅读0次

    相同的筛选条件,有了group by后数值不一样,如

    1、原始sql

    SELECT

    source,

    --platform,

    max(source_name) as source_name,

    uniqExactMerge(device_id_state) as uv,

    toStartOfMonth(toDate(date_str)) as date_str

    FROM

    mdata_flows_user_realtime_day_all table_alias

    WHERE

    bd_id != '6'

    AND shop_type = '1'

    AND type = 'prod'

    AND source IN ('1')

    AND bd_id IN ('5',

    '12')

    AND platform IN ('12',

    '20')

    AND (table_alias.date_str between '2021-05-01' and '2021-05-25'

    or table_alias.date_str between '2020-05-01' and '2020-05-25'

    or table_alias.date_str between '2021-04-01' and '2021-04-25')

    GROUP BY

    date_str,

    source

    --platform

    结果:

    source_name|source|uv |date_str |

    -----------|------|-------|----------|

    主站        |1    |1135552|2021-05-01

    uv=1135552

    2、再加一个group by字段

    SELECT

    new_id,

    --platform,

    uniqExactMerge(device_id_state) as uv,

    toStartOfMonth(toDate(date_str)) as date_str

    FROM

    mdata_flows_user_realtime_day_all table_alias

    WHERE

    bd_id != '6'

    AND source = '1'

    AND shop_type = '1'

    AND type = 'prod'

    AND bd_id IN ('5',

    '12')

    AND platform IN ('12',

    '20')

    AND (table_alias.date_str between '2021-05-01' and '2021-05-25'

    or table_alias.date_str between '2020-05-01' and '2020-05-25'

    or table_alias.date_str between '2021-04-01' and '2021-04-25')

    GROUP BY

    date_str,

    new_id

    --platform

    结果:

    new_id|uv |date_str |

    ------|-------|----------|

        0| 207473|2021-05-01|

        1| 961056|2021-05-01

    uv=207473+961056=1168529

    3、继续添加一个字段platform

    SELECT

    new_id,

    platform,

    uniqExactMerge(device_id_state) as uv,

    toStartOfMonth(toDate(date_str)) as date_str

    FROM

    mdata_flows_user_realtime_day_all table_alias

    WHERE

    bd_id != '6'

    AND source = '1'

    AND shop_type = '1'

    AND type = 'prod'

    AND bd_id IN ('5',

    '12')

    AND platform IN ('12',

    '20')

    AND (table_alias.date_str between '2021-05-01' and '2021-05-25'

    or table_alias.date_str between '2020-05-01' and '2020-05-25'

    or table_alias.date_str between '2021-04-01' and '2021-04-25')

    GROUP BY

    date_str,

    new_id,

    platform

    结果:

    new_id|platform|uv |date_str |

    ------|--------|------|----------|

        1|20      |193091|2021-05-01|

        1|12      |767966|2021-05-01|

        0|20      | 36139|2021-05-01|

        0|12      |171334|2021-05-01|

    uv=767966+171334+36139+193091=1168530

    可见,三个完全相同筛选条件的sql,只因为group by字段的不同数据有差异。

    具体原因就是去重的顺序影响的,也就是 先汇总再去重 和先group by 再去重的区别。

    后者往往会大于等于前者,不同group by分组中可能存在相同的去重字段。

    相关文章

      网友评论

          本文标题:group by前后 数据不一致

          本文链接:https://www.haomeiwen.com/subject/fjfksltx.html