美文网首页
Hivesql/Python求相同老师学生连续上多节课的数据

Hivesql/Python求相同老师学生连续上多节课的数据

作者: 当_下 | 来源:发表于2020-08-20 21:38 被阅读0次

    表结构:

    表结构

    方式一:

    HIVESQL:

    1.首先按照学生和上课时间排序(升序),并新增一列row值当作参考列

    SELECT uid,`datetime`,teacher,id,ROW_NUMBER( ) OVER(order by `uid`,`datetime`) as rank from class_table结果如下:

    2.利用lag(a,b,c)开窗函数,共三个参数,a为选中的滞后列,b为滞后的行数,c为空时默认值(可以不指定)

    暂定连续上4节课

    SELECT uid, `datetime`, teacher, `rank`

    , lag(`rank`, 3) OVER (PARTITION BY uid, teacher ORDER BY `datetime`) AS `rank2`

    FROM (

    SELECT uid, `datetime`, teacher, ROW_NUMBER() OVER (ORDER BY `uid`, `datetime`) AS rank

    FROM class_table

    ) tmp

    结果如下:

    3.通过判断列rank和rank2的关系即可找出相同老师学生连续上多节课的数据

    SELECT *

    FROM (

    SELECT uid, `datetime`, teacher, `rank`

    , lag(`rank`, 3) OVER (PARTITION BY uid, teacher ORDER BY `datetime`) AS `rank2`

    FROM (

    SELECT uid, `datetime`, teacher, ROW_NUMBER() OVER (ORDER BY `uid`, `datetime`) AS rank

    FROM class_table

    ) tmp

    ) tmp2

    WHERE `rank` = `rank2` + 3

    调节滞后量即可查询其他连续数量


    方式二:

    原理都是一样的,pandas需要额外加一列自增列

    data.sort_values(by=['uid','datetime'],inplace=True)

    data['row'] = range(len(data))

    data['rank'] = data.sort_values('datetime').groupby(['uid','teacher'])['row'].shift(3)

    data[data['row']==data['rank']+3]

    希望能帮助到大家


    以上

    相关文章

      网友评论

          本文标题:Hivesql/Python求相同老师学生连续上多节课的数据

          本文链接:https://www.haomeiwen.com/subject/hliejktx.html