美文网首页
Hive 的几个练习题

Hive 的几个练习题

作者: 活英雄 | 来源:发表于2018-08-12 20:09 被阅读0次

    题目1

    需求:
    每个用户截止到每月为止的最大单月访问次数和累计到该月的总访问次数

    三个字段的意思:
    用户名,月份,访问次数

    数据:
    A,2015-01,5
    A,2015-01,15
    B,2015-01,5
    A,2015-01,8
    B,2015-01,25
    A,2015-01,5
    A,2015-02,4
    A,2015-02,6
    B,2015-02,10
    B,2015-02,5
    A,2015-03,16
    A,2015-03,22
    B,2015-03,23
    B,2015-03,10
    B,2015-03,11

    最后结果展示:

    用户 月份 最大访问次数 总访问次数 当月访问次数
    A 2015-01 33 33 33
    A 2015-02 33 43 10
    A 2015-03 38 81 38
    B 2015-01 30 30 30
    B 2015-02 30 45 15
    B 2015-03 44 89 44

    解题思路

    1. 以username分和month分组,统计出每月访问次数,得到如下结果
      CREATE TABLE t01_s1 AS
      SELECT table01.username, table01. MONTH, sum(table01.count) sum FROM myhive.table01 GROUP BY table01.username, table01. MONTH;

    2. 进行自连接,选出tl.month>tr.month的字段,并用username和month分组就可以得到结果
      SELECT tl.username, tl. MONTH, max(tr.sum) maxvisit, sum(tr.sum) sumvisit, max(tl.sum) currentmonth
      FROM t01_s1 tl JOIN t01_s1 tr ON tl.username = tr.username
      WHERE tl. MONTH >= tr. MONTH
      GROUP BY tl.username, tl. MONTH;

    进行自连接后的结果如下:

    tl.username     tl.month        tl.sum  tr.username     tr.month        tr.sum
    A       2015-01 33      A       2015-01 33
    A       2015-02 10      A       2015-01 33
    A       2015-03 38      A       2015-01 33
    A       2015-01 33      A       2015-02 10
    A       2015-02 10      A       2015-02 10
    A       2015-03 38      A       2015-02 10
    A       2015-01 33      A       2015-03 38
    A       2015-02 10      A       2015-03 38
    A       2015-03 38      A       2015-03 38
    B       2015-01 30      B       2015-01 30
    B       2015-02 15      B       2015-01 30
    B       2015-03 44      B       2015-01 30
    B       2015-01 30      B       2015-02 15
    B       2015-02 15      B       2015-02 15
    B       2015-03 44      B       2015-02 15
    B       2015-01 30      B       2015-03 44
    B       2015-02 15      B       2015-03 44
    B       2015-03 44      B       2015-03 44
    

    题目2

    // 建表语句:
    CREATE TABLE course (
    id int(11) NOT NULL AUTO_INCREMENT PRIMARY KEY,
    sid int(11) DEFAULT NULL,
    course varchar(255) DEFAULT NULL,
    score int(11) DEFAULT NULL
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8;

    // 插入数据
    // 字段解释:id, 学号, 课程, 成绩
    INSERT INTO course VALUES (1, 1, 'yuwen', 43);
    INSERT INTO course VALUES (2, 1, 'shuxue', 55);
    INSERT INTO course VALUES (3, 2, 'yuwen', 77);
    INSERT INTO course VALUES (4, 2, 'shuxue', 88);
    INSERT INTO course VALUES (5, 3, 'yuwen', 98);
    INSERT INTO course VALUES (6, 3, 'shuxue', 65);

    求:所有数学课程成绩 大于 语文课程成绩的学生的学号

    解答:

    1. 自连接的方式:
      select c1.* from course c1 join course c2 on c1.sid=c2.sid where c1.score>c2.score and c1.course='shuxue';

    2. 行列转换的方式
      select a.sid from (select sid,
      max(case when course='yuwen'then score else 0 end) yuwen,
      max(case when course='shuxue'then score else 0 end) shuxue
      from course group by sid having shuxue>yuwen) a;

    题目3

    数据:
    2014010114
    2014010216
    2014010317
    2014010410
    2014010506
    2012010609
    2012010732
    2012010812
    2012010919
    2012011023
    2001010116
    2001010212
    2001010310
    2001010411
    2001010529
    2013010619
    2013010722
    2013010812
    2013010929
    2013011023
    2008010105
    2008010216
    2008010337
    2008010414
    2008010516
    2007010619
    2007010712
    2007010812
    2007010999
    2007011023
    2010010114
    2010010216
    2010010317
    2010010410
    2010010506
    2015010649
    2015010722
    2015010812
    2015010999
    2015011023

    要求: 求出一年中出现最高温度的那一天
    输出以下数据:
    20010105 29
    20070109 99
    20080103 37
    20100103 17
    20120107 32
    20130109 29
    20140103 17
    20150109 99

    解题

    解法一:
    SELECT * FROM exercise3
    WHERE concat( substr(DATA, 1, 4), substr(DATA, 9, 2))
    IN ( SELECT concat( substr(DATA, 1, 4), max(substr(DATA, 9, 2))) FROM exercise3 GROUP BY substr(DATA, 1, 4));
    思路:通过年份分组求出最高温度的那一年和最高温度,把这些数据看成一个集合。再查出原始表中出现这些数据的那一行。

    解法二:
    select substring(b.line, 1, 8) as max_temp_date, a.max_temp
    from exercise3 b join
    (select substring(c.line, 1, 4) as year, max(substring(c.line, -2)) as max_temp
    from exercise3 c group by substring(c.line, 1, 4)) a
    on a.year = substring(b.line, 1, 4) and
    a.max_temp = substring(b.line, -2);
    思路:1. 求出以你那为分组,求出最最高温度和年份

    1. 用原始表和这个表进行连接,连接条件为年份相同且最高温度相同的条目

    题目4

    现有一份以下格式的数据:
    表示有id为1,2,3的学生选修了课程a,b,c,d,e,f中其中几门:
    数据:
    id course
    1,a
    1,b
    1,c
    1,e
    2,a
    2,c
    2,d
    2,f
    3,a
    3,b
    3,c
    3,e

    编写Hive的HQL语句来实现以下结果:
    表中的1表示选修,表中的0表示未选修
    id a b c d e f
    1 1 1 1 0 1 0
    2 1 0 1 1 0 1
    3 1 1 1 0 1 0

    解题要点行列转换。
    解法1:
    select id,
    sum(case course when "a" then 1 else 0 end) as a,
    sum(case course when "b" then 1 else 0 end) as b,
    sum(case course when "c" then 1 else 0 end) as c,
    sum(case course when "d" then 1 else 0 end) as d,
    sum(case course when "e" then 1 else 0 end) as e,
    sum(case course when "f" then 1 else 0 end) as f
    from id_course group by id;

    解法2:
    先构造以下表
    id id_courses courses
    1 ["a","b","c","e"] ["a","b","c","d","e","f"]
    2 ["a","c","d","f"] ["a","b","c","d","e","f"]
    3 ["a","b","c","e"] ["a","b","c","d","e","f"]

    1.左边:
    select d.id as id, collect_set(d.course) as id_courses from id_course d group by d.id;

    1. 右边:
      select sort_array(collect_set(course)) as tt from id_course;

    2. 左右连接得到需要的表:
      create id_courses table as
      select a.id, a.id_courses, b.tt
      from
      (select d.id as id, collect_set(d.course) as id_courses from id_course d group by d.id) a
      join
      (select sort_array(collect_set(c.course)) as tt from id_course c) b ;

    3. 查询出最终结果
      使用if判断
      select
      id,
      if(array_contains(a.id_courses, courses[0]),1,0) as a,
      if(array_contains(a.id_courses, courses[1]),1,0) as b,
      if(array_contains(a.id_courses, courses[2]),1,0) as c,
      if(array_contains(a.id_courses, courses[3]),1,0) as d,
      if(array_contains(a.id_courses, courses[4]),1,0) as e,
      if(array_contains(a.id_courses, courses[5]),1,0) as f
      from id_courses a;

    相关文章

      网友评论

          本文标题:Hive 的几个练习题

          本文链接:https://www.haomeiwen.com/subject/qrtwhftx.html