oracle学习笔记7: 高级分组

作者: 猫猫_tomluo | 来源:发表于2016-07-29 22:09 被阅读357次

    group by 用来在原始数据上创建聚合来将数据转化为有用的信息。

    基本的group by 列出个个部门的名称,员工总数

    select d.dname, count(empno) empcount
      from scott.dept d
      left outer join scott.emp e
        on d.deptno = e.deptno
     group by d.dname
     order by d.dname;
    
    

    select列表中的每一列必须包含在group by子句中。如果没有则会导致错误。如:

    
    SQL> select d.dname, d.loc, count(empno) empcount
      2    from scott.emp e
      3     join scott.dept d
      4      on d.deptno = e.deptno
      5   group by d.dname;
    select d.dname, d.loc, count(empno) empcount
                    *
    第 1 行出现错误:
    ORA-00979: 不是 GROUP BY 表达式
    
    

    尽管包含group by子句的select语句输出看上去是按顺序列出的,你不能期待group by每次都排好序地返回数据,如果输出结果必须按照一定的顺序排列,则必须使用order by子句。

    --没有排序的group by 
    select deptno,count(*)
    from emp
    group by deptno;
    
    --复杂的sql
    set serveroutput off;
    --复杂的sql
    select /* lst7-4 */
    distinct dname,
             decode(d.deptno,
                    1,
                    (select count(*) from emp where deptno = 1),
                    2,
                    (select count(*) from emp where deptno = 2),
                    3,
                    (select count(*) from emp where deptno = 3),
                    (select count(*) from emp where deptno not in (1, 2, 3))) dept_count
      from (select distinct deptno from emp) d
      join dept d2
        on d.deptno = d.deptno;
    
    @E:\bjc2016\study\pln lst7-4
    

    上面的写法,会使SQL语句更加复杂难以理解并且难以维护。group by子句极大的简化必须写的sql语句以外,还消除了数据库不必要的IO。

    set serveroutput off;
    --复杂的sql
    select /* lst7-5 */
    distinct dname,
             count(empno) empcount
      from  emp e
      join dept d
        on d.deptno = d.deptno
        group by d.dname
        order by d.dname;
    
    @E:\bjc2016\study\pln lst7-5
    

    group by 优点:

    • 使sql语句更具有可读性
    • 书写起来比使用很多相关子查询更简单
    • 减少了重复访问同一个数据块的次数,从而得到更好的性能。

    在分组之后,还在数据集上应用了having子句。另一方面,在获取数据行之后,进行分组之前,应用了where子句。having 子句中可以使用运算,函数及子查询。

    
    SQL> --having子句
    SQL> select /* lst7-6 */
      2   d.dname, trunc(e.hiredate, 'yyyy') hiredate, count(empno) empcount
      3    from emp e
      4    join dept d
      5      on e.deptno = e.deptno
      6   group by d.dname, trunc(e.hiredate, 'yyyy')
      7  having count(empno) <= 5 and trunc(e.hiredate, 'yyyy') between (select min(hiredate)
      8                                                                    from scott.emp) and (select max(hiredate)
      9                                                                                           from scott.emp)
     10   order by d.dname;
    
    未选定行
    
    SQL> @E:\bjc2016\study\pln lst7-6
    原值    8:        WHERE UPPER(SQL_TEXT) LIKE '%&1%'
    新值    8:        WHERE UPPER(SQL_TEXT) LIKE '%lst7-6%'
    
    PLAN_TABLE_OUTPUT
    --------------------------------------------------------------------------------
    SQL_ID  0wcfknkztdxqt, child number 0
    -------------------------------------
    select /* lst7-6 */  d.dname, trunc(e.hiredate, 'yyyy') hiredate,
    count(empno) empcount   from emp e   join dept d     on e.deptno =
    e.deptno  group by d.dname, trunc(e.hiredate, 'yyyy') having
    count(empno) <= 5 and trunc(e.hiredate, 'yyyy') between (select
    min(hiredate)
            from scott.emp) and (select max(hiredate)
                                                                       from
    scott.emp)  order by d.dname
    
    
    PLAN_TABLE_OUTPUT
    --------------------------------------------------------------------------------
    Plan hash value: 239717969
    
    ---------------------------------------------------------------------------
    | Id  | Operation              | Name | E-Rows |  OMem |  1Mem | Used-Mem |
    ---------------------------------------------------------------------------
    |   0 | SELECT STATEMENT       |      |        |       |       |          |
    |*  1 |  FILTER                |      |        |       |       |          |
    |   2 |   SORT GROUP BY        |      |      1 |  2048 |  2048 | 2048  (0)|
    |   3 |    MERGE JOIN CARTESIAN|      |     40 |       |       |          |
    |   4 |     TABLE ACCESS FULL  | DEPT |      4 |       |       |          |
    |   5 |     BUFFER SORT        |      |     10 |  2048 |  2048 | 2048  (0)|
    
    PLAN_TABLE_OUTPUT
    --------------------------------------------------------------------------------
    |   6 |      TABLE ACCESS FULL | EMP  |     10 |       |       |          |
    |   7 |   SORT AGGREGATE       |      |      1 |       |       |          |
    |   8 |    TABLE ACCESS FULL   | EMP  |     10 |       |       |          |
    |   9 |   SORT AGGREGATE       |      |      1 |       |       |          |
    |  10 |    TABLE ACCESS FULL   | EMP  |     10 |       |       |          |
    ---------------------------------------------------------------------------
    
    Predicate Information (identified by operation id):
    ---------------------------------------------------
    
       1 - filter((COUNT(*)<=5 AND TRUNC(INTERNAL_FUNCTION("E"."HIREDATE"),'
    
    PLAN_TABLE_OUTPUT
    --------------------------------------------------------------------------------
                  fmyyyy')>= AND TRUNC(INTERNAL_FUNCTION("E"."HIREDATE"),'fmyyyy')<=))
    
    
    Note
    -----
       - Warning: basic plan statistics not available. These are only collected when
    :
    
           * hint 'gather_plan_statistics' is used for the statement or
           * parameter 'statistics_level' is set to 'ALL', at session or system leve
    
    PLAN_TABLE_OUTPUT
    --------------------------------------------------------------------------------
    l
    
    
    
    已选择41行。
    
    SQL>
    
    

    group by的cube扩展

    当与group by子句一起使用时,将会使得对每一行都要考虑包含在cube的参数中的所有可能的元素组合。这个运算将会生成比表中实际存在的行数更多的数据行。

    -- hr.emplyees表的cube运算
    select last_name, first_name
      from hr.employees
     group by first_name, last_name;
    
    set autotrace off;
    set autotrace on statistics;
    with emps as
     (select /* lst-7 */
       last_name, first_name
        from hr.employees
       group by cube(last_name, first_name))
    select rownum, last_name, first_name from emps;
    
    

    对于每一对last_name,first_name,cube将会按顺序为每个元素替换为null值。cube生成的数据行在Oracle文档中称为超级聚合行,可以在运算列中加入null值来识别。

    
    SQL> set autotrace off;
    SQL> --预测cube返回行数
    SQL> with counts as
      2   (select count(distinct first_name) first_name_count,
      3           count(distinct last_name) last_name_count,
      4           count(distinct(first_name || last_name)) full_name_count
      5      from hr.employees)
      6  select first_name_count,
      7         last_name_count,
      8         full_name_count,
      9         first_name_count + last_name_count + full_name_count + 1 total_count
     10    from counts;
    
    FIRST_NAME_COUNT LAST_NAME_COUNT FULL_NAME_COUNT TOTAL_COUNT
    ---------------- --------------- --------------- -----------
                  91             102             107         301
    

    下面用SQL语句模拟cube,可以看出cube为我们节省了许多力气。

    --用union all生成cube数据行
    with emps as (
     select last_name,first_name from hr.employees
    ),
    mycube as (
           select last_name,first_name from emps
           union all
           select last_name,null first_name from emps
           union all
           select null last_name,first_name from emps
           union all
           select null last_name,null first_name from emps
    )
    select /*+ gather_plan_statistics */ * 
    from mycube group by last_name,first_name;
    
    

    cube实际应用

    sales_history模式中包含1998~2001年的销售数据。
    下面的SQL展示2001年的所有销售数据。并想要查看各个产品种类的销售情况汇总,包含基于10年消费者年龄段,收入水平的聚合;按照收入水平而不考虑年龄的汇总;以及按年龄而不考虑收入水平的聚合。

    --销售数据的union all查询
    with tsales as
     (select /* lst7-10 */
       s.quantity_sold,
       s.amount_sold,
       to_char(mod(c.cust_year_of_birth, 10) * 10) || '_' ||
       to_char((mod(c.cust_year_of_birth, 10) * 10) + 10) age_range,
       nvl(c.cust_income_level, 'A: below 30,000') cust_income_level,
       p.prod_name,
       p.prod_desc,
       p.prod_category,
       (pf.unit_cost * s.quantity_sold) total_cost,
       s.amount_sold - (pf.unit_cost * s.quantity_sold) profit
        from sh.sales s
        join sh.customers c
          on c.cust_id = s.cust_id
        join sh.products p
          on p.prod_id = s.prod_id
        join sh.times t
          on t.time_id = s.time_id
        join sh.costs pf
          on pf.channel_id = s.channel_id
         and pf.prod_id = s.prod_id
         and pf.promo_id = s.promo_id
         and pf.time_id = s.time_id
       where (t.fiscal_year = 2001)),
    gb as
     (select --Q1 - 所有分类通过收入和年龄范围
       'Q1' query_tag,
       prod_category,
       cust_income_level,
       age_range,
       sum(profit) profit
        from tsales
       group by prod_category, cust_income_level, age_range
      union all
      select --Q2 - 所有分类通过年龄范围
       'Q2' query_tag,
       prod_category,
       'ALL INCOME' cust_income_level,
       age_range,
       sum(profit) profit
        from tsales
       group by prod_category, 'ALL INCOME', age_range
      union all
      select --Q3 - 所有分类通过收入
       'Q3' query_tag,
       prod_category,
       cust_income_level,
       'ALL AGE' age_range,
       sum(profit) profit
        from tsales
       group by prod_category, cust_income_level, 'ALL AGE'
      union all
      select --Q4 - 所有分类
       'Q4' query_tag,
       prod_category,
       'ALL INCOME' cust_income_level,
       'ALL AGE' age_range,
       sum(profit) profit
        from tsales
       group by prod_category, 'ALL INCOME', 'ALL AGE'
      )
    select * from gb order by prod_category, profit;
    

    【语法】NVL (expr1, expr2)
    【功能】若expr1为NULL,返回expr2;expr1不为NULL,返回expr1。
    注意两者的类型要一致
    nvl(c.cust_income_level, 'A: below 30,000') cust_income_level
    若c.cust_income_level为null,则返回'A: below 30,000'
    mod(x,y)
    【功能】返回x除以y的余数
    【参数】x,y,数字型表达式
    【返回】数字
    【示例】
    select mod(23,8),mod(24,8) from dual;
    返回:7,0
    to_char(mod(c.cust_year_of_birth, 10) * 10) || '_' || to_char((mod(c.cust_year_of_birth, 10) * 10) + 10) age_range
    上面是求年龄段,如果56,则求出的范围为50_60

    --用cube代替union all
    with tsales as
     (select /* lst7-11 */
       s.quantity_sold,
       s.amount_sold,
       to_char(mod(c.cust_year_of_birth, 10) * 10) || '_' ||
       to_char((mod(c.cust_year_of_birth, 10) * 10) + 10) age_range,
       nvl(c.cust_income_level, 'A: below 30,000') cust_income_level,
       p.prod_name,
       p.prod_desc,
       p.prod_category,
       (pf.unit_cost * s.quantity_sold) total_cost,
       s.amount_sold - (pf.unit_cost * s.quantity_sold) profit
        from sh.sales s
        join sh.customers c
          on c.cust_id = s.cust_id
        join sh.products p
          on p.prod_id = s.prod_id
        join sh.times t
          on t.time_id = s.time_id
        join sh.costs pf
          on pf.channel_id = s.channel_id
         and pf.prod_id = s.prod_id
         and pf.promo_id = s.promo_id
         and pf.time_id = s.time_id
       where (t.fiscal_year = 2001))
    select 'Q' || decode(cust_income_level,
                         null,
                         decode(age_range, null, 4, 3),
                         decode(age_range, null, 2, 1)) query_tag,
           prod_category,
           cust_income_level,
           age_range,
           sum(profit) profit
      from tsales
     group by prod_category, cube(cust_income_level, age_range)
     order by prod_category, profit;
    
    

    decode(条件,值1,翻译值1,值2,翻译值2,...值n,翻译值n,缺省值)
    【功能】根据条件返回相应值
    【参数】c1, c2, ...,cn,字符型/数值型/日期型,必须类型相同或null
    注:值1……n 不能为条件表达式,这种情况只能用case when then end解决
    ·含义解释:  
      decode(条件,值1,翻译值1,值2,翻译值2,...值n,翻译值n,缺省值)  
      该函数的含义如下:  
      IF 条件=值1 THEN
      RETURN(翻译值1)
      ELSIF 条件=值2 THEN
      RETURN(翻译值2)
      ......
      ELSIF 条件=值n THEN
      RETURN(翻译值n)  
      ELSE
      RETURN(缺省值)
      END IF
      
    或:
      when case 条件=值1 THEN
      RETURN(翻译值1)
      ElseCase 条件=值2 THEN
      RETURN(翻译值2)
      ......
      ElseCase 条件=值n THEN
      RETURN(翻译值n)  
      ELSE

    RETURN(缺省值)
      END
    'Q' || decode(cust_income_level, null,decode(age_range, null, 4, 3),decode(age_range, null, 2, 1)) query_tag
    是返回查询分类标识cust_income_level为null返返回decode(age_range, null, 4, 3)否则返回decode(age_range, null, 2, 1)
    cust_income_level==null and age_range==null,query_tag=4
    cust_income_level==null and age_range!=null,query_tag=3
    cust_income_level!=null and age_range==null,query_tag=2
    cust_income_level!=null and age_range!=null,query_tag=1

    用grouping()函数排除空值

    上面的SQL有个问题,尽管总行数与之前使用union all运算符所得到的相一致,一些数据行中的cust_income_level和age_range具有空值,并且有一行的这两列都为空值。当cube的参数中包含生成列的所有可能组合时,每一列都有会产生n-1个空值,n为列表中的数目。在查询的例子中有两个例,因此对于每个唯一的age_range值都会在cust_income_level列上产生空值。对于age_range列来说也适用同样的规则。如果这两列中的数据在某些行上原本就有空值,这些空值就可能出问题。如何辨别数据中原有的空值和cube扩展所插入的值呢?在oracle 8i中引入了grouping()函数,可以用来识别这些超聚合行。被用来作为grouping()函数参数的表达式必须与出现在group by子句中的表达式相匹配。例如
    decode(grouping(age_range),1,'ALL AGE',age_range) age_range
    age_range检测age_range是否有一行由cube产生的空值,或者是否其在数据库中本身就是空值。如果当前行是由cube生成的超聚合行则返回值为1,对于其它所有情况返回值都为0。
    当与case()表达式或decode()函数组合时,超聚合行中的空值可以用一个报告中有用的值替换。这种情况下,decode()看上去是更好的选择,因为它更简便并且grouping()函数仅有两种可能的返回值。

    --grouping()函数
    --无grouping
    with tsales as
     (select /* lst7-11 */
       s.quantity_sold,
       s.amount_sold,
       to_char(mod(c.cust_year_of_birth, 10) * 10) || '_' ||
       to_char((mod(c.cust_year_of_birth, 10) * 10) + 10) age_range,
       nvl(c.cust_income_level, 'A: below 30,000') cust_income_level,
       p.prod_name,
       p.prod_desc,
       p.prod_category,
       (pf.unit_cost * s.quantity_sold) total_cost,
       s.amount_sold - (pf.unit_cost * s.quantity_sold) profit
        from sh.sales s
        join sh.customers c
          on c.cust_id = s.cust_id
        join sh.products p
          on p.prod_id = s.prod_id
        join sh.times t
          on t.time_id = s.time_id
        join sh.costs pf
          on pf.channel_id = s.channel_id
         and pf.prod_id = s.prod_id
         and pf.promo_id = s.promo_id
         and pf.time_id = s.time_id
       where (t.fiscal_year = 2001))
    select 'Q' || decode(cust_income_level,
                         null,
                         decode(age_range, null, 4, 3),
                         decode(age_range, null, 2, 1)) query_tag,
           prod_category,
           cust_income_level,
           age_range,
           sum(profit) profit
      from tsales
     group by prod_category, cube(cust_income_level, age_range)
     order by prod_category, profit;
    
    --有grouping
    --case和decode都可以工作,我更喜欢用decode
    with tsales as
     (select /* lst7-12 */
       s.quantity_sold,
       s.amount_sold,
       to_char(mod(c.cust_year_of_birth, 10) * 10) || '_' ||
       to_char((mod(c.cust_year_of_birth, 10) * 10) + 10) age_range,
       nvl(c.cust_income_level, 'A: below 30,000') cust_income_level,
       p.prod_name,
       p.prod_desc,
       p.prod_category,
       (pf.unit_cost * s.quantity_sold) total_cost,
       s.amount_sold - (pf.unit_cost * s.quantity_sold) profit
        from sh.sales s
        join sh.customers c
          on c.cust_id = s.cust_id
        join sh.products p
          on p.prod_id = s.prod_id
        join sh.times t
          on t.time_id = s.time_id
        join sh.costs pf
          on pf.channel_id = s.channel_id
         and pf.prod_id = s.prod_id
         and pf.promo_id = s.promo_id
         and pf.time_id = s.time_id
       where (t.fiscal_year = 2001))
    select 'Q' || decode(cust_income_level,
                         null,
                         decode(age_range, null, 4, 3),
                         decode(age_range, null, 2, 1)) query_tag,
           prod_category,
           case grouping(cust_income_level)
             when 1 then
              'ALL INCOME'
             else
              cust_income_level
           end cust_income_level,
           decode(grouping(age_range), 1, 'ALL AGE', age_range) age_range,
           sum(profit) profit
      from tsales
     group by prod_category, cube(cust_income_level, age_range)
     order by prod_category, profit;
    
    

    用grouping()扩展报告

    另一种使用grouping()的方法是放在having子句中,用来控制在输出中显示哪个层级的聚合。
    使用grouping()函数可以被浓缩为对cube扩展中的各行或所有行进行滚动小计。

    --在having子句中进行grouping()
    with tsales as
     (select /* lst7-13 */
       s.quantity_sold,
       s.amount_sold,
       to_char(mod(c.cust_year_of_birth, 10) * 10) || '_' ||
       to_char((mod(c.cust_year_of_birth, 10) * 10) + 10) age_range,
       nvl(c.cust_income_level, 'A: below 30,000') cust_income_level,
       p.prod_name,
       p.prod_desc,
       p.prod_category,
       (pf.unit_cost * s.quantity_sold) total_cost,
       s.amount_sold - (pf.unit_cost * s.quantity_sold) profit
        from sh.sales s
        join sh.customers c
          on c.cust_id = s.cust_id
        join sh.products p
          on p.prod_id = s.prod_id
        join sh.times t
          on t.time_id = s.time_id
        join sh.costs pf
          on pf.channel_id = s.channel_id
         and pf.prod_id = s.prod_id
         and pf.promo_id = s.promo_id
         and pf.time_id = s.time_id
       where (t.fiscal_year = 2001))
    select 'Q' || decode(cust_income_level,
                         null,
                         decode(age_range, null, 4, 3),
                         decode(age_range, null, 2, 1)) query_tag,
           prod_category,
           case grouping(cust_income_level)
             when 1 then
              'ALL INCOME'
             else
              cust_income_level
           end cust_income_level,
           decode(grouping(age_range), 1, 'ALL AGE', age_range) age_range,
           sum(profit) profit
      from tsales
     group by prod_category, cube(cust_income_level, age_range)
     --having grouping(cust_income_level)=1
     --having grouping(age_range)=1
     having grouping(cust_income_level)=1 and grouping(age_range)=1
     order by prod_category, profit;
    
    

    上面sql的数据可以看到将grouping()应用到cust_income_level列对所有age_range值跨各个收入层次创建聚合。对age_range列进行这样的操作会得到类似的效果,对所有cust_income_level值进行聚合而不考虑age_range的值。将cube扩展中的所有列作为grouping()函数的参数将会导致聚合被浓缩为一行类似sum(profit)和group by prod_category所实现的功能。但是,使用cube扩展简单修改having子句就可以创建几份不同的报告。

    用grouping_id()扩展报告

    grouping_id()函数相对grouping()函数来说是相对较新的,在oracle 9i中引入,与grouping()函数在某种程度上是类似的。不同的是grouping()计算一个表达式并返回0或1,而grouping_id()计算一个表达式,确定其参数中的哪一行(如果有的话)用来生成超聚合行,然后创建一个位矢量,并将该值作为整形值返回。

    --group_id()位矢量
    with rowgen as (
         select 1 bit_1,0 bit_0
         from dual
    ), cubed as (
     select 
     grouping_id(bit_1,bit_0) gid,
     to_char(grouping(bit_1)) bv_1,
     to_char(grouping(bit_0)) bv_0,
     decode(grouping(bit_1),1,'GRP BIT 1') gb_1,
     decode(grouping(bit_0),1,'GRP BIT 0') gb_0
     from rowgen
     group by cube(bit_1,bit_0)
    )
    select gid,bv_1 || bv_0 bit_vector,
    gb_1,
    gb_0
    from cubed
    order by gid;
    
    group_id()位矢量运行结果

    我们己经知道如何使用grouping()通过having子句来控制输出,但考虑数据库效率时,单独的grouping_id()调用可以用来取代所有不同的having grouping()子句。grouping()函数的功能仅仅用来辨别数据行,因为它仅能返回0或1。由于grouping_id()函数返回一个基于位矢量的数值,它可以轻易被用来进行各种不同的比较而不用修改sql语句。
    为什么要关注不改变SQL语句就能改变比较呢?如上面基于销售历史的例子中,用户可能会被给出4个输出选项,任意一个或多个可能会被选中。用户的选择可以用来作为使用having grouping_id()函数的一个单独的sql语句,而不是基于having grouping()的不同组全的多个sql语句的输入,因此需要数据库解析sql语句的次数也比就较少。同时这也会使得需要执行的sql语句更少,使用更小的IO,以及更少的内存。
    正如使用cube来避免通过union all将多个sql语句结合起来一样,grouping_id()能够避免在应用中使用多个sql语句。

    --显示所有收入层次和年龄段的聚合
    variable N_ALL_DATA number 
    --显示所有年龄段的聚合
    variable N_AGE_RANGE number 
    --显示所有收入层次的聚合
    variable N_INCOME_LEVEL number 
    --只给出汇总
    variable N_SUMMAY number 
    begin
     :N_ALL_DATA      := 0;      -- 1 生效
     :N_AGE_RANGE     := 2;      -- 2 生效
     :N_INCOME_LEVEL  := 0;      -- 3 生效
     :N_SUMMAY        := 4;      -- 4 生效
    end;
    /
    with tsales as
     (select /* lst7-15 */
       s.quantity_sold,
       s.amount_sold,
       to_char(mod(c.cust_year_of_birth, 10) * 10) || '_' ||
       to_char((mod(c.cust_year_of_birth, 10) * 10) + 10) age_range,
       nvl(c.cust_income_level, 'A: below 30,000') cust_income_level,
       p.prod_name,
       p.prod_desc,
       p.prod_category,
       (pf.unit_cost * s.quantity_sold) total_cost,
       s.amount_sold - (pf.unit_cost * s.quantity_sold) profit
        from sh.sales s
        join sh.customers c
          on c.cust_id = s.cust_id
        join sh.products p
          on p.prod_id = s.prod_id
        join sh.times t
          on t.time_id = s.time_id
        join sh.costs pf
          on pf.channel_id = s.channel_id
         and pf.prod_id = s.prod_id
         and pf.promo_id = s.promo_id
         and pf.time_id = s.time_id
       where (t.fiscal_year = 2001))
    select 'Q' || to_char(grouping_id(cust_income_level,age_range)+1) query_tag,
    prod_category,
    decode(grouping(cust_income_level),1,'ALL INCOME',cust_income_level) cust_income_level,
    decode(grouping(age_range),1,'ALL AGE',age_range) age_range, 
    sum(profit) profit
    from tsales
    group by prod_category,cube(cust_income_level,age_range)
    having grouping_id(cust_income_level,age_range)+1 in(:N_ALL_DATA,:N_AGE_RANGE,:N_INCOME_LEVEL,:N_SUMMAY)
    order by prod_category,profit;
    

    使用grouping函数也可以实现同的结果,但需要在having子句中进行一些测试。示例销售历史数据查询在cube参数中只包含两列。在having子句中总共需要进行4次测试,因为grouping子句将会返回1或者0,每一列有两个可能的值。从而需要4次测试。如果3列,则需要8次,所需的测试次数将会是2的n次方,其中n为cube中参数列或表达式的个数。

    用grouping()代替grouping_id()的having子句的例子

    --显示所有收入层次和年龄段的聚合
    variable N_ALL_DATA number 
    --显示所有年龄段的聚合
    variable N_AGE_RANGE number 
    --显示所有收入层次的聚合
    variable N_INCOME_LEVEL number 
    --只给出汇总
    variable N_SUMMAY number 
    begin
     :N_ALL_DATA      := 0;      -- 1 生效
     :N_AGE_RANGE     := 2;      -- 2 生效
     :N_INCOME_LEVEL  := 0;      -- 3 生效
     :N_SUMMAY        := 4;      -- 4 生效
    end;
    /
    with tsales as
     (select /* lst7-16 */
       s.quantity_sold,
       s.amount_sold,
       to_char(mod(c.cust_year_of_birth, 10) * 10) || '_' ||
       to_char((mod(c.cust_year_of_birth, 10) * 10) + 10) age_range,
       nvl(c.cust_income_level, 'A: below 30,000') cust_income_level,
       p.prod_name,
       p.prod_desc,
       p.prod_category,
       (pf.unit_cost * s.quantity_sold) total_cost,
       s.amount_sold - (pf.unit_cost * s.quantity_sold) profit
        from sh.sales s
        join sh.customers c
          on c.cust_id = s.cust_id
        join sh.products p
          on p.prod_id = s.prod_id
        join sh.times t
          on t.time_id = s.time_id
        join sh.costs pf
          on pf.channel_id = s.channel_id
         and pf.prod_id = s.prod_id
         and pf.promo_id = s.promo_id
         and pf.time_id = s.time_id
       where (t.fiscal_year = 2001))
    select 'Q' || to_char(grouping_id(cust_income_level,age_range)+1) query_tag,
    prod_category,
    decode(grouping(cust_income_level),1,'ALL INCOME',cust_income_level) cust_income_level,
    decode(grouping(age_range),1,'ALL AGE',age_range) age_range, 
    sum(profit) profit
    from tsales
    group by prod_category,cube(cust_income_level,age_range)
    having 
     (bin_to_num(grouping(cust_income_level),grouping(age_range))+1 = :N_ALL_DATA)
     or (bin_to_num(grouping(cust_income_level),grouping(age_range))+1 = :N_AGE_RANGE)
     or (bin_to_num(grouping(cust_income_level),grouping(age_range))+1 = :N_INCOME_LEVEL)
     or (bin_to_num(grouping(cust_income_level),grouping(age_range))+1 = :N_SUMMAY)
    order by prod_category,profit;
    

    1. 使用grouping可以判断该行是数据库中本来的行,还是有统计产生的行
    grouping值为0时说明这个值是数据库中本来的值,为1说明是统计的结果(也可以说该列为空时是1,不为空时是0)
    2. GROUPING_ID()函数可以接受一列或多列,返回GROUPING位向量的十进制值。GROUPING位向量的计算方法是将按照顺序对每一列调用GROUPING函数的结果组合起来,所以说01和10的值不一样的
    3. group_id的使用 当group by子句中重复使用一个列时,通过group_id来去除重复值

    grouping sets与rollup()

    group by的grouping sets()扩展在oracle 9i中初次登场,前面的例子中的整个group by...having子句可以用group by grouping sets()替换。

    
    with tsales as
     (select /* lst7-17 */
       s.quantity_sold,
       s.amount_sold,
       to_char(mod(c.cust_year_of_birth, 10) * 10) || '_' ||
       to_char((mod(c.cust_year_of_birth, 10) * 10) + 10) age_range,
       nvl(c.cust_income_level, 'A: below 30,000') cust_income_level,
       p.prod_name,
       p.prod_desc,
       p.prod_category,
       (pf.unit_cost * s.quantity_sold) total_cost,
       s.amount_sold - (pf.unit_cost * s.quantity_sold) profit
        from sh.sales s
        join sh.customers c
          on c.cust_id = s.cust_id
        join sh.products p
          on p.prod_id = s.prod_id
        join sh.times t
          on t.time_id = s.time_id
        join sh.costs pf
          on pf.channel_id = s.channel_id
         and pf.prod_id = s.prod_id
         and pf.promo_id = s.promo_id
         and pf.time_id = s.time_id
       where (t.fiscal_year = 2001))
    select 'Q' || to_char(grouping_id(cust_income_level,age_range)+1) query_tag,
    prod_category,
    decode(grouping(cust_income_level),1,'ALL INCOME',cust_income_level) cust_income_level,
    decode(grouping(age_range),1,'ALL AGE',age_range) age_range, 
    sum(profit) profit
    from tsales
    group by prod_category,grouping sets(
    rollup(prod_category), --产品分类小计
    (cust_income_level),--产品分类和收入层次
    (age_range), --产品分类和年龄范围
    (cust_income_level,age_range) --产品分类,年龄范围和收入层次
    )
    --having group_id() < 1
    order by prod_category,profit;
    

    group by cube having grouping_id()与group by grouping sets一个主要的区别是,前者能将变量设定为正确的值来简便修改输出,而后者的输出不能修改,除非修改或动态生成sql语句。修改sql语句意味着需要维护更多的代码并且占用更多的数据库资源。最好尽量避免使用动态生成sql语句,因为它会消耗的数据库资源更多,并且在出现问题时难以检修。
    某些时候grouping_sets()扩展会导致输出中出现重复。重复是由rollup(prod_category)产生的。可以通过去掉rollup()然后重新运行得到验证,重复的行将不复存在。但是,每种产品种类的总计也不存在了。解决的办法就是使用group_id()函数标记重复的行,并将其插入到having子句中。
    在上面的sql中将--having group_id() < 1 改成 having group_id() < 1
    这样,输出结果就如预期的那样不包含重复的行了。有趣的是如果将rollup(prod_category)改成null,去掉having子句,同时还能得到预期的输出。代码如下:

    
    with tsales as
     (select /* lst7-17-1 */
       s.quantity_sold,
       s.amount_sold,
       to_char(mod(c.cust_year_of_birth, 10) * 10) || '_' ||
       to_char((mod(c.cust_year_of_birth, 10) * 10) + 10) age_range,
       nvl(c.cust_income_level, 'A: below 30,000') cust_income_level,
       p.prod_name,
       p.prod_desc,
       p.prod_category,
       (pf.unit_cost * s.quantity_sold) total_cost,
       s.amount_sold - (pf.unit_cost * s.quantity_sold) profit
        from sh.sales s
        join sh.customers c
          on c.cust_id = s.cust_id
        join sh.products p
          on p.prod_id = s.prod_id
        join sh.times t
          on t.time_id = s.time_id
        join sh.costs pf
          on pf.channel_id = s.channel_id
         and pf.prod_id = s.prod_id
         and pf.promo_id = s.promo_id
         and pf.time_id = s.time_id
       where (t.fiscal_year = 2001))
    select 'Q' || to_char(grouping_id(cust_income_level,age_range)+1) query_tag,
    prod_category,
    decode(grouping(cust_income_level),1,'ALL INCOME',cust_income_level) cust_income_level,
    decode(grouping(age_range),1,'ALL AGE',age_range) age_range, 
    sum(profit) profit
    from tsales
    group by prod_category,grouping sets(
    null,
    --rollup(prod_category), --产品分类小计
    (cust_income_level),--产品分类和收入层次
    (age_range), --产品分类和年龄范围
    (cust_income_level,age_range) --产品分类,年龄范围和收入层次
    )
    --having group_id() < 1
    order by prod_category,profit
    

    group by的rollup()扩展也可以单独用来计算否则将会需要由union all结合起来的多个查询完成的小计。
    例如:创建显示器所有名字以Sul开头的消费者各自的购买总额报告,并且要求对每个消费者分别按年,产品分类进行小计,还要有所有消费的总计。这种类型的任务可以使用rollup()完成。

    --rollup()小计
    with mysales as (
        select c.cust_last_name || ',' || c.cust_first_name cust_name,
        p.prod_category,
        to_char(trunc(time_id,'YYYY'),'YYYY') sale_year,
        p.prod_name,
        s.amount_sold
        from sh.sales s
        join sh.products p on p.prod_id=s.prod_id
        join sh.customers c on c.cust_id=s.cust_id
        where c.cust_last_name like 'Sul%'
    )
    select 
        decode(grouping(m.cust_name),1,'GRAND TOTAL',m.cust_name) cust_name,
        decode(grouping(m.sale_year),1,'TOTAL BY YEAR',m.sale_year) sale_year,
        decode(grouping(m.prod_category),1,'TOTAL BY CATEGORY',m.prod_category) prod_category,
        sum(m.amount_sold) amount_sold
    from mysales m
    group by rollup(m.cust_name,m.prod_category,m.sale_year)
    order by grouping(m.cust_name), 1,2,3;
    

    注意decode()和grouping()函数再一次被用来表示小计行。使用grouping(m.cust_name)将总计显示在报告的最后。由于这个值>0的唯一情况就是当计算所有消费者总计时,这个总计值只会出现在报告的最后。

    group by的局限性

    • LOB列,嵌套表或数组不能用做group by表达式的一部分
    SQL> with lobtest as (
      2   select to_clob(d.dname) dname
      3   from scott.emp e
      4   join scott.dept d on d.deptno=e.deptno
      5  )
      6  select l.dname
      7  from lobtest l
      8  group by l.dname;
    group by l.dname
             *
    第 8 行出现错误:
    ORA-00932: 数据类型不一致: 应为 -, 但却获得 CLOB
    
    • 不允许使用标量子查询表达式
    
    SQL> select d.dname,count(empno) empcount
      2  from scott.emp e
      3  join scott.dept d on d.deptno=e.deptno
      4  group by (select dname from scott.dept d2 where d2.dname = d.dname )
      5  order by d.dname;
    group by (select dname from scott.dept d2 where d2.dname = d.dname )
              *
    第 4 行出现错误:
    ORA-22818: 这里不允许出现子查询表达式
    
    • 如果group by子句引用任何对象类型的列则查询不能并行化
    create type dept_location_type as object
    (
           street_address varchar2(40),
           postal_code varchar2(10),
           city varchar2(30),
           state_province varchar2(10),
           country_id char(2),
           order member function match (e dept_location_type) return integer
    );
    /
    create or replace type body  dept_location_type
    as order member function match (e dept_location_type) return integer
    is 
       begin
             if city <e.city then
                     return -1;
             elsif city > e.city then
                     return 1;
             else
                     return 0;        
             end if;
       end;
    end;
    /      
    
    create table deptobj
    as 
    select d.deptno,d.dname
    from scott.dept d;     
    
    alter table  deptobj add (dept_location dept_location_type);   
    
    select * from deptobj;
    
    update deptobj set dept_location=dept_location_type('1234 fenmenao st','453076','ShenZhen','GuangDong','GD') where deptno=1;
    update deptobj set dept_location=dept_location_type('345 Leshan st','123456','LeShan','SiCuan','SC') where deptno=2;
    update deptobj set dept_location=dept_location_type('345 ChongQing st','123456','ChongQing','ChongQing','CQ') where deptno=3;
    update deptobj set dept_location=dept_location_type('345 ChangChun st','123456','ChangChun','GuiYang','GY') where deptno=4;
    --对象列的并行group by
    select /*+ gather_plan_statictics parallel(e 2)*/
    d.dept_location,count(e.ename) ecount
    from scott.emp e,deptobj d
    where e.deptno=d.deptno
    group by d.dept_location
    order by d.dept_location;
    
    

    执行结果

    对象列的并行group by

    dept_location类型体中的成员函数匹配用来进行城市值的比较,然后使用group by将雇员按城市分组。最后一个列出的局限性在后期的版本是可以工作的。

    总结

    Oracle以group by子句扩展的形式为SQL开发者提供了一些极佳的工具,帮助我们不仅能够减少代码量,并且能提高数据库效率。大多数的特性也要与其它不同的功能进行组合。

    group by总结

    相关文章

      网友评论

        本文标题:oracle学习笔记7: 高级分组

        本文链接:https://www.haomeiwen.com/subject/gjjujttx.html