美文网首页
Index Usage – 2

Index Usage – 2

作者: 2548d1d6a965 | 来源:发表于2015-11-16 17:08 被阅读12次

    I’ve been a little slow in the follow-up to my previous posting on possibly redundant indexes. Before going into the slightly more complex stuff, there’s another peripheral point (but a very important one) that’s worth raising about how clever the optimizer can be. Here’s some code for 11.2.0.4 to demonstrate the point:

    create table t1
    nologging
    as
    with generator as (
        select  --+ materialize
            rownum id 
        from dual 
        connect by 
            level <= 1e4
    )
    select
        rownum                  id,
        trunc(sysdate,'MM') + (rownum-1)/1440   date_time,
        rpad('x',100)               padding
    from
        generator   v1,
        generator   v2
    where
        rownum <= 1e5
    ;
     
     
    alter table t1 
    add (
            date_only
        generated always as (trunc(date_time)) virtual 
    )
    ;
     
    begin
        dbms_stats.gather_table_stats(
            ownname      => user,
            tabname      =>'T1',
            method_opt   => 'for all columns size 1'
        );
    end;
    /
    create index t1_i1 on t1(date_only) nologging;
    

    So, in a two-step process, I’ve got an indexed virtual column that holds the value of the date_time column truncated to just the date. Would you expect the optimizer to use the index to execute the following query efficiently:

    select
            max(id)
    from
            t1
    where
            date_time between sysdate-1 and sysdate
    ;
    

    Note that the query references the real date_time column not the virtual column date_only, and it’s not using the expression that defines the index – yet the plan reads as follows:

    -----------------------------------------------------------------------------------------------
    | Id  | Operation                             | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
    -----------------------------------------------------------------------------------------------
    |   0 | SELECT STATEMENT                      |       |     1 |    21 |    86   (2)| 00:00:01 |
    |   1 |  SORT AGGREGATE                       |       |     1 |    21 |            |          |
    |*  2 |   FILTER                              |       |       |       |            |          |
    |*  3 |    TABLE ACCESS BY INDEX ROWID BATCHED| T1    |  1442 | 30282 |    86   (2)| 00:00:01 |
    |*  4 |     INDEX RANGE SCAN                  | T1_I1 |  4306 |       |    13   (0)| 00:00:01 |
    -----------------------------------------------------------------------------------------------
     
    Predicate Information (identified by operation id):
    ---------------------------------------------------
       2 - filter(SYSDATE@!>=SYSDATE@!-1)
       3 - filter("DATE_TIME"<=SYSDATE@! AND "DATE_TIME">=SYSDATE@!-1)
       4 - access("T1"."DATE_ONLY">=TRUNC(SYSDATE@!-1) AND
                  "T1"."DATE_ONLY"<=TRUNC(SYSDATE@!))
    

    It’s a little odd that even though the optimizer in the newer versions of Oracle treats many simple expressions on sysdate as constants it still checks (operation 2) that “sysdate >= sysdate – 1” but perhaps that’s just a case of a piece of generic code that isn’t worth the risk or effort of changing.
    The key point, of course, is that Oracle has managed to generate some extra predicates that allow it to use the “wrong” index to get a first approximation of the result set fairly efficiently, and then used the original predicate to reduce the approximation down to the correct result set.
    If you want a quick sanity check on the access predicates used for operation 4:

    • If date_time >= sysdate-1, then trunc(date_time) >= trunc(sysdate-1)
    • If date_time <= sysdate, then trunc(date_time) <= trunc(sysdate)

    This style of predicate manipulation also works numeric data types, but I think its greatest benefit (or convenience) is likely to come from date data types where the data has been created with a time component but there are frequent “date-only” queries. The days of creating two indexes as a workaround for handling generated code that wants to deal with both date_time and trunc(date_time)predicates should be numbered.
    Footnote:
    This enhancement probably appeared in 11.2.0.2, and I first saw it described in October 2013 in this blog note by Mohamed Houri; but 12c offers a delightful little enhancement – here’s what my table looks like in the 12c version of the code:

    SQL> desc t1
     Name                          Null?    Type
     ----------------------------- -------- --------------------
     ID                                     NUMBER
     DATE_TIME                              DATE
     PADDING                                VARCHAR2(100)
    

    Where’s the virtual column ? The 12c version of my code had a slightly different definition for it:

    alter table t1  
    add (
            date_only
            invisible
            generated always as (trunc(date_time)) virtual
    )
    ;
    

    The transformation still works even when the virtual column is invisible. So (subject to searching for anomalies, boundary conditions and bugs) it looks as if you can change the table definition, and get the benefits of two indexes for the price of one without the application realising that anything has changed.

    相关文章

      网友评论

          本文标题:Index Usage – 2

          本文链接:https://www.haomeiwen.com/subject/pcikhttx.html