I’ve been a little slow in the follow-up to my previous posting on possibly redundant indexes. Before going into the slightly more complex stuff, there’s another peripheral point (but a very important one) that’s worth raising about how clever the optimizer can be. Here’s some code for 11.2.0.4 to demonstrate the point:
create table t1
nologging
as
with generator as (
select --+ materialize
rownum id
from dual
connect by
level <= 1e4
)
select
rownum id,
trunc(sysdate,'MM') + (rownum-1)/1440 date_time,
rpad('x',100) padding
from
generator v1,
generator v2
where
rownum <= 1e5
;
alter table t1
add (
date_only
generated always as (trunc(date_time)) virtual
)
;
begin
dbms_stats.gather_table_stats(
ownname => user,
tabname =>'T1',
method_opt => 'for all columns size 1'
);
end;
/
create index t1_i1 on t1(date_only) nologging;
So, in a two-step process, I’ve got an indexed virtual column that holds the value of the date_time column truncated to just the date. Would you expect the optimizer to use the index to execute the following query efficiently:
select
max(id)
from
t1
where
date_time between sysdate-1 and sysdate
;
Note that the query references the real date_time column not the virtual column date_only, and it’s not using the expression that defines the index – yet the plan reads as follows:
-----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 21 | 86 (2)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 21 | | |
|* 2 | FILTER | | | | | |
|* 3 | TABLE ACCESS BY INDEX ROWID BATCHED| T1 | 1442 | 30282 | 86 (2)| 00:00:01 |
|* 4 | INDEX RANGE SCAN | T1_I1 | 4306 | | 13 (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter(SYSDATE@!>=SYSDATE@!-1)
3 - filter("DATE_TIME"<=SYSDATE@! AND "DATE_TIME">=SYSDATE@!-1)
4 - access("T1"."DATE_ONLY">=TRUNC(SYSDATE@!-1) AND
"T1"."DATE_ONLY"<=TRUNC(SYSDATE@!))
It’s a little odd that even though the optimizer in the newer versions of Oracle treats many simple expressions on sysdate as constants it still checks (operation 2) that “sysdate >= sysdate – 1” but perhaps that’s just a case of a piece of generic code that isn’t worth the risk or effort of changing.
The key point, of course, is that Oracle has managed to generate some extra predicates that allow it to use the “wrong” index to get a first approximation of the result set fairly efficiently, and then used the original predicate to reduce the approximation down to the correct result set.
If you want a quick sanity check on the access predicates used for operation 4:
- If date_time >= sysdate-1, then trunc(date_time) >= trunc(sysdate-1)
- If date_time <= sysdate, then trunc(date_time) <= trunc(sysdate)
This style of predicate manipulation also works numeric data types, but I think its greatest benefit (or convenience) is likely to come from date data types where the data has been created with a time component but there are frequent “date-only” queries. The days of creating two indexes as a workaround for handling generated code that wants to deal with both date_time and trunc(date_time)predicates should be numbered.
Footnote:
This enhancement probably appeared in 11.2.0.2, and I first saw it described in October 2013 in this blog note by Mohamed Houri; but 12c offers a delightful little enhancement – here’s what my table looks like in the 12c version of the code:
SQL> desc t1
Name Null? Type
----------------------------- -------- --------------------
ID NUMBER
DATE_TIME DATE
PADDING VARCHAR2(100)
Where’s the virtual column ? The 12c version of my code had a slightly different definition for it:
alter table t1
add (
date_only
invisible
generated always as (trunc(date_time)) virtual
)
;
The transformation still works even when the virtual column is invisible. So (subject to searching for anomalies, boundary conditions and bugs) it looks as if you can change the table definition, and get the benefits of two indexes for the price of one without the application realising that anything has changed.
网友评论