By Franck Pachot

.
I have written a lot about SQL Plan Directives that appeared in 12c. They were used by default and, because of some side effects at the time of 12cR1 with legacy applications that were parsing too much, they have been disabled by default in 12cR2. Today, there are probably not used enough because of their bad reputation from those times. But for datawarehouses, they should be the default in my opinion.

There is a behaviour that surprised me initially and I though it was a bug but, after 5 years, the verdict is: expected behaviour (Bug 20311655 : SQL PLAN DIRECTIVE INVALIDATED BY STATISTICS FEEDBACK). The name of the bug is my fault: I initially though that the statistics feedback had been wrongly interpreted as HAS_STATS. But actually, this behaviour has nothing to do with it: it was visible here only because the re-optimization had triggered a new hard parse, which has changed the state. But any other query on similar predicates would have done the same.

And this is what I’m showing here: when the misestimate cannot be solved by extended statistics, the learning path of SQL Plan Directive have to go through this HAS_STATS state where misestimate will occur again. I’m mentioning the fact that extended statistics can help or not, and this is anticipated by the optimizer. For this reason, I’ve run two sets of examples: one with a predicate where no column group can help, and one where extended statistics can be created.

SQL> show parameter optimizer_adaptive
NAME                              TYPE    VALUE 
--------------------------------- ------- ----- 
optimizer_adaptive_plans boolean TRUE 
optimizer_adaptive_reporting_only boolean FALSE 
optimizer_adaptive_statistics boolean TRUE 

Since 12.2 the adaptive statistics are disabled by default: SQL Plan Directives are created but not used. This is fine for OLTP databases that are upgraded from previous versions. However, for data warehouse, analytic, ad-hoc queries, reporting, enabling adaptive statistics may help a lot when the static statistics are not sufficient to optimize complex queries.

SQL> alter session set optimizer_adaptive_statistics=true;

Session altered.

I’m enabling adaptive statistics for my session.

SQL> exec for r in (select directive_id from dba_sql_plan_dir_objects where owner=user) loop begin dbms_spd.drop_sql_plan_directive(r.directive_id); exception when others then raise; end; end loop;

I’m removing all SQL Plan Directives in my lab to build a reproducible test case.

SQL> create table DEMO pctfree 99 as select mod(rownum,2) a,mod(rownum,2) b,mod(rownum,2) c,mod(rownum,2) d from dual connect by level <=1000;

Table DEMO created.

This is my test table. Build on purpose with a special distribution of data: all rows with 0 or 1 on all columns.

SQL> alter session set statistics_level=all;

Session altered.

I’m profiling down to execution plan operation in order to see all execution statistics

SPD learning path {E}:
USABLE(NEW)->SUPERSEDED(HAS_STATS)->USABLE(PERMANENT)

SQL> select count(*) c1 from demo where a+b+c+d=0;

    C1 
______ 
   500 

Here is a query where dynamic sampling can help to get better statistics on selectivity but where no static statistic can help even on column group (extended statistics on expression is not considered for SQL Plan Directives even in 21c)

SQL> select * from dbms_xplan.display_cursor(format=>'allstats last');

                                                                                PLAN_TABLE_OUTPUT 
_________________________________________________________________________________________________ 
SQL_ID  fjcbm5x4014mg, child number 0                                                             
-------------------------------------                                                             
select count(*) c1 from demo where a+b+c+d=0                                                      
                                                                                                  
Plan hash value: 2180342005                                                                       
                                                                                                  
----------------------------------------------------------------------------------------------    
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |    
----------------------------------------------------------------------------------------------    
|   0 | SELECT STATEMENT   |      |      1 |        |      1 |00:00:00.03 |     253 |    250 |    
|   1 |  SORT AGGREGATE    |      |      1 |      1 |      1 |00:00:00.03 |     253 |    250 |    
|*  2 |   TABLE ACCESS FULL| DEMO |      1 |     10 |    500 |00:00:00.03 |     253 |    250 |    
----------------------------------------------------------------------------------------------    
                                                                                                  
Predicate Information (identified by operation id):                                               
---------------------------------------------------                                               
                                                                                                  
   2 - filter("A"+"B"+"C"+"D"=0)      

As expected the estimation (10 rows) is far from the actual number of rows (500). This statement is flagged for re-optimisation with cardinality feedback but I’m interested by different SQL statements here.

SQL> exec dbms_spd.flush_sql_plan_directive;

PL/SQL procedure successfully completed.

SQL> select state, extract(notes,'/spd_note/internal_state/text()') internal_state, extract(notes,'/spd_note/spd_text/text()') spd_text from dba_sql_plan_directives where directive_id in (select directive_id from dba_sql_plan_dir_objects where owner=user) and type='DYNAMIC_SAMPLING';


    STATE    INTERNAL_STATE                      SPD_TEXT 
_________ _________________ _____________________________ 
USABLE    NEW               {E(DEMO.DEMO)[A, B, C, D]}    

A SQL Plan Directive has been created to keep the information that equality predicates on columns A, B, C and D are misestimated. The directive is in internal state NEW. The visible state is USABLE which means that dynamic sampling will be used by queries with a similar predicate on those columns.

SQL> select count(*) c2 from demo where a+b+c+d=0;

    C2 
______ 
   500 

SQL> select * from dbms_xplan.display_cursor(format=>'allstats last');

                                                                       PLAN_TABLE_OUTPUT 
________________________________________________________________________________________ 
SQL_ID  5sg7b9jg6rj2k, child number 0                                                    
-------------------------------------                                                    
select count(*) c2 from demo where a+b+c+d=0                                             
                                                                                         
Plan hash value: 2180342005                                                              
                                                                                         
-------------------------------------------------------------------------------------    
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |    
-------------------------------------------------------------------------------------    
|   0 | SELECT STATEMENT   |      |      1 |        |      1 |00:00:00.01 |     253 |    
|   1 |  SORT AGGREGATE    |      |      1 |      1 |      1 |00:00:00.01 |     253 |    
|*  2 |   TABLE ACCESS FULL| DEMO |      1 |    500 |    500 |00:00:00.01 |     253 |    
-------------------------------------------------------------------------------------    
                                                                                         
Predicate Information (identified by operation id):                                      
---------------------------------------------------                                      
                                                                                         
   2 - filter("A"+"B"+"C"+"D"=0)                                                         
                                                                                         
Note                                                                                     
-----                                                                                    
   - dynamic statistics used: dynamic sampling (level=AUTO)                              
   - 1 Sql Plan Directive used for this statement      

As expected, a different query (note that I changed the column alias C1 to C2 but anything can be different as long as there’s an equality predicate involving the same columns) has accurate estimations (E-Rows=A-Rows) because of dynamic sampling (dynamic statistics) thanks to the used SQL Plan Directive.

SQL> exec dbms_spd.flush_sql_plan_directive;

PL/SQL procedure successfully completed.

SQL> select state, extract(notes,'/spd_note/internal_state/text()') internal_state, extract(notes,'/spd_note/spd_text/text()') spd_text from dba_sql_plan_directives where directive_id in (select directive_id from dba_sql_plan_dir_objects where owner=user) and type='DYNAMIC_SAMPLING';

        STATE    INTERNAL_STATE                      SPD_TEXT 
_____________ _________________ _____________________________ 
SUPERSEDED    HAS_STATS         {E(DEMO.DEMO)[A, B, C, D]}    

This is the important part and initially, I thought it was a bug because SUPERSEDED means that the next query on similar columns will not do dynamic sampling anymore, and then will have bad estimations. HAS_STATS does not mean that we have correct testimations here but only that there is no additional static statistics that can help. Because the optimizer has detected an expression (“A”+”B”+”C”+”D”=0) and automatic statistics extensions do not consider expressions.

SQL> select count(*) c3 from demo where a+b+c+d=0;

    C3 
______ 
   500 


SQL> select * from dbms_xplan.display_cursor(format=>'allstats last');

                                                                       PLAN_TABLE_OUTPUT 
________________________________________________________________________________________ 
SQL_ID  62cf5zwt4rwgj, child number 0                                                    
-------------------------------------                                                    
select count(*) c3 from demo where a+b+c+d=0                                             
                                                                                         
Plan hash value: 2180342005                                                              
                                                                                         
-------------------------------------------------------------------------------------    
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |    
-------------------------------------------------------------------------------------    
|   0 | SELECT STATEMENT   |      |      1 |        |      1 |00:00:00.01 |     253 |    
|   1 |  SORT AGGREGATE    |      |      1 |      1 |      1 |00:00:00.01 |     253 |    
|*  2 |   TABLE ACCESS FULL| DEMO |      1 |     10 |    500 |00:00:00.01 |     253 |    
-------------------------------------------------------------------------------------    
                                                                                         
Predicate Information (identified by operation id):                                      
---------------------------------------------------                                      
                                                                                         
   2 - filter("A"+"B"+"C"+"D"=0)    

We are still in the learning phase and as you can see, even if we know that there is a misestimate (SPD has been created), adaptive statistic tries to avoid dynamic sampling: no SPD used mentioned in the notes, and back to the misestimate of E-Rows=10.

SQL> exec dbms_spd.flush_sql_plan_directive;

PL/SQL procedure successfully completed.

SQL> select state, extract(notes,'/spd_note/internal_state/text()') internal_state, extract(notes,'/spd_note/spd_text/text()') spd_text from dba_sql_plan_directives where directive_id in (select directive_id from dba_sql_plan_dir_objects where owner=user) and type='DYNAMIC_SAMPLING';

    STATE    INTERNAL_STATE                      SPD_TEXT 
_________ _________________ _____________________________ 
USABLE    PERMANENT         {E(DEMO.DEMO)[A, B, C, D]}    

The HAS_STATS and the misestimate was temporary. Now that the optimizer has validated that with all possible static statistics available (HAS_STATS) we still have a misestimate, and then has passed the SPD status to PERMANENT: end of the learning phase, we will permanently do dynamic sampling for this kind of query.

SQL> select count(*) c4 from demo where a+b+c+d=0;

    C4 
______ 
   500 


SQL> select * from dbms_xplan.display_cursor(format=>'allstats last');

                                                                       PLAN_TABLE_OUTPUT 
________________________________________________________________________________________ 
SQL_ID  65ufgd70n61nh, child number 0                                                    
-------------------------------------                                                    
select count(*) c4 from demo where a+b+c+d=0                                             
                                                                                         
Plan hash value: 2180342005                                                              
                                                                                         
-------------------------------------------------------------------------------------    
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |    
-------------------------------------------------------------------------------------    
|   0 | SELECT STATEMENT   |      |      1 |        |      1 |00:00:00.01 |     253 |    
|   1 |  SORT AGGREGATE    |      |      1 |      1 |      1 |00:00:00.01 |     253 |    
|*  2 |   TABLE ACCESS FULL| DEMO |      1 |    500 |    500 |00:00:00.01 |     253 |    
-------------------------------------------------------------------------------------    
                                                                                         
Predicate Information (identified by operation id):                                      
---------------------------------------------------                                      
                                                                                         
   2 - filter("A"+"B"+"C"+"D"=0)                                                         
                                                                                         
Note                                                                                     
-----                                                                                    
   - dynamic statistics used: dynamic sampling (level=AUTO)                              
   - 1 Sql Plan Directive used for this statement                                        
                                                            

Yes, it has an overhead at hard parse time, but that helps to get better estimations and then faster execution plans. The execution plan shows that dynamic sampling is done because id SPD usage.

SPD learning path {EC}:
USABLE(NEW)->USABLE(MISSING_STATS)->SUPERSEDED(HAS_STATS)

I’m now running a query where the misestimate can be avoided with additional statistics: column group statistics extension.

SQL> select count(*) c1 from demo where a=0 and b=0 and c=0 and d=0;

    C1 
______ 
   500 

SQL> select * from dbms_xplan.display_cursor(format=>'allstats last');

                                                                       PLAN_TABLE_OUTPUT 
________________________________________________________________________________________ 
SQL_ID  2x5j71630ua0z, child number 0                                                    
-------------------------------------                                                    
select count(*) c1 from demo where a=0 and b=0 and c=0 and d=0                           
                                                                                         
Plan hash value: 2180342005                                                              
                                                                                         
-------------------------------------------------------------------------------------    
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |    
-------------------------------------------------------------------------------------    
|   0 | SELECT STATEMENT   |      |      1 |        |      1 |00:00:00.01 |     253 |    
|   1 |  SORT AGGREGATE    |      |      1 |      1 |      1 |00:00:00.01 |     253 |    
|*  2 |   TABLE ACCESS FULL| DEMO |      1 |     63 |    500 |00:00:00.01 |     253 |    
-------------------------------------------------------------------------------------    
                                                                                         
Predicate Information (identified by operation id):                                      
---------------------------------------------------                                      
                                                                                         
   2 - filter(("A"=0 AND "B"=0 AND "C"=0 AND "D"=0))   

I have a misestimate here (E-Rows much lower than E-Rows) because the optimizer doesn’t know the correlation between A,B,C and D.

SQL> exec dbms_spd.flush_sql_plan_directive;

PL/SQL procedure successfully completed.

SQL> select state, extract(notes,'/spd_note/internal_state/text()') internal_state, extract(notes,'/spd_note/spd_text/text()') spd_text from dba_sql_plan_directives where directive_id in (select directive_id from dba_sql_plan_dir_objects where owner=user) and type='DYNAMIC_SAMPLING';


    STATE    INTERNAL_STATE                       SPD_TEXT 
_________ _________________ ______________________________ 
USABLE    PERMANENT         {E(DEMO.DEMO)[A, B, C, D]}     
USABLE    NEW               {EC(DEMO.DEMO)[A, B, C, D]}    

I have now a new SQL Plan Directive and the difference with the previous one is that the equality predicate (E) is a simple column equality on each column (EC). From that, the optimizer knows that statistics extension on column group may help.

SQL> select count(*) c2 from demo where a=0 and b=0 and c=0 and d=0;

    C2 
______ 
   500 

SQL> select * from dbms_xplan.display_cursor(format=>'allstats last');


                                                                       PLAN_TABLE_OUTPUT 
________________________________________________________________________________________ 
SQL_ID  5sg8p03mmx7ca, child number 0                                                    
-------------------------------------                                                    
select count(*) c2 from demo where a=0 and b=0 and c=0 and d=0                           
                                                                                         
Plan hash value: 2180342005                                                              
                                                                                         
-------------------------------------------------------------------------------------    
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |    
-------------------------------------------------------------------------------------    
|   0 | SELECT STATEMENT   |      |      1 |        |      1 |00:00:00.01 |     253 |    
|   1 |  SORT AGGREGATE    |      |      1 |      1 |      1 |00:00:00.01 |     253 |    
|*  2 |   TABLE ACCESS FULL| DEMO |      1 |    500 |    500 |00:00:00.01 |     253 |    
-------------------------------------------------------------------------------------    
                                                                                         
Predicate Information (identified by operation id):                                      
---------------------------------------------------                                      
                                                                                         
   2 - filter(("A"=0 AND "B"=0 AND "C"=0 AND "D"=0))                                     
                                                                                         
Note                                                                                     
-----                                                                                    
   - dynamic statistics used: dynamic sampling (level=AUTO)                              
   - 1 Sql Plan Directive used for this statement       

So, the NEW directive is a USABLE state: SPD is used to do some dynamic sampling, as it was with the previous example.

SQL> exec dbms_spd.flush_sql_plan_directive;

PL/SQL procedure successfully completed.

SQL> select state, extract(notes,'/spd_note/internal_state/text()') internal_state, extract(notes,'/spd_note/spd_text/text()') spd_text from dba_sql_plan_directives where directive_id in (select directive_id from dba_sql_plan_dir_objects where owner=user) and type='DYNAMIC_SAMPLING';

    STATE    INTERNAL_STATE                       SPD_TEXT 
_________ _________________ ______________________________ 
USABLE    PERMANENT         {E(DEMO.DEMO)[A, B, C, D]}     
USABLE    MISSING_STATS     {EC(DEMO.DEMO)[A, B, C, D]}    

Here we have an additional state during the learning phase because there’s something else that can be done: we are not in HAS_STATS because more stats can be gathered. We are in MISSING_STATS internal state. This is a USABLE state so that dynamic sampling continues until we gather statistics.

SQL> select count(*) c3 from demo where a=0 and b=0 and c=0 and d=0;

    C3 
______ 
   500 

SQL> select * from dbms_xplan.display_cursor(format=>'allstats last');

                                                                       PLAN_TABLE_OUTPUT 
________________________________________________________________________________________ 
SQL_ID  d8zyzh140xk0d, child number 0                                                    
-------------------------------------                                                    
select count(*) c3 from demo where a=0 and b=0 and c=0 and d=0                           
                                                                                         
Plan hash value: 2180342005                                                              
                                                                                         
-------------------------------------------------------------------------------------    
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |    
-------------------------------------------------------------------------------------    
|   0 | SELECT STATEMENT   |      |      1 |        |      1 |00:00:00.01 |     253 |    
|   1 |  SORT AGGREGATE    |      |      1 |      1 |      1 |00:00:00.01 |     253 |    
|*  2 |   TABLE ACCESS FULL| DEMO |      1 |    500 |    500 |00:00:00.01 |     253 |    
-------------------------------------------------------------------------------------    
                                                                                         
Predicate Information (identified by operation id):                                      
---------------------------------------------------                                      
                                                                                         
   2 - filter(("A"=0 AND "B"=0 AND "C"=0 AND "D"=0))                                     
                                                                                         
Note                                                                                     
-----                                                                                    
   - dynamic statistics used: dynamic sampling (level=AUTO)                              
   - 1 Sql Plan Directive used for this statement       

That can continue for a long time, with SPD in USABLE state and dynamic sampling compensating the missing stats, but at the cost of additional work during hard parse time.

SQL> exec dbms_spd.flush_sql_plan_directive;

PL/SQL procedure successfully completed.

SQL> select created,state, extract(notes,'/spd_note/internal_state/text()') internal_state, extract(notes,'/spd_note/spd_text/text()') spd_text from dba_sql_plan_directives where directive_id in (select directive_id from dba_sql_plan_dir_objects where owner=user) and type='DYNAMIC_SAMPLING' order by last_used;

    CREATED     STATE    INTERNAL_STATE                       SPD_TEXT 
___________ _________ _________________ ______________________________ 
20:52:11    USABLE    PERMANENT         {E(DEMO.DEMO)[A, B, C, D]}     
20:52:16    USABLE    MISSING_STATS     {EC(DEMO.DEMO)[A, B, C, D]}    

The status will not change until statistics gathering occurs.

SQL> exec dbms_stats.set_table_prefs(user,'DEMO','AUTO_STAT_EXTENSIONS','on');

PL/SQL procedure successfully completed.

In the same idea as adaptive statistics not enabled by default, the automatic creation of statistics extension is not there by default. I enable it for this table only here, but, as many dbms_stats operations, you can do that at schema, database or global level. This is what I do here. Usually, you do it initially when creating the table, or simply at database level because it works in pair with adaptive statistics, but in this demo I waited to show that even if the decision of going to HAS_STATS or MISSING_STATS state depends on the possibility of extended stats creation, this is done without looking at the dbms_stats preference.

SQL> exec dbms_stats.gather_table_stats(user,'DEMO', options=>'gather auto');

PL/SQL procedure successfully completed.

Note that I’m gathering the statistics like the automatic job does: GATHER AUTO. As I did not change any rows, the table statistics are not stale but the new directive in MISSING_STATS tells DBMS_STATS that there’s a reason to re-gather the statistics.

And if you look at statistics extensions there, there’s a new statistics extension on (A,B,C,D) column group.Just look at USER_STAT_EXTENSIONS.

SQL> select count(*) c4 from demo where a=0 and b=0 and c=0 and d=0;

    C4 
______ 
   500 

SQL> select * from dbms_xplan.display_cursor(format=>'allstats last');

                                                                       PLAN_TABLE_OUTPUT 
________________________________________________________________________________________ 
SQL_ID  g08m3qrmw7mgn, child number 0                                                    
-------------------------------------                                                    
select count(*) c4 from demo where a=0 and b=0 and c=0 and d=0                           
                                                                                         
Plan hash value: 2180342005                                                              
                                                                                         
-------------------------------------------------------------------------------------    
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |    
-------------------------------------------------------------------------------------    
|   0 | SELECT STATEMENT   |      |      1 |        |      1 |00:00:00.01 |     253 |    
|   1 |  SORT AGGREGATE    |      |      1 |      1 |      1 |00:00:00.01 |     253 |    
|*  2 |   TABLE ACCESS FULL| DEMO |      1 |    500 |    500 |00:00:00.01 |     253 |    
-------------------------------------------------------------------------------------    
                                                                                         
Predicate Information (identified by operation id):                                      
---------------------------------------------------                                      
                                                                                         
   2 - filter(("A"=0 AND "B"=0 AND "C"=0 AND "D"=0))                                     
                                                                                         
Note                                                                                     
-----                                                                                    
   - dynamic statistics used: dynamic sampling (level=AUTO)                              
   - 1 Sql Plan Directive used for this statement     

You may think that no dynamic sampling is needed anymore but the Adaptive Statistics mechanism is still in the learning phase: the SPD is still USABLE and the next parse will verify if MISSING_STATS can be superseded by HAS_STATS. This is what happened here.

SQL> exec dbms_spd.flush_sql_plan_directive;

PL/SQL procedure successfully completed.

SQL> select created,state, extract(notes,'/spd_note/internal_state/text()') internal_state, extract(notes,'/spd_note/spd_text/text()') spd_text from dba_sql_plan_directives where directive_id in (select directive_id from dba_sql_plan_dir_objects where owner=user) and type='DYNAMIC_SAMPLING' order by last_used;

    CREATED         STATE    INTERNAL_STATE                       SPD_TEXT 
___________ _____________ _________________ ______________________________ 
20:52:11    USABLE        PERMANENT         {E(DEMO.DEMO)[A, B, C, D]}     
20:52:16    SUPERSEDED    HAS_STATS         {EC(DEMO.DEMO)[A, B, C, D]}    

Here, SUPERSEDED means no more dynamic sampling for predicates with simple column equality on A,B,C,D because it HAS_STATS.

In the past, I mean before 12c, I often recommended enabling dynamic sampling (with optimizer_dynamic_sampling >= 4) on datawarehouses, or sessions running complex ad-hoc queries for reporting. And no dynamic sampling, creating manual statistics extensions only when required, for OLTP where we can expect less complex queries and where hard parse time may be a problem.

Now, in the same idea, I’ll rather recommend setting adaptive statistics because it has a finer grain optimization. As we see here: only one kind of predicate does dynamic sampling, and this dynamic sampling is the “adaptive” one, estimating not only single table cardinality but joins and aggregations as well. This is the USABLE (PERMANENT) one. The other, did it only temporarily until statistics extensions were automatically created, SUPERSEDED with HAS_STATS.

In summary, MISSING_STATS state is seen only when, given the simple column equality, there are possible statistics that are missing. And HAS_STATS means that all the statistics that can be used by optimizer for this predicate are available and no more can be gathered. Each directive will go through HAS_STATS during the learning phase. And then, it stays in HAS_STATS or switches definitely to PERMANENT state when HAS_STAT encountered misestimate again.