By Franck Pachot

I hear a lot of complaints about the instability coming from the adaptive features introduced into the optimizer at each release.
Giving more intelligence to the optimizer is very good for improving the response time of generated queries, BI and ad-hoc reporting.
But when you have an OLTP application that has been working for years with its static set of queries, then you don’t appreciate the plan instability coming from (in reverse chronological order, and not exhaustive): SPD, ADS, cardinality feedback, ‘size auto’ histograms, bind variable peeking, CBO, etc.

We can discuss that, and I may agree, but my first question will be:
– do you use SQL Plan Baselines?
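
If you don’t know the answer, a quick look at the SQL Management Base tells you whether any plans have been captured and accepted. For example:

SQL> select count(*) total, count(case when accepted='YES' then 1 end) accepted from dba_sql_plan_baselines;

If it returns zero, read on before complaining.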


It’s totally legitimate to care about plan stability. How many times have I done some tuning for a customer who told me: “I don’t care if the query is long. We can accept it or tune it. But when the response time goes from 1 second to 1 hour at random, I can’t even test whether I can improve it or not.”

Yes, plan instability is a problem. But the solution is not:

  • /*+ RULE */
  • optimizer_index_cost_adj=10
  • _optim_peek_user_binds=false
  • _optimizer_adaptive_cursor_sharing=false
  • _optimizer_use_feedback=false
  • optimizer_adaptive_features=false
  • optimizer_features_enable=8.0.0

(and no, this is not a copy/paste from a SAP note…)

The solution is:

  • SQL Plan Baselines if you are in Enterprise Edition (no option needed when you evolve them manually)
  • Stored outlines (deprecated, but the only way in Standard Edition – a minimal sketch below)
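
For the Standard Edition case, here is roughly what a stored outline looks like (the query, outline and category names are made up for illustration):

create or replace outline my_outline for category oltp on
select * from orders where customer_id=:1;

-- activate outlines of that category (this setting does not survive a restart):
alter system set use_stored_outlines=oltp;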

Static SQL

It’s not new. When I started to work on databases, I was using DB2 (v1 on Unix) and Oracle 7.

With DB2, the queries were optimized when the application code was compiled (with statistics from the database – it’s cost-based optimization) and the execution plans were stored in the database (similar to a stored procedure) at deploy time. This is called the ‘bind process’.
Yes, bind variables come at execution time, but the application can be bound to the database at deployment time. This is plan stability for Static SQL.

In Oracle, we had the RULE optimizer at that time, so plan stability was there: the rules did not depend on data. The same query on the same structure always gives the same plan. Probably for this reason, Oracle has never implemented Static SQL. Even embedded SQL pre-processed at compilation is not optimized at compile time. Only syntax and semantics are checked. Oracle always processes Static SQL as Dynamic SQL: it is optimized at runtime. Not at each parse call, because it’s cached, but it’s cached only in memory (the library cache). And at any point, because of invalidation, space pressure on the shared pool, instance restart, etc., the SQL can be hard parsed again.

No problem with RULE. But when Oracle introduced the CBO – Cost Based Optimizer – things changed. All SQL is considered Dynamic SQL, it can be optimized at any time, and because the cost depends on data (statistics) and on a lot of parameters, plans can change.

And people thought that Oracle didn’t care about plan instability, because each release introduced more and more features that can make a plan change: bind variable peeking, rolling invalidation, cardinality feedback, auto mechanisms and adaptive features. But no, the solution to store execution plans has been there since Oracle 8: outlines. An outline associates a set of hints with a statement, limiting the choices of the optimizer so that you always get the same plan. Outlines fix only one possible plan. SQL Plan Baselines go further: all plans are stored and you can choose which ones are allowed to be used.
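
This choice is visible in DBA_SQL_PLAN_BASELINES, and you can enable or disable individual plans with DBMS_SPM (the sql_handle and plan_name below are placeholders):

SQL> select sql_handle,plan_name,origin,enabled,accepted from dba_sql_plan_baselines;

SQL> variable n number
SQL> exec :n:=dbms_spm.alter_sql_plan_baseline(sql_handle=>'SQL_...',plan_name=>'SQL_PLAN_...',attribute_name=>'ENABLED',attribute_value=>'NO')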

Capture

There are several ways to capture SQL Plan Baselines (capture automatically all statements executed more than once, capture from the library cache with different criteria, from a SQL Tuning Set, or manually by sql_id). But be careful, because – depending on your application design – you may capture a lot of statements.
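
For illustration, here is roughly what those variants look like (MY_STS is a hypothetical SQL Tuning Set name):

-- automatic capture of statements executed more than once:
alter system set optimizer_capture_sql_plan_baselines=true;

-- from a SQL Tuning Set:
variable loaded number
exec :loaded:=dbms_spm.load_plans_from_sqlset(sqlset_name=>'MY_STS')

-- manually, by sql_id, from the library cache:
exec :loaded:=dbms_spm.load_plans_from_cursor_cache('2zgczymdyvgmq')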

And don’t capture a lot of statements at a time, because cursors will be invalidated and you risk a peak of hard parsing following the bulk capture.
Here is an example showing the ‘reason’ for the invalidation.
I have a statement that is executed very often:


SQL> select sql_id,child_number,executions,invalidations from v$sql where sql_id='2zgczymdyvgmq'
SQL_ID        CHILD_NUMBER EXECUTIONS INVALIDATIONS
------------- ------------ ---------- -------------
2zgczymdyvgmq            0   98794435             0 

I capture the SQL Plan Baseline for it:


SQL> variable loaded number
SQL> exec :loaded:=dbms_spm.load_plans_from_cursor_cache('2zgczymdyvgmq')
anonymous block completed
SQL> print loaded
LOADED
-
1

and after a few seconds, a new child cursor appears:


SQL> select sql_id,child_number,executions,invalidations,sql_plan_baseline from v$sql where sql_id='2zgczymdyvgmq'
SQL_ID        CHILD_NUMBER EXECUTIONS INVALIDATIONS SQL_PLAN_BASELINE             
------------- ------------ ---------- ------------- ------------------------------
2zgczymdyvgmq            0   98794462             0                                
2zgczymdyvgmq            1        501             1 SQL_PLAN_56gpu0marthcwcc4c47e7 

the reason from V$SQL_SHARED_CURSOR is:


<ChildNode><ChildNumber>0</ChildNumber><ID>4</ID><reason>SQL Tune Base Object Different(3)</reason><size>2x4</size><flags_kestbcci>7</flags_kestbcci><ehash_kestbcci>3581723036</ehash_kestbcci></ChildNode> 
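
The XML above comes directly from the REASON column:

SQL> select child_number,reason from v$sql_shared_cursor where sql_id='2zgczymdyvgmq';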

Control the capture

So my recommendation here is to control the capture. You will have to manage the baselines (see why a plan changed, evolve them, etc.). They are stored in SYSAUX. They can be purged after the retention period, but if you captured a lot at the same time, then the purge will take a long time (and generate a lot of undo).
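
The retention period and the SYSAUX space budget can be configured through DBMS_SPM (the values here are just examples):

SQL> exec dbms_spm.configure('plan_retention_weeks',5)
SQL> exec dbms_spm.configure('space_budget_percent',10)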

Here is an example that captures the Top-100 statements by execution count. You can review them, capture 100 more if you think it makes sense, wait a while to see how many have been executed again in the following days, how many have new possible plans, etc.


set serveroutput on
variable loaded number
begin
 -- Top-100 most executed statements, recently active, that have no baseline yet
 for i in (
  select * from (
   select * from (
    select sql_id,exact_matching_signature from v$sql
    where plan_hash_value>0 and sql_plan_baseline is null and last_active_time>sysdate-1/24
    group by sql_id,exact_matching_signature order by sum(executions) desc
   ) where rownum<=100
  ) where exact_matching_signature not in (select signature from dba_sql_plan_baselines)
 ) loop
  :loaded:=nvl(:loaded,0)+dbms_spm.load_plans_from_cursor_cache(i.sql_id);
 end loop;
end;
/
print loaded

It’s just an example. I chose to capture the statements that have the most executions since instance startup. You can add other criteria.
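
To follow what you captured over the following days, the dictionary view keeps creation and execution information:

SQL> select plan_name,origin,created,last_executed,enabled,accepted from dba_sql_plan_baselines order by created desc;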

But you should not capture only the statements that have a large elapsed time, or those from an STS coming from AWR. The goal is different here. AWR focuses on high resource consumption, because its goal is to tune the ‘bad’ statements. Here the goal is to keep the ‘good’ statements so that they never become ‘bad’.

So it makes sense to capture only the good statements. Don’t run the capture at a time when users complain about response time.

So what?

The core message here is:

  • Stop complaining about Oracle plan instability if you haven’t given SQL Plan Baselines a try (or outlines in SE)
  • When the system is running well, don’t wait for the next performance issue. This is the time to fix the plans that are going well.
  • Don’t fear migrations. Stabilize the most critical use-cases with SQL Plan Baselines and go on.
  • If you’re an ERP vendor, stop faking the optimizer with old parameters. Deploy the execution plans with your application (see the sketch below)
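
For that last point, DBMS_SPM can pack baselines into a staging table that you ship with the application and unpack at the customer site (SPM_STGTAB is a name I made up):

-- on the build database:
variable n number
exec dbms_spm.create_stgtab_baseline(table_name=>'SPM_STGTAB')
exec :n:=dbms_spm.pack_stgtab_baseline(table_name=>'SPM_STGTAB')
-- transport the table (e.g. with Data Pump), then on the target:
exec :n:=dbms_spm.unpack_stgtab_baseline(table_name=>'SPM_STGTAB')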

I know that SQL Plan Baselines are not widely used, mostly because we don’t find the time for this proactive activity, and because it requires good communication between dev and ops. But remember that Oracle has provided plan stability features for a long time, and they assume we use them when they introduce all the adaptive features.