Infrastructure at your Service

Franck Pachot

Result Cache: when *not* to use it

By January 29, 2018 Oracle One Comment

By Franck Pachot

.
I encountered recently a case where result cache was incorrectly used, leading to high contention when the application encountered a peak of load. It was not a surprise when I’ve seen that the function was called with an ‘ID’ as argument, which may have thousands of values in this system. I mentioned to the software vendor that the result cache must be used only for frequently calling the function with same arguments, not for random values, even if each value have 2 or 3 identical calls. And, to detail this, I looked at the Oracle Documentation to link the part which explains when the result cache can be used and when it should be avoided.

But I’ve found nothing relevant. This is another(*) case where the Oracle Documentation is completely useless. Without explaining how a feature works, you completely fail to get this feature used. Most people will not take the risk to use it, and a few will use it in the wrong place, before definitely blacklisting this feature.

(*) By another case, I’m thinking about Kamil Stawiarski presentation about Pragma UDF and the lack of useful documentation about it.

Oracle documentation

So this is what I’ve find in the Database Performance Tuning Guide about the Benefits of Using the Server Result Cache

  1. The benefits of using the server result cache depend on the application
  2. OLAP applications can benefit significantly from its use.
  3. Good candidates for caching are queries that access a high number of rows but return a small number, such as those in a data warehouse.

So, this is vague (‘depends’, ‘can benefit’, ‘good candidates’). And doesn’t help to decide when it can be used.
The ‘access a high number of rows but return a small number’ is an indication why cache hits can benefit. However, there is no mention of the most important things, which are :

  • The cache result is invalidated for any DML on the tables the result relies on.
  • The cache miss, when the result is invalidated is expensive
  • The cache miss, when the result is not in the result cache is expensive
  • The ‘expensive’ here is a scalability issue: not detected in unit tests, but big contention when load increases

Real things to know

The first thing to know is that the Result Cache memory is protected by a latch:

SQL> select addr,name,gets,misses,sleeps,spin_gets,wait_time from v$latch where name like 'Result Cache%';
 
ADDR NAME GETS MISSES SLEEPS SPIN_GETS WAIT_TIME
---------------- ------------------------- ---------- ---------- ---------- ---------- ----------
00000000600477D0 Result Cache: RC Latch 2 0 0 0 0
0000000060047870 Result Cache: SO Latch 0 0 0 0 0
0000000060047910 Result Cache: MB Latch 0 0 0 0 0

This latch has no children:

SQL> select * from v$latch_children where name like '%Result Cache%';
 
no rows selected

Only one latch to protect the whole result cache: concurrent sessions – even for different functions – have to serialize their access on the same latch.

This latch is acquired in exclusive mode when the session has to write to the result cache (cache miss, invalidation,…) or in shared mode – since 11gR2 when reading only. This has been explained by Alex Fatkulin http://afatkulin.blogspot.ch/2012/05/result-cache-latch-in-11gr2-shared-mode.html.

This means that, whatever the Oracle Documentation says, the benefit of result cache comes only at cache hit: when the result of the function is already there, and has not been invalidated. If you call the same function with always the same parameter, frequently, and with no changes in the related tables, then we are in the good case.

But if there was a modification of one of the tables, even some rows that have nothing to do with the result, then you will have an overhead: exclusive latch get. And if you call the function with new values for the arguments, that’s also a cache miss which has to get this exclusive latch. And if you have multiple sessions experiencing a cache miss, then they will spin on CPU to get the exclusive latch. This can be disastrous with a large number of sessions. I have seen this kind of contention for hours with connection pools set to 100 sessions when the call to the function is frequent with different values.

To show it, I create a demo table (just to have a dependency) and a result_cache function:

SQL> create table DEMO as select rownum n from xmltable('1 to 1000');
Table created.
 
SQL> create or replace function F(n number) return number result_cache as begin for i in (select * from DEMO where DEMO.n=F.n) loop return i.n; end loop; end;
2 /
Function created.

I have just restarted the instance and my latch statistics are reset:

SQL> select addr,name,gets,misses,sleeps,spin_gets,wait_time from v$latch where name like 'Result Cache%';
 
ADDR NAME GETS MISSES SLEEPS SPIN_GETS WAIT_TIME
---------------- ------------------------- ---------- ---------- ---------- ---------- ----------
00000000600477D0 Result Cache: RC Latch 2 0 0 0 0
0000000060047870 Result Cache: SO Latch 0 0 0 0 0
0000000060047910 Result Cache: MB Latch 0 0 0 0 0

Result Cache Hit

This will call the function always with the same argument, and no change in the table it relies on:
SQL> declare n number; begin for i in 1..1e3 loop n:=n+f(1); end loop; end;
2 /
PL/SQL procedure successfully completed.

So, the first call is a cache miss and the 999 next calls are cache hits. This is the perfect case for Result Cache.

SQL> select addr,name,gets,misses,sleeps,spin_gets,wait_time from v$latch where name like 'Result Cache%';
 
ADDR NAME GETS MISSES SLEEPS SPIN_GETS WAIT_TIME
---------------- ------------------------- ---------- ---------- ---------- ---------- ----------
00000000600477D0 Result Cache: RC Latch 1009 0 0 0 0
0000000060047870 Result Cache: SO Latch 1 0 0 0 0
0000000060047910 Result Cache: MB Latch 0 0 0 0 0

So, that’s about 1000 latch gets. With cache hits you get the latch once per execution, and this is a shared latch, so no contention here.
You want to see check that it is a shared latch? Just set a breakpoint with gdb on the ksl_get_shared_latch function (up to 12.1 because 12.2 uses ksl_get_shared_latch_int) and print the arguments (as explained by Stefan Koehler and Frits Hoogland):

As my RC latch is at address 00000000600477D0 I set a beakpoint on ksl_get_shared_latch where the first argument is 0x600477d0 and display the other arguments:

break ksl_get_shared_latch
condition 1 $rdi == 0x600477d0
commands
silent
printf "ksl_get_shared_latch laddr:%x, willing:%d, where:%d, why:%d, mode:%d\n", $rdi, $rsi, $rdx, $rcx, $r8
c
end

Then one call with cache hit displays:

ksl_get_shared_latch laddr:600477d0, willing:1, where:1, why:5358, mode:8

Mode 8 is shared: many concurrent sessions can do the same without waiting. Shared is scalable: cache hits are scalable.

Cache miss – result not in cache

Here each call will have a different value for the argument, so that they are all cache misses (except the first one):

SQL> declare n number; begin for i in 1..1e3 loop n:=n+f(i); end loop; end;
2 /
PL/SQL procedure successfully completed.

Now the ‘RC latch’ statistics have increased further:

SQL> select addr,name,gets,misses,sleeps,spin_gets,wait_time from v$latch where name like 'Result Cache%';
 
ADDR NAME GETS MISSES SLEEPS SPIN_GETS WAIT_TIME
---------------- ------------------------- ---------- ---------- ---------- ---------- ----------
00000000600477D0 Result Cache: RC Latch 6005 0 0 0 0
0000000060047870 Result Cache: SO Latch 1 0 0 0 0
0000000060047910 Result Cache: MB Latch 0 0 0 0 0

This is about 5000 additional latch gets, which means 5 per execution. And, because it writes, you can expect them to be exclusive.

Here is my gdb script output when I call the function with a value that is not already in cache:

ksl_get_shared_latch laddr:600477d0, willing:1, where:1, why:5358, mode:8
ksl_get_shared_latch laddr:600477d0, willing:1, where:1, why:5347, mode:16
ksl_get_shared_latch laddr:600477d0, willing:1, where:1, why:5358, mode:16
ksl_get_shared_latch laddr:600477d0, willing:1, where:1, why:5374, mode:16

Mode 16 is exclusive. And we have 3 of them in addition to the shared one. You can imagine what happens when several sessions are running this: spin and wait, all sessions on the same resource.

Cache miss – result in cache but invalid

I run the same again, where all values are in cache now:

SQL> declare n number; begin for i in 1..1e3 loop n:=n+f(i); end loop; end;
2 /
PL/SQL procedure successfully completed.

So this is only 1000 additional gets:

SQL> select addr,name,gets,misses,sleeps,spin_gets,wait_time from v$latch where name like 'Result Cache%';
 
ADDR NAME GETS MISSES SLEEPS SPIN_GETS WAIT_TIME
---------------- ------------------------- ---------- ---------- ---------- ---------- ----------
00000000600477D0 Result Cache: RC Latch 7005 0 0 0 0
0000000060047870 Result Cache: SO Latch 1 0 0 0 0
0000000060047910 Result Cache: MB Latch 0 0 0 0 0

The function depends on DEMO table, and I do some modifications on it:

SQL> insert into DEMO values (0)
1 row created.
SQL> commit;
Commit complete.

This has invalidated all previous results. A new run will have all cache miss:

SQL> declare n number; begin for i in 1..1e3 loop n:=n+f(i); end loop; end;
2 /
PL/SQL procedure successfully completed.

And this is 5000 additional gets:

SQL> select addr,name,gets,misses,sleeps,spin_gets,wait_time from v$latch where name like 'Result Cache%';
 
ADDR NAME GETS MISSES SLEEPS SPIN_GETS WAIT_TIME
---------------- ------------------------- ---------- ---------- ---------- ---------- ----------
00000000600477D0 Result Cache: RC Latch 12007 0 0 0 0
0000000060047870 Result Cache: SO Latch 1 0 0 0 0
0000000060047910 Result Cache: MB Latch 0 0 0 0 0

So what?

The important thing to know is that each cache miss requires an exclusive access to the Result Cache, multiple times. Those must be avoided. The Result Cache is good for a static set of result. It is not a short-term cache to workaround an application design where the function is called two or three times with the same values. This is, unfortunately, not explained in the Oracle Documentation. But it becomes obvious when we look at the implementation, or when we load test it with multiple sessions. The consequence can be this kind of high contention during minutes or hours:

Top 5 Timed Events Avg %Total
~~~~~~~~~~~~~~~~~~ wait Call
Event Waits Time (s) (ms) Time
----------------------------------------- ------------ ----------- ------ ------
latch free 858,094 1,598,387 1863 78.8
enq: RC - Result Cache: Contention 192,855 259,563 1346 12.8

Without either the knowledge of the implementation, or relevant load tests, the risk is that a developer stays on his good results in unit testing, and implement Result Cache in each function. The consequence will be seen too late, in production, at a time of load peak. If this happens to you, you can disable the result cache (DBMS_RESULT_CACHE.BYPASS(TRUE);) but the risk is to have performance degradation in the ‘good cases’. Or recompile the procedures with removed RESULT_CACHE, but you may bring a new contention on library cache then.

 

One Comment

Leave a Reply

Franck Pachot
Franck Pachot

Technology Leader