Infrastructure at your Service

Clemens Bleile

Elapsed time of Oracle Parallel Executions are not shown correctly in AWR

As the elapsed time (time it takes for a task from start to end, often called wall-clock time) per execution of parallel queries are not shown correctly in AWR-reports, I thought I setup a testcase to find a way to get an elapsed time closer to reality.

REMARK: To use AWR (Automatic Workload Repository) and ASH (Active Session History) as described in this Blog you need to have the Oracle Diagnostics Pack licensed.

I created a table t5 with 213K blocks:

SQL> select blocks from tabs where table_name='T5';
 
    BLOCKS
----------
    213064

In addition I enabled Linux-IO-throttling with 300 IOs/sec through a cgroup on my device sdb to ensure the parallel-statements take a couple of seconds to run:

[[email protected] ~]# CONFIG_BLK_CGROUP=y
[[email protected] ~]# CONFIG_BLK_DEV_THROTTLING=y
[[email protected] ~]# echo "8:16 300" > /sys/fs/cgroup/blkio/blkio.throttle.read_iops_device

After that I ran my test:

SQL> select sysdate from dual;
 
SYSDATE
-------------------
14.11.2019 14:03:51
 
SQL> exec dbms_workload_repository.create_snapshot;
 
PL/SQL procedure successfully completed.
 
SQL> set timing on
SQL> select /*+ parallel(t5 2) full(t5) */ count(*) from t5;
 
  COUNT(*)
----------
  50403840
 
Elapsed: 00:00:05.63
SQL> select /*+ parallel(t5 2) full(t5) */ count(*) from t5;
 
  COUNT(*)
----------
  50403840
 
Elapsed: 00:00:05.62
SQL> select /*+ parallel(t5 2) full(t5) */ count(*) from t5;
 
  COUNT(*)
----------
  50403840
 
Elapsed: 00:00:05.84
SQL> select /*+ parallel(t5 2) full(t5) */ count(*) from t5;
 
  COUNT(*)
----------
  50403840
 
Elapsed: 00:00:05.73
SQL> select /*+ parallel(t5 2) full(t5) */ count(*) from t5;
 
  COUNT(*)
----------
  50403840
 
Elapsed: 00:00:05.63
SQL> select /*+ parallel(t5 2) full(t5) */ count(*) from t5;
 
  COUNT(*)
----------
  50403840
 
Elapsed: 00:00:05.74
SQL> exec dbms_workload_repository.create_snapshot;
 
PL/SQL procedure successfully completed.

Please consider the elapsed time of about 5.7 seconds per execution.

The AWR-report shows the following in the “SQL ordered by Elapsed time”-section:

        Elapsed                  Elapsed Time
        Time (s)    Executions  per Exec (s)  %Total   %CPU    %IO    SQL Id
---------------- -------------- ------------- ------ ------ ------ -------------
            67.3              6         11.22   94.5   37.4   61.3 04r3647p2g7qu
Module: SQL*Plus
select /*+ parallel(t5 2) full(t5) */ count(*) from t5

I.e. 11.22 seconds in average per execution. However, as we can see above, the average execution time is around 5.7 seconds. The reason for the wrong elapsed time per execution is that the elapsed time for the parallel slaves is summed up to the elapsed time even though the processes worked in parallel. Thanks to the column SQL_EXEC_ID (very useful) we can get the sum of the elapsed times per execution from ASH:

SQL> break on report
SQL> compute avg of secs_db_time on report
SQL> select sql_exec_id, qc_session_id, qc_session_serial#, count(*) secs_db_time from v$active_session_history
  2  where sql_id='04r3647p2g7qu' and sample_time>to_date('14.11.2019 14:03:51','dd.mm.yyyy hh24:mi:ss')
  3  group by sql_exec_id, qc_session_id, qc_session_serial#
  4  order by 1;
 
SQL_EXEC_ID QC_SESSION_ID QC_SESSION_SERIAL# SECS_DB_TIME
----------- ------------- ------------------ ------------
   16777216	      237                  16626           12
   16777217	      237                  16626           12
   16777218	      237                  16626           10
   16777219	      237                  16626           12
   16777220	      237                  16626           10
   16777221	      237                  16626           10
                                             ------------
avg                                                    11
 
6 rows selected.

I.e. the 11 secs correspond to the 11.22 secs in the AWR-report.

How do we get the real elapsed time for the parallel queries? If the queries take a couple of seconds we can get the approximate time from ASH as well by subtracting the sample_time at the beginning from the sample_time at the end of each execution (SQL_EXEC_ID):

SQL> select sql_exec_id, extract (second from (max(sample_time)-min(sample_time))) secs_elapsed 
  2  from v$active_session_history
  3  where sql_id='04r3647p2g7qu'
  4  and sample_time>to_date('14.11.2019 14:03:51','dd.mm.yyyy hh24:mi:ss')
  5  group by sql_exec_id
  6  order by 1;
 
SQL_EXEC_ID SECS_ELAPSED
----------- ------------
   16777216         5.12
   16777217        5.104
   16777218         4.16
   16777219        5.118
   16777220        4.104
   16777221        4.171
 
6 rows selected.

I.e. those numbers reflect the real execution time much better.

REMARK: If the queries take minutes (or hours) to run then you have to extract the minutes (and hours) as well of course. See also the example I have at the end of the Blog.

The info in V$SQL is also not very helpful:

SQL> set lines 200 pages 999
SQL> select child_number, plan_hash_value, elapsed_time/1000000 elapsed_secs, 
  2  executions, px_servers_executions, last_active_time 
  3  from v$sql where sql_id='04r3647p2g7qu';
 
CHILD_NUMBER PLAN_HASH_VALUE ELAPSED_SECS EXECUTIONS PX_SERVERS_EXECUTIONS LAST_ACTIVE_TIME
------------ --------------- ------------ ---------- --------------------- -------------------
           0      2747857355    67.346941          6                    12 14.11.2019 14:05:17

I.e. for the QC we have the column executions > 0 and for the parallel slaves we have px_servers_executions > 0. You may actually get different child cursors for the Query Coordinator and the slaves.

So in theory we should be able to do something like:

SQL> select child_number, (sum(elapsed_time)/sum(executions))/1000000 elapsed_time_per_exec_secs 
  2  from v$sql where sql_id='04r3647p2g7qu' group by child_number;
 
CHILD_NUMBER ELAPSED_TIME_PER_EXEC_SECS
------------ --------------------------
           0                 11.2244902

Here we do see the number from the AWR again.

So in future be careful when checking the elapsed time per execution of statements, which ran with parallel slaves. The number will be too high in AWR or V$SQL. Further analysis to get the real elapsed time per execution would be necessary.

REMARK: As the numbers in AWR do come from e.g. dba_hist_sqlstat, the following query provides “wrong” output for parallel executions as well:

SQL> column begin_interval_time format a32
SQL> column end_interval_time format a32
SQL> select begin_interval_time, end_interval_time, ELAPSED_TIME_DELTA/1000000 elapsed_time_secs, 
  2  (ELAPSED_TIME_DELTA/EXECUTIONS_DELTA)/1000000 elapsed_per_exec_secs
  3  from dba_hist_snapshot snap, dba_hist_sqlstat sql 
  4  where snap.snap_id=sql.snap_id and sql_id='04r3647p2g7qu' 
  5  and snap.BEGIN_INTERVAL_TIME > to_date('14.11.2019 14:03:51','dd.mm.yyyy hh24:mi:ss');
 
BEGIN_INTERVAL_TIME              END_INTERVAL_TIME                ELAPSED_TIME_SECS ELAPSED_PER_EXEC_SECS
-------------------------------- -------------------------------- ----------------- ---------------------
14-NOV-19 02.04.00.176 PM        14-NOV-19 02.05.25.327 PM                67.346941            11.2244902

To take another example I did run a query from Jonathan Lewis from
https://jonathanlewis.wordpress.com/category/oracle/parallel-execution:

SQL> @jonathan
 
19348 rows selected.
 
Elapsed: 00:06:42.11

I.e. 402.11 seconds

AWR shows 500.79 seconds:

        Elapsed                  Elapsed Time
        Time (s)    Executions  per Exec (s)  %Total   %CPU    %IO    SQL Id
---------------- -------------- ------------- ------ ------ ------ -------------
           500.8              1        500.79   97.9   59.6   38.6 44v4ws3nzbnsd
Module: SQL*Plus
select /*+ parallel(t1 2) parallel(t2 2)
 leading(t1 t2) use_hash(t2) swa
p_join_inputs(t2) pq_distribute(t2 hash hash) ca
rdinality(t1,50000) */ t1.owner, t1.name, t1.typ

Let’s check ASH with the query I used above (this time including minutes):

select sql_exec_id, extract (minute from (max(sample_time)-min(sample_time))) minutes_elapsed,
extract (second from (max(sample_time)-min(sample_time))) secs_elapsed 
from v$active_session_history
where sql_id='44v4ws3nzbnsd'
group by sql_exec_id
order by 1;
 
SQL_EXEC_ID MINUTES_ELAPSED SECS_ELAPSED
----------- --------------- ------------
   16777216	              6       40.717

I.e. 06:40.72 which is close to the real elapsed time of 06:42.11

2 Comments

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Clemens Bleile
Clemens Bleile

Technology Leader & Principal Consultant