dbi Blog

From an application point of view, the oracle scheduler DBMS_SCHEDULER allows to reach best performance by parallelizing your process.

Let’s start with the following PL/SQL code inserting in serial several rows from a metadata table to a target table. In my example, the metadata table does not contain “directly” the data but a set a of sql statement to be executed and for which the rows returned must be inserted into the target table My_Target_Table_Serial :

Let’s verify the contents of the source table called My_Metadata_Table:

SQL> SELECT priority,dwh_id, amq_name, sql_statement,scope from dwh_amq_v2;
ROWNUM  DWH_ID  AMQ_NAME SQL_STATEMENT          SCOPE
1	7	AAA1	 SELECT SUM(P.age pt.p	TYPE1
2	28	BBB2  	 SELECT CASE WHEN pt.p	TYPE1
3	37	CCC3	 "select cm.case_id fr"	TYPE2
4	48	DDD4	 "select cm.case_id fr"	TYPE2
5	73	EEE5	 SELECT DISTINCT pt.p	TYPE1
6	90	FFF6 	 SELECT LAG(ORW pt.p	TYPE1
7	114	GGG7	 SELECT distinct pt.	TYPE1
8	125	HHH8	 SELECT DISTINCT pt.p	TYPE1
...
148    115     ZZZ48    SELECT ROUND(TO_NUMBER TYPE2

Now let’s check the PL/SQL program :

DECLARE
  l_errm VARCHAR2(200);
  l_sql  VARCHAR2(32767) := NULL;
  sql_statement_1  VARCHAR2(32767) := NULL;
  sql_statement_2  VARCHAR2(32767) := NULL;
  l_amq_name VARCHAR2(200);
  l_date NUMBER;
BEGIN
  SELECT TO_NUMBER(TO_CHAR(SYSDATE,'YYYYMMDDHH24MISS')) INTO l_date FROM dual;
  FOR rec IN (SELECT dwh_id, amq_name, sql_statement,scope 
                FROM My_Metadata_Table,
                     (SELECT dwh_pit_date FROM dwh_code_mv) pt
               WHERE dwh_status = 1
                 AND (pt.dwh_pit_date >= dwh_valid_from AND pt.dwh_pit_date < dwh_valid_to) 
               ORDER BY priority, dwh_id) LOOP
    ...
    sql_statement_1 := substr(rec.sql_statement, 1, 32000);
    sql_statement_2 := substr(rec.sql_statement, 32001);
    IF rec.SCOPE = 'TYPE1' THEN 
      -- TYPE1 LEVEL SELECT
      l_sql := 'INSERT /*+ APPEND */ INTO My_Target_Table_Serial (dwh_pit_date, AMQ_ID, AMQ_TEXT, CASE_ID, ENTERPRISE_ID)'||CHR(13)|| 'SELECT DISTINCT TO_DATE(code.dwh_pit_date, ''YYYYMMDDHH24MISS''),'||rec.dwh_id|| ',''' ||rec.amq_name ||''', case_id, 1'||CHR(13)
      || ' FROM (SELECT dwh_pit_date FROM dwh_code) code, ('||sql_statement_1;
      EXECUTE IMMEDIATE l_sql || sql_statement_2 || ')';
      COMMIT;    
    ELSE 
      -- TYPE2 LEVEL SELECT
      l_sql :=  'INSERT /*+ APPEND */ INTO My_Target_Table_Serial (dwh_pit_date, AMQ_ID, AMQ_TEXT, CASE_ID, ENTERPRISE_ID)
      SELECT DISTINCT TO_DATE(code.dwh_pit_date, ''YYYYMMDDHH24MISS''), '||rec.dwh_id|| ',''' ||rec.amq_name || ''', cm.case_id, cm.enterprise_id'||CHR(13)
      || '  FROM (SELECT dwh_pit_date FROM dwh_code) code, v_sc_case_master cm, v_sc_case_event ce, ('||sql_statement_1;
              
      EXECUTE IMMEDIATE l_sql || sql_statement_2 || ') pt'||CHR(13)
      || ' WHERE cm.case_id = ce.case_id'||CHR(13) 
      || '   AND cm.deleted IS NULL AND cm.state_id <> 1'||CHR(13)
      || '   AND ce.deleted IS NULL AND ce.pref_term = pt.pt_name';
      COMMIT;         
    END IF;
    ...
   END LOOP:
END;
Number of Rows Read : 148 (Means 148 Sql Statement to execute)
START : 16:17:46
END : 16:57:42
Total :  40 mins

As we can see, each Sql Statement is executed in serial, let’s check the audit table recording the loading time (Insert Time) and the “scheduling” :

CREATE_DATE		NAME	START_DATE		END_DATE            LOADING_TIME
22.05.2020 16:46:34	AAA1	22.05.2020 16:46:34	22.05.2020 16:57:42    11.08mins
22.05.2020 16:42:05	BBB2	22.05.2020 16:42:05	22.05.2020 16:46:34    04.29mins
22.05.2020 16:41:15	CCC3	22.05.2020 16:41:15	22.05.2020 16:42:05    50sec
22.05.2020 16:40:42	DDD4	22.05.2020 16:40:42	22.05.2020 16:41:15    32sec
22.05.2020 16:40:20	EEE5	22.05.2020 16:40:20	22.05.2020 16:40:42    22sec
22.05.2020 16:37:23	FFF6	22.05.2020 16:37:23	22.05.2020 16:40:20    02.57mins
22.05.2020 16:37:12	GGG7	22.05.2020 16:37:12	22.05.2020 16:37:23    11sec
...
22.05.2020 16:36:03	ZZZ148	22.05.2020 16:17:35	22.05.2020 16:17:46    11sec

To resume :

The 148 rows (148 Sql Statement) coming from the source table are loaded in serial in 40mins.
The majority of rows have taken less than 01 min to load (Ex. : Name = CCC3,DDD4,EEE5,GGG7 and ZZZ148)
Few rows have taken more than a couple of minutes to load.
The maximum loading time is 11.08mins for the Name “AA1”.
Each row must wait the previous row complete his loading before to start his loading (compare END_DATE previous vs START_DATE current).

To optimize the process, let’s trying to load all the rows coming from the source table in parallel by using the oracle scheduler DBMS_SCHEDULER.

Instead to execute directly the Insert command in the loop, let’s create a job through DBMS_SCHEDULER:

FOR rec IN (SELECT priority,dwh_id, amq_name, sql_statement,scope 
                FROM My_Metadata_Table,
                     (SELECT dwh_pit_date FROM dwh_code_mv) pt
               WHERE dwh_status = 1
                 AND (pt.dwh_pit_date >= dwh_valid_from AND pt.dwh_pit_date < dwh_valid_to) 
               ORDER BY priority, dwh_id) LOOP

     l_amq_name := rec.amq_name;
       IF rec.SCOPE = 'TYPE1' THEN 
        -- TYPE1 LEVEL SELECT
         ...
  
            --Execute Job to insert the AMQ : Background process
            DBMS_SCHEDULER.CREATE_JOB (
            job_name             => 'AMQ_P'||rec.priority||'j'||i||'_'||l_date,
            job_type             => 'PLSQL_BLOCK',
            job_action           => 'BEGIN
                                      LOAD_DATA(''CASE'','||''''||l_amq_name||''''||','||rec.priority||','||l_date||','||v_SESSION_ID||','||i||');
                                     END;',
            start_date    =>  sysdate,  
            enabled       =>  TRUE,  
            auto_drop     =>  TRUE,  
            comments      =>  'job for amq '||l_amq_name);
          END IF;
        ELSE 
            ...
            END IF;
        END IF; 
      i := i +1;
  END LOOP;
Number of Rows Read : 148 (Means 148 Sql Statement to execute)
START : 08:14:03
END : 08:42:32
Total :  27.57 mins

To resume :

The 148 rows (148 Sql Statement) coming from the source table are loaded now in parallel in 27.57mins instead of 40mins in serial.
The options of DBMS_SCHEDULER are :
- As we are limited in number of character for the parameter “job_action”, we have to insert the data through a PL/SQL procedure LOAD_DATA.
- The job is executed immediately (start_date=sysdate) and purged immediately after his execution (auto_drop=TRUE).

Let’s check now how the jobs are scheduled. Since we do a loop of 148 times, I expect to have 148 jobs:

First, let’s check now if the rows (Remember, One Row = One Insert Into Target Table From Source Table) are loaded in parallel :

CREATE_DATE 	    NAME START_DATE 	        END_DATE 				       
22.05.2020 16:46:34 AAA1 23.05.2020 08:14:04	23.05.2020 08:21:19
22.05.2020 16:42:05 BBB2 23.05.2020 08:14:04	23.05.2020 08:20:43
22.05.2020 16:41:15 CCC3 23.05.2020 08:14:04	23.05.2020 08:21:59
22.05.2020 16:40:42 DDD4 23.05.2020 08:14:03	23.05.2020 08:15:29
22.05.2020 16:40:20 EEE5 23.05.2020 08:14:03	23.05.2020 08:15:05
22.05.2020 16:37:23 FFF6 23.05.2020 08:14:03	23.05.2020 08:14:47
22.05.2020 16:37:12 GGG7 23.05.2020 08:14:03	23.05.2020 08:15:59
...                     
22.05.2020 16:36:03 ZZZ148 22.05.2020 16:17:35 22.05.2020 16:17:46

This is the case, all rows have the same start_date, meaning all rows start in parallel. Let’s verify into “all_scheduler_job_run_details” to check we have our 148 jobs in parallel :

SQL> select count(*) from all_scheduler_job_run_details where job_name like '%20200523081403';

  COUNT(*)
----------
       148
SQL> select log_date,job_name,status,req_start_date from all_scheduler_job_run_details where job_name like '%20200523081403';
LOG_DATE		JOB_NAME		        STATUS		REQ_START_DATE
23-MAY-20 08.42.41	AMQ_P3J147_20200523081403	SUCCEEDED	23-MAY-20 02.42.32
23-MAY-20 08.42.32	AMQ_P2J146_20200523081403	SUCCEEDED	23-MAY-20 02.23.13
23-MAY-20 08.37.56	AMQ_P2J145_20200523081403	SUCCEEDED	23-MAY-20 02.23.13
23-MAY-20 08.37.33	AMQ_P2J144_20200523081403	SUCCEEDED	23-MAY-20 02.23.13
23-MAY-20 08.37.22	AMQ_P2J143_20200523081403	SUCCEEDED	23-MAY-20 02.23.13
23-MAY-20 08.37.03	AMQ_P2J141_20200523081403	SUCCEEDED	23-MAY-20 02.23.13
23-MAY-20 08.36.50	AMQ_P2J142_20200523081403	SUCCEEDED	23-MAY-20 02.23.13
23-MAY-20 08.33.57	AMQ_P2J140_20200523081403	SUCCEEDED	23-MAY-20 02.23.13
--Only the first 8 rows are displayed

To resume :

We have 148 jobs all started, most of the time in parallel (job with same REQ_START_DATE, oracle parallelizes jobs per block randomly).
My PL/SQL process now took 27.57 mins instead of 40mins.

But if we have a look in details, we have a lot of small jobs. Those are jobs where run_duration is less than 01 mins:

SQL> select run_duration from all_scheduler_job_run_details where job_name like '%20200523081403' order by run_duration;

RUN_DURATION
+00 00:00:04.000000
+00 00:00:07.000000
+00 00:00:09.000000
+00 00:00:10.000000
+00 00:00:13.000000
+00 00:00:15.000000
+00 00:00:20.000000
+00 00:00:27.000000
+00 00:00:33.000000
+00 00:00:35.000000
+00 00:00:36.000000
+00 00:00:38.000000
+00 00:00:43.000000
+00 00:00:46.000000
+00 00:00:51.000000
+00 00:00:52.000000

As we have a lot of small jobs (short-lived jobs), it will be more interesting to use lightweight jobs instead of regular jobs.

In contrary of regular jobs, lightweight jobs :

Require less meta data, so they have quicker create and drop times.
Suited for short-lived jobs (small jobs, jobs where run_duration is low).

Let’s rewrite our PL/SQL process using lightweight jobs :

To use lightweight jobs, first create a program suitable for a lightweight job :

begin
dbms_scheduler.create_program
(
    program_name=>'LIGHTWEIGHT_PROGRAM',
    program_action=>'LOAD_AMQ',
    program_type=>'STORED_PROCEDURE',
    number_of_arguments=>6, 
    enabled=>FALSE);
END;

Add the arguments (parameters) and enable the program :

BEGIN
dbms_scheduler.DEFINE_PROGRAM_ARGUMENT(
program_name=>'lightweight_program',
argument_position=>1,
argument_type=>'VARCHAR2',
DEFAULT_VALUE=>NULL);

dbms_scheduler.DEFINE_PROGRAM_ARGUMENT(
program_name=>'lightweight_program',
argument_position=>2,
argument_type=>'VARCHAR2');

dbms_scheduler.DEFINE_PROGRAM_ARGUMENT(
program_name=>'lightweight_program',
argument_position=>3,
argument_type=>'NUMBER');

dbms_scheduler.DEFINE_PROGRAM_ARGUMENT(
program_name=>'lightweight_program',
argument_position=>4,
argument_type=>'NUMBER');

dbms_scheduler.DEFINE_PROGRAM_ARGUMENT(
program_name=>'lightweight_program',
argument_position=>5,
argument_type=>'VARCHAR');

dbms_scheduler.DEFINE_PROGRAM_ARGUMENT(
program_name=>'lightweight_program',
argument_position=>6,
argument_type=>'NUMBER');

dbms_scheduler.enable('lightweight_program');  
end;

Into the PL/SQL code, let’s create the lightweight job without forget to set the argument value before running the job:

DECLARE
...
BEGIN
....
LOOP
DBMS_SCHEDULER.create_job (
job_name        => 'AMQ_P'||rec.priority||'j'||i||'_'||l_date,
program_name    => 'LIGHTWEIGHT_PROGRAM',
job_style       => 'LIGHTWEIGHT',
enabled         => FALSE);
                  
 DBMS_SCHEDULER.SET_JOB_ARGUMENT_VALUE (
   job_name                => 'AMQ_P'||rec.priority||'j'||i||'_'||l_date,
   argument_position       => 1,
   argument_value          => rec.scope);
   
DBMS_SCHEDULER.SET_JOB_ARGUMENT_VALUE (
   job_name                => 'AMQ_P'||rec.priority||'j'||i||'_'||l_date,
   argument_position       => 2,
   argument_value          => l_amq_name);
   
DBMS_SCHEDULER.SET_JOB_ARGUMENT_VALUE (
   job_name                => 'AMQ_P'||rec.priority||'j'||i||'_'||l_date,
   argument_position       => 3,
   argument_value          => rec.priority);

DBMS_SCHEDULER.SET_JOB_ARGUMENT_VALUE (
   job_name                => 'AMQ_P'||rec.priority||'j'||i||'_'||l_date,
   argument_position       => 4,
   argument_value          => l_date);   

DBMS_SCHEDULER.SET_JOB_ARGUMENT_VALUE (
   job_name                => 'AMQ_P'||rec.priority||'j'||i||'_'||l_date,
   argument_position       => 5,
   argument_value          => v_SESSION_ID);  

DBMS_SCHEDULER.SET_JOB_ARGUMENT_VALUE (
   job_name                => 'AMQ_P'||rec.priority||'j'||i||'_'||l_date,
   argument_position       => 6,
   argument_value          => i); 

dbms_scheduler.run_job('AMQ_P'||rec.priority||'j'||i||'_'||l_date,TRUE);
...
END LOOP;
Number of Rows Read : 148 (Means 148 Sql Statement to execute) 
START : 18:08:56
END : 18:27:40
Total : 18.84 mins

Let’s check we have always 148 jobs in parallel :

SQL> select count(*) from all_scheduler_job_run_details where job_name like '%20200524175036';

  COUNT(*)
----------
       148
SQL> select log_date,job_name,status,req_start_date from all_scheduler_job_run_details where job_name like '%20200524175036';

LOG_DATE           JOB_NAME     STATUS	        REQ_START_DATE
24-MAY-20 05.50.51 AB1C		SUCCEEDED	24-MAY-20 05.50.36
24-MAY-20 05.50.56 AB1D		SUCCEEDED	24-MAY-20 05.50.51
24-MAY-20 05.51.14 AB1E		SUCCEEDED	24-MAY-20 05.50.56
24-MAY-20 05.51.49 AB1I		SUCCEEDED	24-MAY-20 05.51.14
24-MAY-20 05.52.14 AB1P		SUCCEEDED	24-MAY-20 05.51.49
24-MAY-20 05.52.34 AB1L		SUCCEEDED	24-MAY-20 05.52.14
24-MAY-20 05.52.55 AB1N		SUCCEEDED	24-MAY-20 05.52.34
24-MAY-20 05.53.17 AB1M		SUCCEEDED	24-MAY-20 05.52.55
24-MAY-20 05.53.29 AB1K		SUCCEEDED	24-MAY-20 05.53.17
24-MAY-20 05.53.39 AB1O		SUCCEEDED	24-MAY-20 05.53.29
24-MAY-20 05.53.57 AB1U		SUCCEEDED	24-MAY-20 05.53.39
24-MAY-20 05.54.07 AB1V		SUCCEEDED	24-MAY-20 05.53.57

To resume :

We have 148 jobs all started, most of the time in parallel.
My PL/SQL process now took 18.54 mins (Lightweight Jobs) instead of 27.57mins (Regular Jobs).
If we compare Regular Jobs VS Lightweight Jobs, the former seems to schedule the jobs randomly (start jobs with block of 4,5,6…8) while the last one schedule jobs by block of 3 or 4 (as we can see above).

Conclusion :

DBMS_SCHEDULER (Regular Jobs or Lightweight Jobs) can improve significantly your PL/SQL performance transforming transforming your serial process in parallel process.
If you have small jobs (short lived-jobs), use lightweight jobs instead regular jobs.
Don’t underestimate the development time (development, test, bug solving) to transform your serial process to parallel process. Create 1 job is different to create more than 100 or 1000 jobs through a PL/SQL loop (concurrency problem, CPU used by create/drop the jobs).
As developer, you are responsible to manage your jobs (create,drop,purge) in order to not fill the oracle parameter job_queue_processes (used by a lot of critical oracle processes).

Post Views: 3,163

How to use DBMS_SCHEDULER to improve performance ?

Lazhar Felahi

Leave a Reply:

Related blog articles

Lazhar Felahi

Leave a Reply:

Related blog articles

Another file system for Linux: bcachefs (1) – basics

PostgreSQL 17: pg_buffercache_evict()

ODA X10-L storage configuration is different from what you may expect

PostgreSQL 17: Split and Merge partitions