Infrastructure at your Service

From an application point of view, the oracle scheduler DBMS_SCHEDULER allows to reach best performance by parallelizing your process.

Let’s start with the following PL/SQL code inserting in serial several rows from a metadata table to a target table. In my example, the metadata table does not contain “directly” the data but a set a of sql statement to be executed and for which the rows returned must be inserted into the target table My_Target_Table_Serial :

Let’s verify the contents of the source table called My_Metadata_Table:

SQL> SELECT priority,dwh_id, amq_name, sql_statement,scope from dwh_amq_v2;
ROWNUM  DWH_ID  AMQ_NAME SQL_STATEMENT          SCOPE
1	7	AAA1	 SELECT SUM(P.age pt.p	TYPE1
2	28	BBB2  	 SELECT CASE WHEN pt.p	TYPE1
3	37	CCC3	 "select cm.case_id fr"	TYPE2
4	48	DDD4	 "select cm.case_id fr"	TYPE2
5	73	EEE5	 SELECT DISTINCT pt.p	TYPE1
6	90	FFF6 	 SELECT LAG(ORW pt.p	TYPE1
7	114	GGG7	 SELECT distinct pt.	TYPE1
8	125	HHH8	 SELECT DISTINCT pt.p	TYPE1
...
148    115     ZZZ48    SELECT ROUND(TO_NUMBER TYPE2

Now let’s check the PL/SQL program :

DECLARE
  l_errm VARCHAR2(200);
  l_sql  VARCHAR2(32767) := NULL;
  sql_statement_1  VARCHAR2(32767) := NULL;
  sql_statement_2  VARCHAR2(32767) := NULL;
  l_amq_name VARCHAR2(200);
  l_date NUMBER;
BEGIN
  SELECT TO_NUMBER(TO_CHAR(SYSDATE,'YYYYMMDDHH24MISS')) INTO l_date FROM dual;
  FOR rec IN (SELECT dwh_id, amq_name, sql_statement,scope 
                FROM My_Metadata_Table,
                     (SELECT dwh_pit_date FROM dwh_code_mv) pt
               WHERE dwh_status = 1
                 AND (pt.dwh_pit_date >= dwh_valid_from AND pt.dwh_pit_date < dwh_valid_to) 
               ORDER BY priority, dwh_id) LOOP
    ...
    sql_statement_1 := substr(rec.sql_statement, 1, 32000);
    sql_statement_2 := substr(rec.sql_statement, 32001);
    IF rec.SCOPE = 'TYPE1' THEN 
      -- TYPE1 LEVEL SELECT
      l_sql := 'INSERT /*+ APPEND */ INTO My_Target_Table_Serial (dwh_pit_date, AMQ_ID, AMQ_TEXT, CASE_ID, ENTERPRISE_ID)'||CHR(13)|| 'SELECT DISTINCT TO_DATE(code.dwh_pit_date, ''YYYYMMDDHH24MISS''),'||rec.dwh_id|| ',''' ||rec.amq_name ||''', case_id, 1'||CHR(13)
      || ' FROM (SELECT dwh_pit_date FROM dwh_code) code, ('||sql_statement_1;
      EXECUTE IMMEDIATE l_sql || sql_statement_2 || ')';
      COMMIT;    
    ELSE 
      -- TYPE2 LEVEL SELECT
      l_sql :=  'INSERT /*+ APPEND */ INTO My_Target_Table_Serial (dwh_pit_date, AMQ_ID, AMQ_TEXT, CASE_ID, ENTERPRISE_ID)
      SELECT DISTINCT TO_DATE(code.dwh_pit_date, ''YYYYMMDDHH24MISS''), '||rec.dwh_id|| ',''' ||rec.amq_name || ''', cm.case_id, cm.enterprise_id'||CHR(13)
      || '  FROM (SELECT dwh_pit_date FROM dwh_code) code, v_sc_case_master cm, v_sc_case_event ce, ('||sql_statement_1;
              
      EXECUTE IMMEDIATE l_sql || sql_statement_2 || ') pt'||CHR(13)
      || ' WHERE cm.case_id = ce.case_id'||CHR(13) 
      || '   AND cm.deleted IS NULL AND cm.state_id <> 1'||CHR(13)
      || '   AND ce.deleted IS NULL AND ce.pref_term = pt.pt_name';
      COMMIT;         
    END IF;
    ...
   END LOOP:
END;
Number of Rows Read : 148 (Means 148 Sql Statement to execute)
START : 16:17:46
END : 16:57:42
Total :  40 mins

 

As we can see, each Sql Statement is executed in serial, let’s check the audit table recording the loading time (Insert Time) and the “scheduling”   :

CREATE_DATE		NAME	START_DATE		END_DATE            LOADING_TIME
22.05.2020 16:46:34	AAA1	22.05.2020 16:46:34	22.05.2020 16:57:42    11.08mins
22.05.2020 16:42:05	BBB2	22.05.2020 16:42:05	22.05.2020 16:46:34    04.29mins
22.05.2020 16:41:15	CCC3	22.05.2020 16:41:15	22.05.2020 16:42:05    50sec
22.05.2020 16:40:42	DDD4	22.05.2020 16:40:42	22.05.2020 16:41:15    32sec
22.05.2020 16:40:20	EEE5	22.05.2020 16:40:20	22.05.2020 16:40:42    22sec
22.05.2020 16:37:23	FFF6	22.05.2020 16:37:23	22.05.2020 16:40:20    02.57mins
22.05.2020 16:37:12	GGG7	22.05.2020 16:37:12	22.05.2020 16:37:23    11sec
...
22.05.2020 16:36:03	ZZZ148	22.05.2020 16:17:35	22.05.2020 16:17:46    11sec

To resume :

  • The 148 rows (148 Sql Statement) coming from the source table are loaded in serial in 40mins.
  • The majority of rows have taken less than 01 min to load (Ex. : Name = CCC3,DDD4,EEE5,GGG7 and ZZZ148)
  • Few rows have taken more than a couple of minutes to load.
  • The maximum loading time is 11.08mins for the Name “AA1”.
  • Each row must wait the previous row complete his loading before to start his loading (compare END_DATE previous vs START_DATE current).

To optimize the process, let’s trying to load all the rows coming from the source table in parallel by using the oracle scheduler DBMS_SCHEDULER.

Instead to execute directly the Insert command in the loop, let’s create a job through DBMS_SCHEDULER:

FOR rec IN (SELECT priority,dwh_id, amq_name, sql_statement,scope 
                FROM My_Metadata_Table,
                     (SELECT dwh_pit_date FROM dwh_code_mv) pt
               WHERE dwh_status = 1
                 AND (pt.dwh_pit_date >= dwh_valid_from AND pt.dwh_pit_date < dwh_valid_to) 
               ORDER BY priority, dwh_id) LOOP

     l_amq_name := rec.amq_name;
       IF rec.SCOPE = 'TYPE1' THEN 
        -- TYPE1 LEVEL SELECT
         ...
  
            --Execute Job to insert the AMQ : Background process
            DBMS_SCHEDULER.CREATE_JOB (
            job_name             => 'AMQ_P'||rec.priority||'j'||i||'_'||l_date,
            job_type             => 'PLSQL_BLOCK',
            job_action           => 'BEGIN
                                      LOAD_DATA(''CASE'','||''''||l_amq_name||''''||','||rec.priority||','||l_date||','||v_SESSION_ID||','||i||');
                                     END;',
            start_date    =>  sysdate,  
            enabled       =>  TRUE,  
            auto_drop     =>  TRUE,  
            comments      =>  'job for amq '||l_amq_name);
          END IF;
        ELSE 
            ...
            END IF;
        END IF; 
      i := i +1;
  END LOOP;
Number of Rows Read : 148 (Means 148 Sql Statement to execute)
START : 08:14:03
END : 08:42:32
Total :  27.57 mins

To resume :

  • The 148 rows (148 Sql Statement) coming from the source table are loaded now in parallel in 27.57mins instead of 40mins in serial.
  • The options of DBMS_SCHEDULER are  :
    • As we are limited in number of character for the parameter “job_action”, we have to insert the data through a PL/SQL procedure LOAD_DATA.
    • The job is executed immediately (start_date=sysdate) and purged immediately after his execution (auto_drop=TRUE).

Let’s check now how the jobs are scheduled. Since we do a loop of 148 times, I expect to have 148 jobs:

First, let’s check now if the rows (Remember, One Row = One Insert Into Target Table From Source Table) are loaded in parallel :

CREATE_DATE 	    NAME START_DATE 	        END_DATE 				       
22.05.2020 16:46:34 AAA1 23.05.2020 08:14:04	23.05.2020 08:21:19
22.05.2020 16:42:05 BBB2 23.05.2020 08:14:04	23.05.2020 08:20:43
22.05.2020 16:41:15 CCC3 23.05.2020 08:14:04	23.05.2020 08:21:59
22.05.2020 16:40:42 DDD4 23.05.2020 08:14:03	23.05.2020 08:15:29
22.05.2020 16:40:20 EEE5 23.05.2020 08:14:03	23.05.2020 08:15:05
22.05.2020 16:37:23 FFF6 23.05.2020 08:14:03	23.05.2020 08:14:47
22.05.2020 16:37:12 GGG7 23.05.2020 08:14:03	23.05.2020 08:15:59
...                     
22.05.2020 16:36:03 ZZZ148 22.05.2020 16:17:35 22.05.2020 16:17:46

This is the case, all rows have the same start_date, meaning all rows start in parallel. Let’s verify into “all_scheduler_job_run_details” to check we have our 148 jobs in parallel :

SQL> select count(*) from all_scheduler_job_run_details where job_name like '%20200523081403';

  COUNT(*)
----------
       148
SQL> select log_date,job_name,status,req_start_date from all_scheduler_job_run_details where job_name like '%20200523081403';
LOG_DATE		JOB_NAME		        STATUS		REQ_START_DATE
23-MAY-20 08.42.41	AMQ_P3J147_20200523081403	SUCCEEDED	23-MAY-20 02.42.32
23-MAY-20 08.42.32	AMQ_P2J146_20200523081403	SUCCEEDED	23-MAY-20 02.23.13
23-MAY-20 08.37.56	AMQ_P2J145_20200523081403	SUCCEEDED	23-MAY-20 02.23.13
23-MAY-20 08.37.33	AMQ_P2J144_20200523081403	SUCCEEDED	23-MAY-20 02.23.13
23-MAY-20 08.37.22	AMQ_P2J143_20200523081403	SUCCEEDED	23-MAY-20 02.23.13
23-MAY-20 08.37.03	AMQ_P2J141_20200523081403	SUCCEEDED	23-MAY-20 02.23.13
23-MAY-20 08.36.50	AMQ_P2J142_20200523081403	SUCCEEDED	23-MAY-20 02.23.13
23-MAY-20 08.33.57	AMQ_P2J140_20200523081403	SUCCEEDED	23-MAY-20 02.23.13
--Only the first 8 rows are displayed

To resume :

  • We have 148 jobs all started, most of the time in parallel (job with same REQ_START_DATE, oracle parallelizes jobs per block randomly).
  • My PL/SQL process now took 27.57 mins instead of 40mins.

But if we have a look in details, we have a lot of small jobs. Those are jobs where run_duration is less than 01 mins:

SQL> select run_duration from all_scheduler_job_run_details where job_name like '%20200523081403' order by run_duration;

RUN_DURATION
+00 00:00:04.000000
+00 00:00:07.000000
+00 00:00:09.000000
+00 00:00:10.000000
+00 00:00:13.000000
+00 00:00:15.000000
+00 00:00:20.000000
+00 00:00:27.000000
+00 00:00:33.000000
+00 00:00:35.000000
+00 00:00:36.000000
+00 00:00:38.000000
+00 00:00:43.000000
+00 00:00:46.000000
+00 00:00:51.000000
+00 00:00:52.000000

As we have a lot of small jobs (short-lived jobs), it will be more interesting to use lightweight jobs instead of regular jobs.

In contrary of regular jobs, lightweight jobs :

  • Require less meta data, so they have quicker create and drop times.
  • Suited for short-lived jobs (small jobs, jobs where run_duration is low).

Let’s rewrite our PL/SQL process using lightweight jobs :

To use lightweight jobs, first create a program suitable for a lightweight job :

begin
dbms_scheduler.create_program
(
    program_name=>'LIGHTWEIGHT_PROGRAM',
    program_action=>'LOAD_AMQ',
    program_type=>'STORED_PROCEDURE',
    number_of_arguments=>6, 
    enabled=>FALSE);
END;

Add the arguments (parameters) and enable the program :

BEGIN
dbms_scheduler.DEFINE_PROGRAM_ARGUMENT(
program_name=>'lightweight_program',
argument_position=>1,
argument_type=>'VARCHAR2',
DEFAULT_VALUE=>NULL);

dbms_scheduler.DEFINE_PROGRAM_ARGUMENT(
program_name=>'lightweight_program',
argument_position=>2,
argument_type=>'VARCHAR2');

dbms_scheduler.DEFINE_PROGRAM_ARGUMENT(
program_name=>'lightweight_program',
argument_position=>3,
argument_type=>'NUMBER');

dbms_scheduler.DEFINE_PROGRAM_ARGUMENT(
program_name=>'lightweight_program',
argument_position=>4,
argument_type=>'NUMBER');

dbms_scheduler.DEFINE_PROGRAM_ARGUMENT(
program_name=>'lightweight_program',
argument_position=>5,
argument_type=>'VARCHAR');

dbms_scheduler.DEFINE_PROGRAM_ARGUMENT(
program_name=>'lightweight_program',
argument_position=>6,
argument_type=>'NUMBER');

dbms_scheduler.enable('lightweight_program');  
end;

Into the PL/SQL code, let’s create the lightweight job without forget to set the argument value before running the job:

DECLARE
...
BEGIN
....
LOOP
DBMS_SCHEDULER.create_job (
job_name        => 'AMQ_P'||rec.priority||'j'||i||'_'||l_date,
program_name    => 'LIGHTWEIGHT_PROGRAM',
job_style       => 'LIGHTWEIGHT',
enabled         => FALSE);
                  
 DBMS_SCHEDULER.SET_JOB_ARGUMENT_VALUE (
   job_name                => 'AMQ_P'||rec.priority||'j'||i||'_'||l_date,
   argument_position       => 1,
   argument_value          => rec.scope);
   
DBMS_SCHEDULER.SET_JOB_ARGUMENT_VALUE (
   job_name                => 'AMQ_P'||rec.priority||'j'||i||'_'||l_date,
   argument_position       => 2,
   argument_value          => l_amq_name);
   
DBMS_SCHEDULER.SET_JOB_ARGUMENT_VALUE (
   job_name                => 'AMQ_P'||rec.priority||'j'||i||'_'||l_date,
   argument_position       => 3,
   argument_value          => rec.priority);

DBMS_SCHEDULER.SET_JOB_ARGUMENT_VALUE (
   job_name                => 'AMQ_P'||rec.priority||'j'||i||'_'||l_date,
   argument_position       => 4,
   argument_value          => l_date);   

DBMS_SCHEDULER.SET_JOB_ARGUMENT_VALUE (
   job_name                => 'AMQ_P'||rec.priority||'j'||i||'_'||l_date,
   argument_position       => 5,
   argument_value          => v_SESSION_ID);  

DBMS_SCHEDULER.SET_JOB_ARGUMENT_VALUE (
   job_name                => 'AMQ_P'||rec.priority||'j'||i||'_'||l_date,
   argument_position       => 6,
   argument_value          => i); 

dbms_scheduler.run_job('AMQ_P'||rec.priority||'j'||i||'_'||l_date,TRUE);
...
END LOOP;
Number of Rows Read : 148 (Means 148 Sql Statement to execute) 
START : 18:08:56
END : 18:27:40
Total : 18.84 mins

 

Let’s check we have always 148 jobs in parallel :

SQL> select count(*) from all_scheduler_job_run_details where job_name like '%20200524175036';

  COUNT(*)
----------
       148
SQL> select log_date,job_name,status,req_start_date from all_scheduler_job_run_details where job_name like '%20200524175036';

LOG_DATE           JOB_NAME     STATUS	        REQ_START_DATE
24-MAY-20 05.50.51 AB1C		SUCCEEDED	24-MAY-20 05.50.36
24-MAY-20 05.50.56 AB1D		SUCCEEDED	24-MAY-20 05.50.51
24-MAY-20 05.51.14 AB1E		SUCCEEDED	24-MAY-20 05.50.56
24-MAY-20 05.51.49 AB1I		SUCCEEDED	24-MAY-20 05.51.14
24-MAY-20 05.52.14 AB1P		SUCCEEDED	24-MAY-20 05.51.49
24-MAY-20 05.52.34 AB1L		SUCCEEDED	24-MAY-20 05.52.14
24-MAY-20 05.52.55 AB1N		SUCCEEDED	24-MAY-20 05.52.34
24-MAY-20 05.53.17 AB1M		SUCCEEDED	24-MAY-20 05.52.55
24-MAY-20 05.53.29 AB1K		SUCCEEDED	24-MAY-20 05.53.17
24-MAY-20 05.53.39 AB1O		SUCCEEDED	24-MAY-20 05.53.29
24-MAY-20 05.53.57 AB1U		SUCCEEDED	24-MAY-20 05.53.39
24-MAY-20 05.54.07 AB1V		SUCCEEDED	24-MAY-20 05.53.57

To resume :

  • We have 148 jobs all started, most of the time in parallel.
  • My PL/SQL process now took 18.54 mins (Lightweight Jobs) instead of 27.57mins (Regular Jobs).
  • If we compare Regular Jobs VS Lightweight Jobs, the former seems to schedule the jobs randomly (start jobs with block of 4,5,6…8) while the last one schedule jobs by block of 3 or 4 (as we can see above).

Conclusion :

  • DBMS_SCHEDULER (Regular Jobs or Lightweight Jobs) can improve significantly your PL/SQL performance transforming transforming your serial process in parallel process.
  • If you have small jobs (short lived-jobs), use lightweight jobs instead regular jobs.
  • Don’t underestimate the development time (development, test, bug solving) to transform your serial process to parallel process. Create 1 job is different to create more than 100 or 1000 jobs through a PL/SQL loop (concurrency problem, CPU used by create/drop the jobs).
  • As developer, you are responsible to manage your jobs (create,drop,purge) in order to not fill the oracle parameter job_queue_processes (used by a lot of critical oracle processes).

3 Comments

  • Praveen BN says:

    Amazing info. Thanks for sharing. Kudos to your competence.

  • Lazhar Felahi says:

    Here is an update,
    In my above example, all INSERT are done with the hint “APPEND”. Since the Insert is done on the same table in parallel, I will face Table Lock (High Enq: TM – Contention Wait Events When Using Insert APPEND (Doc ID 2247733.1)) increasing the loading time. To solve this issue, I changed the code to do the Insert /*+ APPEND */ only on the big Insert reducing the contention and increasing the loading for big Insert.
    Regards
    Lazhar

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Lazhar Felahi
Lazhar Felahi

Consultant