From an application point of view, the oracle scheduler DBMS_SCHEDULER allows to reach best performance by parallelizing your process.
Let’s start with the following PL/SQL code inserting in serial several rows from a metadata table to a target table. In my example, the metadata table does not contain “directly” the data but a set a of sql statement to be executed and for which the rows returned must be inserted into the target table My_Target_Table_Serial :
Let’s verify the contents of the source table called My_Metadata_Table:
SQL> SELECT priority,dwh_id, amq_name, sql_statement,scope from dwh_amq_v2; ROWNUM DWH_ID AMQ_NAME SQL_STATEMENT SCOPE 1 7 AAA1 SELECT SUM(P.age pt.p TYPE1 2 28 BBB2 SELECT CASE WHEN pt.p TYPE1 3 37 CCC3 "select cm.case_id fr" TYPE2 4 48 DDD4 "select cm.case_id fr" TYPE2 5 73 EEE5 SELECT DISTINCT pt.p TYPE1 6 90 FFF6 SELECT LAG(ORW pt.p TYPE1 7 114 GGG7 SELECT distinct pt. TYPE1 8 125 HHH8 SELECT DISTINCT pt.p TYPE1 ... 148 115 ZZZ48 SELECT ROUND(TO_NUMBER TYPE2
Now let’s check the PL/SQL program :
DECLARE l_errm VARCHAR2(200); l_sql VARCHAR2(32767) := NULL; sql_statement_1 VARCHAR2(32767) := NULL; sql_statement_2 VARCHAR2(32767) := NULL; l_amq_name VARCHAR2(200); l_date NUMBER; BEGIN SELECT TO_NUMBER(TO_CHAR(SYSDATE,'YYYYMMDDHH24MISS')) INTO l_date FROM dual; FOR rec IN (SELECT dwh_id, amq_name, sql_statement,scope FROM My_Metadata_Table, (SELECT dwh_pit_date FROM dwh_code_mv) pt WHERE dwh_status = 1 AND (pt.dwh_pit_date >= dwh_valid_from AND pt.dwh_pit_date < dwh_valid_to) ORDER BY priority, dwh_id) LOOP ... sql_statement_1 := substr(rec.sql_statement, 1, 32000); sql_statement_2 := substr(rec.sql_statement, 32001); IF rec.SCOPE = 'TYPE1' THEN -- TYPE1 LEVEL SELECT l_sql := 'INSERT /*+ APPEND */ INTO My_Target_Table_Serial (dwh_pit_date, AMQ_ID, AMQ_TEXT, CASE_ID, ENTERPRISE_ID)'||CHR(13)|| 'SELECT DISTINCT TO_DATE(code.dwh_pit_date, ''YYYYMMDDHH24MISS''),'||rec.dwh_id|| ',''' ||rec.amq_name ||''', case_id, 1'||CHR(13) || ' FROM (SELECT dwh_pit_date FROM dwh_code) code, ('||sql_statement_1; EXECUTE IMMEDIATE l_sql || sql_statement_2 || ')'; COMMIT; ELSE -- TYPE2 LEVEL SELECT l_sql := 'INSERT /*+ APPEND */ INTO My_Target_Table_Serial (dwh_pit_date, AMQ_ID, AMQ_TEXT, CASE_ID, ENTERPRISE_ID) SELECT DISTINCT TO_DATE(code.dwh_pit_date, ''YYYYMMDDHH24MISS''), '||rec.dwh_id|| ',''' ||rec.amq_name || ''', cm.case_id, cm.enterprise_id'||CHR(13) || ' FROM (SELECT dwh_pit_date FROM dwh_code) code, v_sc_case_master cm, v_sc_case_event ce, ('||sql_statement_1; EXECUTE IMMEDIATE l_sql || sql_statement_2 || ') pt'||CHR(13) || ' WHERE cm.case_id = ce.case_id'||CHR(13) || ' AND cm.deleted IS NULL AND cm.state_id <> 1'||CHR(13) || ' AND ce.deleted IS NULL AND ce.pref_term = pt.pt_name'; COMMIT; END IF; ... END LOOP: END; Number of Rows Read : 148 (Means 148 Sql Statement to execute) START : 16:17:46 END : 16:57:42 Total : 40 mins
As we can see, each Sql Statement is executed in serial, let’s check the audit table recording the loading time (Insert Time) and the “scheduling” :
CREATE_DATE NAME START_DATE END_DATE LOADING_TIME 22.05.2020 16:46:34 AAA1 22.05.2020 16:46:34 22.05.2020 16:57:42 11.08mins 22.05.2020 16:42:05 BBB2 22.05.2020 16:42:05 22.05.2020 16:46:34 04.29mins 22.05.2020 16:41:15 CCC3 22.05.2020 16:41:15 22.05.2020 16:42:05 50sec 22.05.2020 16:40:42 DDD4 22.05.2020 16:40:42 22.05.2020 16:41:15 32sec 22.05.2020 16:40:20 EEE5 22.05.2020 16:40:20 22.05.2020 16:40:42 22sec 22.05.2020 16:37:23 FFF6 22.05.2020 16:37:23 22.05.2020 16:40:20 02.57mins 22.05.2020 16:37:12 GGG7 22.05.2020 16:37:12 22.05.2020 16:37:23 11sec ... 22.05.2020 16:36:03 ZZZ148 22.05.2020 16:17:35 22.05.2020 16:17:46 11sec
To resume :
- The 148 rows (148 Sql Statement) coming from the source table are loaded in serial in 40mins.
- The majority of rows have taken less than 01 min to load (Ex. : Name = CCC3,DDD4,EEE5,GGG7 and ZZZ148)
- Few rows have taken more than a couple of minutes to load.
- The maximum loading time is 11.08mins for the Name “AA1”.
- Each row must wait the previous row complete his loading before to start his loading (compare END_DATE previous vs START_DATE current).
To optimize the process, let’s trying to load all the rows coming from the source table in parallel by using the oracle scheduler DBMS_SCHEDULER.
Instead to execute directly the Insert command in the loop, let’s create a job through DBMS_SCHEDULER:
FOR rec IN (SELECT priority,dwh_id, amq_name, sql_statement,scope FROM My_Metadata_Table, (SELECT dwh_pit_date FROM dwh_code_mv) pt WHERE dwh_status = 1 AND (pt.dwh_pit_date >= dwh_valid_from AND pt.dwh_pit_date < dwh_valid_to) ORDER BY priority, dwh_id) LOOP l_amq_name := rec.amq_name; IF rec.SCOPE = 'TYPE1' THEN -- TYPE1 LEVEL SELECT ... --Execute Job to insert the AMQ : Background process DBMS_SCHEDULER.CREATE_JOB ( job_name => 'AMQ_P'||rec.priority||'j'||i||'_'||l_date, job_type => 'PLSQL_BLOCK', job_action => 'BEGIN LOAD_DATA(''CASE'','||''''||l_amq_name||''''||','||rec.priority||','||l_date||','||v_SESSION_ID||','||i||'); END;', start_date => sysdate, enabled => TRUE, auto_drop => TRUE, comments => 'job for amq '||l_amq_name); END IF; ELSE ... END IF; END IF; i := i +1; END LOOP; Number of Rows Read : 148 (Means 148 Sql Statement to execute) START : 08:14:03 END : 08:42:32 Total : 27.57 mins
To resume :
- The 148 rows (148 Sql Statement) coming from the source table are loaded now in parallel in 27.57mins instead of 40mins in serial.
- The options of DBMS_SCHEDULER are :
- As we are limited in number of character for the parameter “job_action”, we have to insert the data through a PL/SQL procedure LOAD_DATA.
- The job is executed immediately (start_date=sysdate) and purged immediately after his execution (auto_drop=TRUE).
Let’s check now how the jobs are scheduled. Since we do a loop of 148 times, I expect to have 148 jobs:
First, let’s check now if the rows (Remember, One Row = One Insert Into Target Table From Source Table) are loaded in parallel :
CREATE_DATE NAME START_DATE END_DATE 22.05.2020 16:46:34 AAA1 23.05.2020 08:14:04 23.05.2020 08:21:19 22.05.2020 16:42:05 BBB2 23.05.2020 08:14:04 23.05.2020 08:20:43 22.05.2020 16:41:15 CCC3 23.05.2020 08:14:04 23.05.2020 08:21:59 22.05.2020 16:40:42 DDD4 23.05.2020 08:14:03 23.05.2020 08:15:29 22.05.2020 16:40:20 EEE5 23.05.2020 08:14:03 23.05.2020 08:15:05 22.05.2020 16:37:23 FFF6 23.05.2020 08:14:03 23.05.2020 08:14:47 22.05.2020 16:37:12 GGG7 23.05.2020 08:14:03 23.05.2020 08:15:59 ... 22.05.2020 16:36:03 ZZZ148 22.05.2020 16:17:35 22.05.2020 16:17:46
This is the case, all rows have the same start_date, meaning all rows start in parallel. Let’s verify into “all_scheduler_job_run_details” to check we have our 148 jobs in parallel :
SQL> select count(*) from all_scheduler_job_run_details where job_name like '%20200523081403'; COUNT(*) ---------- 148 SQL> select log_date,job_name,status,req_start_date from all_scheduler_job_run_details where job_name like '%20200523081403'; LOG_DATE JOB_NAME STATUS REQ_START_DATE 23-MAY-20 08.42.41 AMQ_P3J147_20200523081403 SUCCEEDED 23-MAY-20 02.42.32 23-MAY-20 08.42.32 AMQ_P2J146_20200523081403 SUCCEEDED 23-MAY-20 02.23.13 23-MAY-20 08.37.56 AMQ_P2J145_20200523081403 SUCCEEDED 23-MAY-20 02.23.13 23-MAY-20 08.37.33 AMQ_P2J144_20200523081403 SUCCEEDED 23-MAY-20 02.23.13 23-MAY-20 08.37.22 AMQ_P2J143_20200523081403 SUCCEEDED 23-MAY-20 02.23.13 23-MAY-20 08.37.03 AMQ_P2J141_20200523081403 SUCCEEDED 23-MAY-20 02.23.13 23-MAY-20 08.36.50 AMQ_P2J142_20200523081403 SUCCEEDED 23-MAY-20 02.23.13 23-MAY-20 08.33.57 AMQ_P2J140_20200523081403 SUCCEEDED 23-MAY-20 02.23.13 --Only the first 8 rows are displayed
To resume :
- We have 148 jobs all started, most of the time in parallel (job with same REQ_START_DATE, oracle parallelizes jobs per block randomly).
- My PL/SQL process now took 27.57 mins instead of 40mins.
But if we have a look in details, we have a lot of small jobs. Those are jobs where run_duration is less than 01 mins:
SQL> select run_duration from all_scheduler_job_run_details where job_name like '%20200523081403' order by run_duration; RUN_DURATION +00 00:00:04.000000 +00 00:00:07.000000 +00 00:00:09.000000 +00 00:00:10.000000 +00 00:00:13.000000 +00 00:00:15.000000 +00 00:00:20.000000 +00 00:00:27.000000 +00 00:00:33.000000 +00 00:00:35.000000 +00 00:00:36.000000 +00 00:00:38.000000 +00 00:00:43.000000 +00 00:00:46.000000 +00 00:00:51.000000 +00 00:00:52.000000
As we have a lot of small jobs (short-lived jobs), it will be more interesting to use lightweight jobs instead of regular jobs.
In contrary of regular jobs, lightweight jobs :
- Require less meta data, so they have quicker create and drop times.
- Suited for short-lived jobs (small jobs, jobs where run_duration is low).
Let’s rewrite our PL/SQL process using lightweight jobs :
To use lightweight jobs, first create a program suitable for a lightweight job :
begin dbms_scheduler.create_program ( program_name=>'LIGHTWEIGHT_PROGRAM', program_action=>'LOAD_AMQ', program_type=>'STORED_PROCEDURE', number_of_arguments=>6, enabled=>FALSE); END;
Add the arguments (parameters) and enable the program :
BEGIN dbms_scheduler.DEFINE_PROGRAM_ARGUMENT( program_name=>'lightweight_program', argument_position=>1, argument_type=>'VARCHAR2', DEFAULT_VALUE=>NULL); dbms_scheduler.DEFINE_PROGRAM_ARGUMENT( program_name=>'lightweight_program', argument_position=>2, argument_type=>'VARCHAR2'); dbms_scheduler.DEFINE_PROGRAM_ARGUMENT( program_name=>'lightweight_program', argument_position=>3, argument_type=>'NUMBER'); dbms_scheduler.DEFINE_PROGRAM_ARGUMENT( program_name=>'lightweight_program', argument_position=>4, argument_type=>'NUMBER'); dbms_scheduler.DEFINE_PROGRAM_ARGUMENT( program_name=>'lightweight_program', argument_position=>5, argument_type=>'VARCHAR'); dbms_scheduler.DEFINE_PROGRAM_ARGUMENT( program_name=>'lightweight_program', argument_position=>6, argument_type=>'NUMBER'); dbms_scheduler.enable('lightweight_program'); end;
Into the PL/SQL code, let’s create the lightweight job without forget to set the argument value before running the job:
DECLARE ... BEGIN .... LOOP DBMS_SCHEDULER.create_job ( job_name => 'AMQ_P'||rec.priority||'j'||i||'_'||l_date, program_name => 'LIGHTWEIGHT_PROGRAM', job_style => 'LIGHTWEIGHT', enabled => FALSE); DBMS_SCHEDULER.SET_JOB_ARGUMENT_VALUE ( job_name => 'AMQ_P'||rec.priority||'j'||i||'_'||l_date, argument_position => 1, argument_value => rec.scope); DBMS_SCHEDULER.SET_JOB_ARGUMENT_VALUE ( job_name => 'AMQ_P'||rec.priority||'j'||i||'_'||l_date, argument_position => 2, argument_value => l_amq_name); DBMS_SCHEDULER.SET_JOB_ARGUMENT_VALUE ( job_name => 'AMQ_P'||rec.priority||'j'||i||'_'||l_date, argument_position => 3, argument_value => rec.priority); DBMS_SCHEDULER.SET_JOB_ARGUMENT_VALUE ( job_name => 'AMQ_P'||rec.priority||'j'||i||'_'||l_date, argument_position => 4, argument_value => l_date); DBMS_SCHEDULER.SET_JOB_ARGUMENT_VALUE ( job_name => 'AMQ_P'||rec.priority||'j'||i||'_'||l_date, argument_position => 5, argument_value => v_SESSION_ID); DBMS_SCHEDULER.SET_JOB_ARGUMENT_VALUE ( job_name => 'AMQ_P'||rec.priority||'j'||i||'_'||l_date, argument_position => 6, argument_value => i); dbms_scheduler.run_job('AMQ_P'||rec.priority||'j'||i||'_'||l_date,TRUE); ... END LOOP; Number of Rows Read : 148 (Means 148 Sql Statement to execute) START : 18:08:56 END : 18:27:40 Total : 18.84 mins
Let’s check we have always 148 jobs in parallel :
SQL> select count(*) from all_scheduler_job_run_details where job_name like '%20200524175036'; COUNT(*) ---------- 148 SQL> select log_date,job_name,status,req_start_date from all_scheduler_job_run_details where job_name like '%20200524175036'; LOG_DATE JOB_NAME STATUS REQ_START_DATE 24-MAY-20 05.50.51 AB1C SUCCEEDED 24-MAY-20 05.50.36 24-MAY-20 05.50.56 AB1D SUCCEEDED 24-MAY-20 05.50.51 24-MAY-20 05.51.14 AB1E SUCCEEDED 24-MAY-20 05.50.56 24-MAY-20 05.51.49 AB1I SUCCEEDED 24-MAY-20 05.51.14 24-MAY-20 05.52.14 AB1P SUCCEEDED 24-MAY-20 05.51.49 24-MAY-20 05.52.34 AB1L SUCCEEDED 24-MAY-20 05.52.14 24-MAY-20 05.52.55 AB1N SUCCEEDED 24-MAY-20 05.52.34 24-MAY-20 05.53.17 AB1M SUCCEEDED 24-MAY-20 05.52.55 24-MAY-20 05.53.29 AB1K SUCCEEDED 24-MAY-20 05.53.17 24-MAY-20 05.53.39 AB1O SUCCEEDED 24-MAY-20 05.53.29 24-MAY-20 05.53.57 AB1U SUCCEEDED 24-MAY-20 05.53.39 24-MAY-20 05.54.07 AB1V SUCCEEDED 24-MAY-20 05.53.57
To resume :
- We have 148 jobs all started, most of the time in parallel.
- My PL/SQL process now took 18.54 mins (Lightweight Jobs) instead of 27.57mins (Regular Jobs).
- If we compare Regular Jobs VS Lightweight Jobs, the former seems to schedule the jobs randomly (start jobs with block of 4,5,6…8) while the last one schedule jobs by block of 3 or 4 (as we can see above).
Conclusion :
- DBMS_SCHEDULER (Regular Jobs or Lightweight Jobs) can improve significantly your PL/SQL performance transforming transforming your serial process in parallel process.
- If you have small jobs (short lived-jobs), use lightweight jobs instead regular jobs.
- Don’t underestimate the development time (development, test, bug solving) to transform your serial process to parallel process. Create 1 job is different to create more than 100 or 1000 jobs through a PL/SQL loop (concurrency problem, CPU used by create/drop the jobs).
- As developer, you are responsible to manage your jobs (create,drop,purge) in order to not fill the oracle parameter job_queue_processes (used by a lot of critical oracle processes).