By Franck Pachot

.
Recently, I described the Partial Join Evaluation transformation that appeared last year in Oracle 12c. I did it as an introduction for another transformation that appeared long time ago in 10.1.0.3 but was not used by default. And even in the latest Oracle 12c patchset 1 (aka 12.1.0.2.0) it is still not enabled. But it’s there and you can use it if you set optimizer_features_enabled to 12.1.0.2.1 (that’s not a typo!).

Yes, that number looks like the future PSU for the 12c Release 1 Patchset 1 that was available recently and has no PSU yet. Lost in the release numbers? No problem. This is only default values for the _convert_set_to_join paramter but you can also use the hint to get that transformation, which is available in previous versions as well.

So what does this transformation? It transforms an INTERSECT or MINUS into a join. When the tables are large but the result is small, that transformation can bring new access path avoiding full table scans and deduplication for each branch. And thanks to the Partial Join Evaluation the performance is even better in 12c. Let’s look at an example.

SQL*Plus: Release 12.1.0.2.0 Production on Sun Jul 27 22:10:57 2014
Copyright (c) 1982, 2014, Oracle.  All rights reserved.
Connected to:
 Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
 With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options
SQL> create table DEMO1(n constraint DEMO1_N primary key) as select rownum n from (select * from dual connect by level  Table created.
SQL> create table DEMO2(n constraint DEMO2_N primary key) as select rownum n from dual connect by level  Table created.
SQL> alter session set statistics_level=all;
 Session altered.

So I have two tables, one with 100000 rows and one with only 10. And I want the rows from DEMO1 which are not in DEMO2:

SQL> alter session set optimizer_features_enable='12.1.0.2.1';
 Session altered.
SQL> select * from DEMO1 intersect select * from DEMO2;
         N
 ----------
          1
          2
          3
          4
          5
          6
          7
          8
          9
         10
10 rows selected.

Let’s have a look at the plan:

SQL> select * from table(dbms_xplan.display_cursor(format=>'allstats last'));
PLAN_TABLE_OUTPUT
 -----------------------------------------------------------------------------
 SQL_ID 9fpg8nyjaqb5f, child number 0
 -------------------------------------
 select * from DEMO1 intersect select * from DEMO2
Plan hash value: 4278239763
------------------------------------------------------------------------------
 | Id  | Operation           | Name    | Starts | E-Rows | A-Rows || Used-Mem |
 ------------------------------------------------------------------------------
 |   0 | SELECT STATEMENT    |         |      1 |        |     10 ||          |
 |   1 |  INTERSECTION       |         |      1 |        |     10 ||          |
 |   2 |   SORT UNIQUE       |         |      1 |    100K|    100K|| 4078K (0)|
 |   3 |    TABLE ACCESS FULL| DEMO1   |      1 |    100K|    100K||          |
 |   4 |   SORT UNIQUE NOSORT|         |      1 |     10 |     10 ||          |
 |   5 |    INDEX FULL SCAN  | DEMO2_N |      1 |     10 |     10 ||          |
 ------------------------------------------------------------------------------
Query Block Name / Object Alias (identified by operation id):
 -------------------------------------------------------------
   1 - SET$1
    3 - SEL$1 / DEMO1@SEL$1
    5 - SEL$2 / DEMO2@SEL$2

This is the expected plan. There is an INTERSECTION operation that implements our INTERSECT. But look: each branch had to be deduplicated (SORT UNIQUE). Note that the SORT UNIQUE NOSORT has a funny name – it’s just a SORT UNIQUE that doesn’t have to sort because its input comes from an index. Each branch had to read all the rows. Look at the big table: we read 100000 rows and use 4MB of memory to sort them in order to deduplicate them. But it’s an intersection and we have a small table that has only 10 rows. We know that the result cannot be large. Then a more efficient way would be to read the small table and for each row check if they are in the big one – through an index access. We still have to deduplicate, but we do that at the end, on the small rowset.

And this is exactly what the Set to Join Conversion is doing. Let’s force it with a hint:

SQL> select /*+ SET_TO_JOIN(@"SET$1") */ * from DEMO1 intersect select * from DEMO2;
         N
 ----------
          1
          2
          3
          4
          5
          6
          7
          8
          9
         10
10 rows selected.
SQL> select * from table(dbms_xplan.display_cursor(format=>'allstats last'));
PLAN_TABLE_OUTPUT
 -----------------------------------------------------------------
 SQL_ID        01z69x8w7fmu0, child number 0
 -------------------------------------
 select /*+ SET_TO_JOIN(@"SET$1") */ * from DEMO1 intersect select *
 from DEMO2
Plan hash value: 169945296
------------------------------------------------------------------
 | Id  | Operation           | Name    | Starts | E-Rows | A-Rows |
 ------------------------------------------------------------------
 |   0 | SELECT STATEMENT    |         |      1 |        |     10 |
 |   1 |  SORT UNIQUE NOSORT |         |      1 |     10 |     10 |
 |   2 |   NESTED LOOPS SEMI |         |      1 |     10 |     10 |
 |   3 |    INDEX FULL SCAN  | DEMO2_N |      1 |     10 |     10 |
 |*  4 |    INDEX UNIQUE SCAN| DEMO1_N |     10 |    100K|     10 |
 ------------------------------------------------------------------
Predicate Information (identified by operation id):
 ---------------------------------------------------
   4 - access("DEMO1"."N"="DEMO2"."N")

 

The intersect has been transformed to a join thanks to the Set to Join transformation, and the join has been transformed to a semi-join thanks to the Partial Join Evaluation transformation. The result is clear here:

  • No full table scan on the big table because the join is able to access with an index
  • No deduplication which needs a large workarea
  • The join can stop as soon as one row matches thanks to the semi-join
  • Deduplication occurs only on result, which is small. And here it does not even require a workarea because the rows comes sorted from the index.

We can see the SET_TO_JOIN and PARTIAL_JOIN hints in the outline:

Outline Data
 -------------
  /*+
       BEGIN_OUTLINE_DATA
       IGNORE_OPTIM_EMBEDDED_HINTS
       OPTIMIZER_FEATURES_ENABLE('12.1.0.2')
       DB_VERSION('12.1.0.2')
       ALL_ROWS
       OUTLINE_LEAF(@"SEL$02B15F54")
       MERGE(@"SEL$1")
       MERGE(@"SEL$2")
       OUTLINE(@"SET$09AAA538")
       SET_TO_JOIN(@"SET$1")
       OUTLINE(@"SEL$1")
       OUTLINE(@"SEL$2")
       OUTLINE(@"SET$1")
       INDEX(@"SEL$02B15F54" "DEMO2"@"SEL$2" ("DEMO2"."N"))
       INDEX(@"SEL$02B15F54" "DEMO1"@"SEL$1" ("DEMO1"."N"))
       LEADING(@"SEL$02B15F54" "DEMO2"@"SEL$2" "DEMO1"@"SEL$1")
       USE_NL(@"SEL$02B15F54" "DEMO1"@"SEL$1")
       PARTIAL_JOIN(@"SEL$02B15F54" "DEMO1"@"SEL$1")
       END_OUTLINE_DATA
   */

So we are in 12.1.0.2 and we need a hint for that. Let’s go to 12.1.0.2.1 (which implicitely set _convert_set_to_join=true).

PLAN_TABLE_OUTPUT
 -----------------------------------------------------------------------------
 SQL_ID        9fpg8nyjaqb5f, child number 1
 -------------------------------------
 select * from DEMO1 intersect select * from DEMO2
Plan hash value: 118900122
------------------------------------------------------------------------------
 | Id  | Operation           | Name    | Starts | E-Rows | A-Rows || Used-Mem |
 ------------------------------------------------------------------------------
 |   0 | SELECT STATEMENT    |         |      1 |        |     10 ||          |
 |   1 |  HASH UNIQUE        |         |      1 |     10 |     10 || 1260K (0)|
 |   2 |   NESTED LOOPS SEMI |         |      1 |     10 |     10 ||          |
 |   3 |    INDEX FULL SCAN  | DEMO2_N |      1 |     10 |     10 ||          |
 |*  4 |    INDEX UNIQUE SCAN| DEMO1_N |     10 |    100K|     10 ||          |
 ------------------------------------------------------------------------------
Predicate Information (identified by operation id):
 ---------------------------------------------------
   4 - access("DEMO1"."N"="DEMO2"."N")
Note
 -----
    - this is an adaptive plan

Ok, we have the Set to Join Conversion here in 12.1.0.2.1

But don’t you see another difference?
.
.
.
.
The deduplication needs a workarea here. It is not a NOSORT operation – even if the result comes from the index. It seems that the CBO cannot guarantee that the result comes sorted. The clue is in the execution plan note.

But that’s for a future blog post.