Infrastructure at your Service

Oracle Team

MERGE JOIN CARTESIAN: a join method or a join type?

By August 9, 2018 Oracle No Comments

By Franck Pachot

.
I’ll present about join methods at POUG and DOAG. I’ll show how the different join methods work in order to better understand them. The idea is to show Nested Loops, Hash Join, Sort Merge Join, Merge Join Cartesian on the same query. I’ll run a simple join between DEPT and EMP with the USE_NL, USE_HASH, USE_MERGE and USE_MERGE_CARTESIAN hints. I’ll show the execution plan, with SQL Monitoring in text mode. And I’ll put some gdb breakpoints on the ‘qer’ (query execution rowsource) functions to run the plan operations step by step. Then I’ll do the same on a different query in order to show in detail the 12c adaptive plans.

But wait, I listed Nested Loops, Hash Join, Sort Merge Join, Merge Join Cartesian… but is Merge Cartesian Join really a join method? I mean, my query is not a cartesian join. I have all join predicates here. But for sure you can also do an inner join by starting with a cartesian join and then filter on the join predicate. As if doing physically what the old join syntax of Oracle is doing logically: by not putting any predicates in the from clause and add the join predicates in the where clause to filter over it.

If I look at the 12.2 documentation, it is a Join method
CaptureJoinMethods122

For the definition, a Join Method is how the join will be executed. It is not a decision of the SQL developer because SQL is declarative: you declare the result you want, and the optimizer will decide how to do it. And this is why hints are in comments: they are not part of the declarative syntax. Forcing how to do it is not part of SQL.

Just after listing the join methods, the documentation lists the join types which are part of the SQL because it declares the join result you expect. Inner join to get all matching rows. Semi join to get only the first matching row. Anti Join to get all rows which do not match. Outer join to get all matching rows in addition to those which matches. The syntax is INNER JOIN, OUTER JOIN, EXISTS or IN, NOT EXISTS or NOT IN. Join type is not ‘how’ but ‘what’.

Ok, so back to the join method. Let’s force it on my inner join between DEPT and EMP:

11g


SQL> alter session set current_schema=SCOTT statistics_level=all;
Session altered.
 
SQL> select /*+ leading(DEPT) USE_MERGE_CARTESIAN(EMP) FULL(DEPT) */ * from DEPT join EMP using(deptno);
 
DEPTNO DNAME LOC EMPNO ENAME JOB MGR HIREDATE SAL COMM
---------- -------------- ------------- ---------- ---------- --------- ---------- --------- ---------- ----------
10 ACCOUNTING NEW YORK 7782 CLARK MANAGER 7839 09-JUN-81 2450
10 ACCOUNTING NEW YORK 7839 KING PRESIDENT 17-NOV-81 5000
10 ACCOUNTING NEW YORK 7934 MILLER CLERK 7782 23-JAN-82 1300
10 RESEARCH DALLAS 7782 CLARK MANAGER 7839 09-JUN-81 2450
10 RESEARCH DALLAS 7839 KING PRESIDENT 17-NOV-81 5000
10 RESEARCH DALLAS 7934 MILLER CLERK 7782 23-JAN-82 1300
10 SALES CHICAGO 7782 CLARK MANAGER 7839 09-JUN-81 2450
10 SALES CHICAGO 7839 KING PRESIDENT 17-NOV-81 5000
10 SALES CHICAGO 7934 MILLER CLERK 7782 23-JAN-82 1300
10 OPERATIONS BOSTON 7782 CLARK MANAGER 7839 09-JUN-81 2450
10 OPERATIONS BOSTON 7839 KING PRESIDENT 17-NOV-81 5000
10 OPERATIONS BOSTON 7934 MILLER CLERK 7782 23-JAN-82 1300
 
12 rows selected.
 
SQL> select * from table(dbms_xplan.display_cursor(format=>'allstats last'));
 
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------
SQL_ID 1xpfxq6pc30vq, child number 0
-------------------------------------
select /*+ leading(DEPT) USE_MERGE_CARTESIAN(EMP) FULL(DEPT) */ * from
DEPT join EMP using(deptno)
 
Plan hash value: 2034389985
 
------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem |
------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 12 |00:00:00.01 | 7 | | | |
| 1 | MERGE JOIN CARTESIAN| | 1 | 14 | 12 |00:00:00.01 | 7 | | | |
| 2 | TABLE ACCESS FULL | DEPT | 1 | 4 | 4 |00:00:00.01 | 4 | | | |
| 3 | BUFFER SORT | | 4 | 4 | 12 |00:00:00.01 | 3 | 2048 | 2048 | 2048 (0)|
|* 4 | TABLE ACCESS FULL | EMP | 1 | 4 | 3 |00:00:00.01 | 3 | | | |
------------------------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
4 - filter("DEPT"."DEPTNO"="EMP"."DEPTNO")
 

Ok, then I declared my result with an inner join query, and I forced the join method with a hint to show that it is possible. But look at the result. 12 rows? Only DEPTNO 10 where the SCOTT schema has employees in 10, 20 and 30? And only 3 employees here, repeated 4 times for each department name? That’s wrong result.

NEVER FORCE A CARTESIAN JOIN WITH USE_MERGE_CARTESIAN!

That’s a very old bug: Bug 17064391 Wrong result with USE_MERGE_CARTESIAN hint finally fixed in 12c (12.2 and backported in 12.1 PSU)

Then how is it fixed?

18c

With the fix, the hint is just ignored and a SORT MERGE JOIN is used here:

SQL> alter session set current_schema=SCOTT statistics_level=all;
Session altered.
 
SQL> select /*+ leading(DEPT) USE_MERGE_CARTESIAN(EMP) FULL(DEPT) */ * from DEPT join EMP using(deptno);
 
DEPTNO DNAME LOC EMPNO ENAME JOB MGR HIREDATE SAL COMM
---------- -------------- ------------- ---------- ---------- --------- ---------- --------- ---------- ----------
10 ACCOUNTING NEW YORK 7782 CLARK MANAGER 7839 09-JUN-81 2450
10 ACCOUNTING NEW YORK 7839 KING PRESIDENT 17-NOV-81 5000
10 ACCOUNTING NEW YORK 7934 MILLER CLERK 7782 23-JAN-82 1300
20 RESEARCH DALLAS 7566 JONES MANAGER 7839 02-APR-81 2975
20 RESEARCH DALLAS 7902 FORD ANALYST 7566 03-DEC-81 3000
20 RESEARCH DALLAS 7876 ADAMS CLERK 7788 23-MAY-87 1100
20 RESEARCH DALLAS 7369 SMITH CLERK 7902 17-DEC-80 800
20 RESEARCH DALLAS 7788 SCOTT ANALYST 7566 19-APR-87 3000
30 SALES CHICAGO 7521 WARD SALESMAN 7698 22-FEB-81 1250 500
30 SALES CHICAGO 7844 TURNER SALESMAN 7698 08-SEP-81 1500 0
30 SALES CHICAGO 7499 ALLEN SALESMAN 7698 20-FEB-81 1600 300
30 SALES CHICAGO 7900 JAMES CLERK 7698 03-DEC-81 950
30 SALES CHICAGO 7698 BLAKE MANAGER 7839 01-MAY-81 2850
30 SALES CHICAGO 7654 MARTIN SALESMAN 7698 28-SEP-81 1250 1400
 
14 rows selected.
 
SQL> select * from table(dbms_xplan.display_cursor(format=>'allstats last'));
 
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------
SQL_ID 1xpfxq6pc30vq, child number 0
-------------------------------------
select /*+ leading(DEPT) USE_MERGE_CARTESIAN(EMP) FULL(DEPT) */ * from
DEPT join EMP using(deptno)
 
Plan hash value: 1407029907
 
--------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | Reads | OMem | 1Mem | Used-Mem |
--------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 14 |00:00:00.01 | 12 | 12 | | | |
| 1 | MERGE JOIN | | 1 | 14 | 14 |00:00:00.01 | 12 | 12 | | | |
| 2 | SORT JOIN | | 1 | 4 | 4 |00:00:00.01 | 6 | 6 | 2048 | 2048 | 2048 (0)|
| 3 | TABLE ACCESS FULL| DEPT | 1 | 4 | 4 |00:00:00.01 | 6 | 6 | | | |
|* 4 | SORT JOIN | | 4 | 14 | 14 |00:00:00.01 | 6 | 6 | 2048 | 2048 | 2048 (0)|
| 5 | TABLE ACCESS FULL| EMP | 1 | 14 | 14 |00:00:00.01 | 6 | 6 | | | |
--------------------------------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
4 - access("DEPT"."DEPTNO"="EMP"."DEPTNO")
filter("DEPT"."DEPTNO"="EMP"."DEPTNO")

So here the result is good, thanks to the fix, and we clearly see how it is fixed: the USE_MERGE_CARTESIAN hint has been ignored.

And the funny thing is that when you look at the 18c documentation, the Merge Join Cartesian is not a join method anymore but a join type:
CaptureJoinMethods183

Exactly the same paragraph, but now in join types (the ‘what’) rather than in join methods (the ‘when’).

What or How?

Actually, in my opinion, it is both. When you explicitly want a cartesian join, that’s a join type described by the CROSS JOIN in the ANSI join syntax, or the lack of related predicates in the old syntax. This is ‘what’. But you may also encounter a MERGE JOIN CARTESIAN for a non-cartesian join just because the optimizer decides it is more efficient. When you have very few rows on both sides, it may be faster to start with a cartesian product on small rowsources. This can be part of star transformation where fact rows are joined back to the cartesian product of filtered dimensions in order to project the dimension attributes. This is ‘how’ it will be executed. We also see it when the optimizer underestimates the cardinalities and is followed by a long nested loop.

When?

So, let’s look at the documentation “When the Optimizer Considers Cartesian Joins”:
CaptureJoinMethods183-when

  • No join condition exists: that’s when cartesian product is what we want
  • The ORDERED hint specifies a table before its join table is specified: that’s when it is the only join method possible with the specified join order
  • A Cartesian join is an efficient method: then it is a method there, even if documented in join types.

In conclusion, cartesian join is a join type. It can also be used as a join method when the optimizer decides to. But you cannot decide it yourself by hinting since 12c, and trying to do so in previous version is a very bad idea and can returns wrong results.

So, for this one I’ll explicitely run a CROSS JOIN:
CaptureUSEMERGECARTESIAN

The query is on top. The SQL monitor in the middle, showing that we are currently active on reading rows from EMP. The bottom shows the ‘qer’ functions backtrace: the fetch call is propagated from opifch2 for the SELECT STATEMENT, through the MERGE JOIN CARTESIAN (querjo), the BUFFER SORT (qerso), to the TABLE ACCESS (qertb).

So basically, the goal of this full-demo presentation is to show how to read the execution plan by understanding how it is executed. This qertbFetch on the inner table EMP is executed only on the first row coming from the outer table DEPT. As the rows are returned to a buffer, the further iterations will fetch only from this buffer and will not go further than qersoFetchSimple. The qersoProcessULS (‘process underlying row source’ – see Frits Hoogland annotations) is run only once. This is the big difference with Nested Loop where the inner loop on the underlying rowsource is run for each outer loop iteration: those two loops are nested – thus the name. But the function for the join part is the same for Nested Loop, Sort Merge Join and Merge Join Cartesian: qerjo. Only the underlying operations differenciate the join methods.

Last comment, we don’t see any function which really sort the rows in this buffer (as we will see for the Sort Merge Join method) because there is no sorting despites the name of the BUFFER SORT operation. More info on Jonathan Lewis blog.

 

Leave a Reply

Oracle Team
Oracle Team