Strange CBO behavior leads the MCJ and high cardinality

If the list of tables in the FROM part of the SELECT statement contains
tables that are not needed (none of their columns are selected nor used
in the WHERE part of the statement), the CBO starts increasing the
cardinality and introduces merge cartesian joins (MCJ) in the execution
plan.

The case was reproduced using the unmodified HR schema of a default
Oracle installation.

Please access this linked sql file in order to see the details.

-------------
-- Query 1
-------------
explain plan for 
select distinct
 emp.email
from
 hr.employees emp
 , hr.departments dept
-- , hr.countries
-- , hr.regions
-- , hr.jobs
where
 emp.department_id = dept.department_id;
Plan hash value: 3042654289
 
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 106 | 1166 | 4 (25)| 00:00:01 |
| 1 | HASH UNIQUE | | 106 | 1166 | 4 (25)| 00:00:01 |
|* 2 | TABLE ACCESS FULL| EMPLOYEES | 106 | 1166 | 3 (0)| 00:00:01 |
--------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
 2 - filter("EMP"."DEPARTMENT_ID" IS NOT NULL)
-------------
-- Query 2
-------------
explain plan for 
select distinct
 emp.email
from
 hr.employees emp
 , hr.departments dept
 , hr.countries
-- , hr.regions
-- , hr.jobs
where
 emp.department_id = dept.department_id;
Plan hash value: 1471726929
 
-------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 106 | 1166 | 29 (4)| 00:00:01 |
| 1 | HASH UNIQUE | | 106 | 1166 | 29 (4)| 00:00:01 |
| 2 | MERGE JOIN CARTESIAN | | 2650 | 29150 | 28 (0)| 00:00:01 |
|* 3 | TABLE ACCESS FULL | EMPLOYEES | 106 | 1166 | 3 (0)| 00:00:01 |
| 4 | BUFFER SORT | | 25 | | 25 (0)| 00:00:01 |
| 5 | INDEX FAST FULL SCAN| COUNTRY_C_ID_PK | 25 | | 0 (0)| 00:00:01 |
-------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
 3 - filter("EMP"."DEPARTMENT_ID" IS NOT NULL)
-------------
-- Query 3
-------------
explain plan for 
select distinct
 emp.email
from
 hr.employees emp
 , hr.departments dept
 , hr.countries
 , hr.regions
-- , hr.jobs
where
 emp.department_id = dept.department_id;
Plan hash value: 3655716554
 
-------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 106 | 1166 | 103 (2)| 00:00:02 |
| 1 | HASH UNIQUE | | 106 | 1166 | 103 (2)| 00:00:02 |
| 2 | MERGE JOIN CARTESIAN | | 10600 | 113K| 101 (0)| 00:00:02 |
| 3 | MERGE JOIN CARTESIAN | | 424 | 4664 | 7 (0)| 00:00:01 |
| 4 | INDEX FULL SCAN | REG_ID_PK | 4 | | 1 (0)| 00:00:01 |
| 5 | BUFFER SORT | | 106 | 1166 | 6 (0)| 00:00:01 |
|* 6 | TABLE ACCESS FULL | EMPLOYEES | 106 | 1166 | 2 (0)| 00:00:01 |
| 7 | BUFFER SORT | | 25 | | 100 (0)| 00:00:02 |
| 8 | INDEX FAST FULL SCAN| COUNTRY_C_ID_PK | 25 | | 0 (0)| 00:00:01 |
-------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
 6 - filter("EMP"."DEPARTMENT_ID" is not null)
 
-------------
-- Query 4
-------------
explain plan for 
select distinct
 emp.email
from
 hr.employees emp
 , hr.departments dept
 , hr.countries
 , hr.regions
 , hr.jobs
where
 emp.department_id = dept.department_id;
Plan hash value: 2772254924
 
---------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 106 | 1166 | 1868 (1)| 00:00:23 |
| 1 | HASH UNIQUE | | 106 | 1166 | 1868 (1)| 00:00:23 |
| 2 | MERGE JOIN CARTESIAN | | 201K| 2163K| 1858 (1)| 00:00:23 |
| 3 | MERGE JOIN CARTESIAN | | 8056 | 88616 | 89 (0)| 00:00:02 |
| 4 | MERGE JOIN CARTESIAN | | 76 | | 4 (0)| 00:00:01 |
| 5 | INDEX FULL SCAN | REG_ID_PK | 4 | | 1 (0)| 00:00:01 |
| 6 | BUFFER SORT | | 19 | | 3 (0)| 00:00:01 |
| 7 | INDEX FAST FULL SCAN| JOB_ID_PK | 19 | | 1 (0)| 00:00:01 |
| 8 | BUFFER SORT | | 106 | 1166 | 89 (2)| 00:00:02 |
|* 9 | TABLE ACCESS FULL | EMPLOYEES | 106 | 1166 | 1 (0)| 00:00:01 |
| 10 | BUFFER SORT | | 25 | | 1857 (1)| 00:00:23 |
| 11 | INDEX FAST FULL SCAN | COUNTRY_C_ID_PK | 25 | | 0 (0)| 00:00:01 |
---------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
 9 - filter("EMP"."DEPARTMENT_ID" is not null)
select
 *
from
 table(dbms_xplan.display);

One thought on “Strange CBO behavior leads the MCJ and high cardinality”

  1. I saw your comments attached to the Cartesian Merge Join article on Jonathan Lewis’ blog. The opening statement for that blog article is: “Have you ever had an execution plan which gave you a Cartesian join that you knew just couldn’t be happening?” Your test case shows something different from what Jonathan demonstrated in his blog article. What you are seeing is expected behavior.

    I looked at the SQL script that you linked to above. The execution plan for your first SQL statement shows an example of the table elimination optimization that was introduced in Oracle 10.2, see this article written by the group that develops the Oracle optimizer:
    https://blogs.oracle.com/optimizer/entry/why_are_some_of_the_tables_in_my_query_missing_from_the_plan
    That table elimination optimization allowed the optimizer to remove the DEPARTMENTS table from the execution plan. That optimization was permitted because three key elements were present:
    1) There is a declared foreign key between EMPLOYEES.DEPARTMENT_ID and DEPARTMENTS.DEPARTMENT_ID
    2) The first query explicitly joins the two tables on those columns having the defined foreign key relationship
    3) The SELECT clause for the first SQL statement does not include any columns from the DEPARTMENTS table

    Seeing this result for the first SQL statement, you are apparently then trying to generalize the table elimination optimization simply as: “As long as I do not include any columns from a table in a SQL statement’s SELECT clause, there is no change to the outcome of the SQL statement (or the SQL statement’s execution plan) when I add an additional table to the FROM clause without explicitly declaring the join to the additional table in the WHERE clause.” Oracle Database does not work like that. If you do not define a relationship between the tables, then Oracle Database *must* generate m * n rows in the resultset, where m is the number of rows retrieved from the first table and n is the number of rows retrieved from the second table (or an earlier generated resultset).

    It is necessary to always explicitly tell the Oracle query optimizer how to join any tables added to the SQL statement (unless the cross-product between the tables is intended), to understand that declared foreign keys are not sufficient to prevent Cartesian joins, and that the inclusion of DISTINCT in the SQL statement is often a sign that there is a join problem in the SQL statement.

    Some database environments, such as Microsoft Access, permit relationships between tables to be defined ahead of time – Oracle foreign key relationships are not the equivalent of these declared relationships. In Microsoft Access, when a SQL statement is built Access will automatically add the defined relationships to the ANSI style join syntax in the SQL statement to avoid Cartesian joins. Oracle Database does not do that with defined foreign keys, in part because there may be times when two tables need to be joined using columns other than the defined foreign key columns.

    I reproduced your sample SQL statement in Microsoft Access and defined the relationships between the tables (note that I had to add one additional table to the query). Once the relationships were defined, I simply added the tables of interest and the one column that was of interest (based on your examples). Microsoft Access behind the scene generated an ANSI style join for the tables, converting my defined relationships into the ON clauses in the ANSI style join. So, technically Microsoft Access correctly joined the tables without adding the table names (or table aliases) to the WHERE clause of the SQL statement. Microsoft Access translates the generated ANSI SQL statement before submitting it to the Oracle database – if it was submitted as a single SQL block, the SQL statement and execution plan might look something like this:
    [code]
    SQL_ID 9xgaaxshfm01t, child number 0
    -------------------------------------
    SELECT HR_EMPLOYEES.EMAIL FROM ((((HR.EMPLOYEES HR_EMPLOYEES INNER
    JOIN HR.DEPARTMENTS HR_DEPARTMENTS ON HR_EMPLOYEES.DEPARTMENT_ID =
    HR_DEPARTMENTS.DEPARTMENT_ID) INNER JOIN HR.JOBS HR_JOBS ON
    HR_EMPLOYEES.JOB_ID = HR_JOBS.JOB_ID) INNER JOIN HR.LOCATIONS
    HR_LOCATIONS ON HR_DEPARTMENTS.LOCATION_ID = HR_LOCATIONS.LOCATION_ID)
    INNER JOIN HR.COUNTRIES HR_COUNTRIES ON HR_LOCATIONS.COUNTRY_ID =
    HR_COUNTRIES.COUNTRY_ID) INNER JOIN HR.REGIONS HR_REGIONS ON
    HR_COUNTRIES.REGION_ID = HR_REGIONS.REGION_ID

    Plan hash value: 3039792218

    -----------------------------------------------------------------------------------------------------------------------------------
    | Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem |
    -----------------------------------------------------------------------------------------------------------------------------------
    | 0 | SELECT STATEMENT | | 1 | | 106 |00:00:00.01 | 28 | | | |
    |* 1 | HASH JOIN | | 1 | 106 | 106 |00:00:00.01 | 28 | 1517K| 1517K| 1291K (0)|
    |* 2 | HASH JOIN | | 1 | 27 | 27 |00:00:00.01 | 20 | 1517K| 1517K| 1121K (0)|
    | 3 | NESTED LOOPS | | 1 | 23 | 23 |00:00:00.01 | 12 | | | |
    | 4 | VIEW | index$_join$_006 | 1 | 23 | 23 |00:00:00.01 | 8 | | | |
    |* 5 | HASH JOIN | | 1 | | 23 |00:00:00.01 | 8 | 1096K| 1096K| 1438K (0)|
    | 6 | INDEX FAST FULL SCAN| LOC_COUNTRY_IX | 1 | 23 | 23 |00:00:00.01 | 4 | | | |
    | 7 | INDEX FAST FULL SCAN| LOC_ID_PK | 1 | 23 | 23 |00:00:00.01 | 4 | | | |
    |* 8 | INDEX UNIQUE SCAN | COUNTRY_C_ID_PK | 23 | 1 | 23 |00:00:00.01 | 4 | | | |
    | 9 | VIEW | index$_join$_002 | 1 | 27 | 27 |00:00:00.01 | 8 | | | |
    |* 10 | HASH JOIN | | 1 | | 27 |00:00:00.01 | 8 | 1096K| 1096K| 1416K (0)|
    | 11 | INDEX FAST FULL SCAN | DEPT_ID_PK | 1 | 27 | 27 |00:00:00.01 | 4 | | | |
    | 12 | INDEX FAST FULL SCAN | DEPT_LOCATION_IX | 1 | 27 | 27 |00:00:00.01 | 4 | | | |
    | 13 | TABLE ACCESS FULL | EMPLOYEES | 1 | 107 | 107 |00:00:00.01 | 8 | | | |
    -----------------------------------------------------------------------------------------------------------------------------------

    Predicate Information (identified by operation id):
    ---------------------------------------------------
    1 - access("HR_EMPLOYEES"."DEPARTMENT_ID"="HR_DEPARTMENTS"."DEPARTMENT_ID")
    2 - access("HR_DEPARTMENTS"."LOCATION_ID"="HR_LOCATIONS"."LOCATION_ID")
    5 - access(ROWID=ROWID)
    8 - access("HR_LOCATIONS"."COUNTRY_ID"="HR_COUNTRIES"."COUNTRY_ID")
    filter("HR_COUNTRIES"."REGION_ID" IS NOT NULL)
    10 - access(ROWID=ROWID)
    [/code]

    Note in the above that there is no Cartesian join as you found in all but your first SQL statement example. In your case, seeing a Cartesian join will almost certainly be an indication that you forgot to add a join predicate or two in the WHERE clause.

    You may be able to use the defined foreign key relationships to help determine the missing join predicates. You should be able to determine the defined foreign key relationships by examining the table definitions:
    [code]
    SET PAGESIZE 0
    SET LONG 90000
    SPOOL 'GETMETA.SQL'
    SELECT
    DBMS_METADATA.GET_DDL('TABLE','EMPLOYEES','HR')
    FROM
    DUAL;
    SELECT
    DBMS_METADATA.GET_DDL('TABLE','DEPARTMENTS','HR')
    FROM
    DUAL;
    SELECT
    DBMS_METADATA.GET_DDL('TABLE','LOCATIONS','HR')
    FROM
    DUAL;
    SELECT
    DBMS_METADATA.GET_DDL('TABLE','COUNTRIES','HR')
    FROM
    DUAL;
    SELECT
    DBMS_METADATA.GET_DDL('TABLE','REGIONS','HR')
    FROM
    DUAL;
    SELECT
    DBMS_METADATA.GET_DDL('TABLE','JOBS','HR')
    FROM
    DUAL;
    FROM
    DUAL;
    SPOOL OFF
    [/code]

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>