Oracle and SQL Server – Conditions In WHERE Clause vs. JOIN Clause – Why And When It Matters? Must Read!

This post is about the various types of JOIN’s and how the conditions affect the results when placed in the JOIN clause versus the WHERE clause.

OUTER JOINS

Database programmers use OUTER JOINS in their job day in and day out. There are certain “gotchas” that you need to watch out for. One not only has to have a good understanding of when and how to use them but test extensively for various data conditions to make sure they produce the right results.

If you are not familiar with OUTER joins, I recommend this article from CodeProject by C. L Moffatt that does a great job of explaining and doing a visual representation of the same. Please read the complete article from its source.

ANSI vs Non-ANSI:

The article below applies to both SQL Server and Oracle.

In SQL Server, the non-ANSI syntax for outer joins is either the “*=” or the “=*” operator and in Oracle the equivalent operators are “ (+)=” and “=(+)”.  I am not going into the nuances of these operators as both Microsoft and Oracle are deprecating the non-ANSI syntax for OUTER joins.

In Oracle, besides the fact that non-ANSI standard for the outer join is deprecated, the exact features supported is also different. The ANSI syntax supports more features than the older non-ANSI syntax.

You might ask, if the join is an INNER JOIN, why does it matter if I use ANSI vs. the older method of specifying joins in the WHERE clause.  The reasons are:

  • ANSI syntax is the norm
  • If you decide to change one or more of the INNER JOINs into an OUTER JOIN, using the older syntax would mean extensive changes and testing for the SQL than otherwise.
  • If ANSI syntax is used and if you decide to port your tables and SQL to another database, it is almost guaranteed to work without any changes

Sample data for testing

Employees:

EMPID EMPNAME ADDRESS ANNUALSALARY
1 Mike 61211
2 Ed 123 Main Street
3 Will 45 E.Lock Street 80394
4 Brad 1 32nd Street 45321
5 Alex 601 Broadway 76890
6 Sally
7 Tom 100234

Employee bonuses by year:

EMPID BONUSYEAR BONUSAMOUNT
1 2010 1000
1 2011 1100
1 2012 1200
3 2010 2000
3 2011 2100
5 2012 1210

In the SQL that follows, both of the above pieces are captured in the WITH clause

If testing with Oracle, use:

WITH Emp
AS
(
    SELECT 1 AS EmpId, 'Mike' AS EmpName, NULL AS Address, 61211 AS AnnualSalary
    FROM DUAL
    UNION ALL
    SELECT 2 AS EmpId, 'Ed' AS EmpName, '123 Main Street' AS Address, NULL AS AnnualSalary
    FROM DUAL
    UNION ALL
    SELECT 3 AS EmpId, 'Will' AS EmpName, '45 E.Lock Street' AS Address, 80394 AS AnnualSalary
    FROM DUAL
    UNION ALL
    SELECT 4 AS EmpId, 'Brad' AS EmpName, '1 32nd Street' AS Address, 45321 AS AnnualSalary
    FROM DUAL
    UNION ALL
    SELECT 5 AS EmpId, 'Alex' AS EmpName, '601 Broadway' AS Address, 76890 AS AnnualSalary
    FROM DUAL
    UNION ALL
    SELECT 6 AS EmpId, 'Sally' AS EmpName, NULL AS Address, NULL AS AnnualSalary
    FROM DUAL
    UNION ALL
    SELECT 7 AS EmpId, 'Tom' AS EmpName, '' AS Address, 100234 AS AnnualSalary
    FROM DUAL
),
Bonus AS
(
    --Mike's bonuses
    SELECT 1 AS EmpId, 2010 AS BonusYear, 1000 AS BonusAmount
    FROM DUAL
    UNION ALL
    SELECT 1 AS EmpId, 2011 AS BonusYear, 1100 AS BonusAmount
    FROM DUAL
    UNION ALL
    SELECT 1 AS EmpId, 2012 AS BonusYear, 1200 AS BonusAmount
    FROM DUAL
    UNION ALL
    --Will's bonuses
    SELECT 3 AS EmpId, 2010 AS BonusYear, 2000 AS BonusAmount
    FROM DUAL
    UNION ALL
    SELECT 3 AS EmpId, 2011 AS BonusYear, 2100 AS BonusAmount
    FROM DUAL
    UNION ALL
    --Alex's bonuses
    SELECT 5 AS EmpId, 2012 AS BonusYear, 1210 AS BonusAmount
    FROM DUAL
)
------------------------------------------------------------------
-- The SQL above this line will be reused
-- The SQL below this line will be switched out for various tests
------------------------------------------------------------------
SELECT Emp.EmpName,
        Bonus.BonusYear,
        Bonus.BonusAmount
FROM Emp
    INNER JOIN Bonus
        ON emp.EmpId = Bonus.EmpId
ORDER BY
    Emp.EmpName,
    Bonus.BonusYear,
    Bonus.BonusAmount

If testing with SQL Server, use:

WITH Emp
AS
(
    SELECT 1 AS EmpId, 'Mike' AS EmpName, NULL AS Address, 61211 AS AnnualSalary
    UNION ALL
    SELECT 2 AS EmpId, 'Ed' AS EmpName, '123 Main Street' AS Address, NULL AS AnnualSalary
    UNION ALL
    SELECT 3 AS EmpId, 'Will' AS EmpName, '45 E.Lock Street' AS Address, 80394 AS AnnualSalary
    UNION ALL
    SELECT 4 AS EmpId, 'Brad' AS EmpName, '1 32nd Street' AS Address, 45321 AS AnnualSalary
    UNION ALL
    SELECT 5 AS EmpId, 'Alex' AS EmpName, '601 Broadway' AS Address, 76890 AS AnnualSalary
    UNION ALL
    SELECT 6 AS EmpId, 'Sally' AS EmpName, NULL AS Address, NULL AS AnnualSalary
    UNION ALL
    SELECT 7 AS EmpId, 'Tom' AS EmpName, '' AS Address, 100234 AS AnnualSalary
),
Bonus AS
(
    --Mike's bonuses
    SELECT 1 AS EmpId, 2010 AS BonusYear, 1000 AS BonusAmount
    UNION ALL
    SELECT 1 AS EmpId, 2011 AS BonusYear, 1100 AS BonusAmount
    UNION ALL
    SELECT 1 AS EmpId, 2012 AS BonusYear, 1200 AS BonusAmount
    UNION ALL
    --Will's bonuses
    SELECT 3 AS EmpId, 2010 AS BonusYear, 2000 AS BonusAmount
    UNION ALL
    SELECT 3 AS EmpId, 2011 AS BonusYear, 2100 AS BonusAmount
    UNION ALL
    --Alex's bonuses
    SELECT 5 AS EmpId, 2012 AS BonusYear, 1210 AS BonusAmount
)
------------------------------------------------------------------
-- The SQL above this line will be reused
-- The SQL below this line will be switched out for various tests
------------------------------------------------------------------
SELECT Emp.EmpName,
        Bonus.BonusYear,
        Bonus.BonusAmount
FROM Emp
    INNER JOIN Bonus
        ON emp.EmpId = Bonus.EmpId
ORDER BY
    Emp.EmpName,
    Bonus.BonusYear,
    Bonus.BonusAmount

Since the above SQL does an INNER JOIN, it gets all the employees that have a bonus and orders the results by name, year and amount.

EMPNAME BONUSYEAR BONUSAMOUNT
Mike 2010 1000
Mike 2011 1100
Mike 2012 1200
Will 2010 2000
Will 2011 2100
Alex 2012 1210

Only three employees have bonuses and so the inner join displays only those. This is a simple straight-forward case.

To explain what I set out to explain, let us consider a few scenarios:

Scenario #1:

Select all employees and their bonuses by year irrespective of whether they had a bonus.

Solution:

We know that we need to do an outer join to the Bonus records. So, we do this

{BIG WITH CLAUSE GOES HERE}
SELECT Emp.EmpName,
        Bonus.BonusYear,
        Bonus.BonusAmount
FROM Emp
    LEFT OUTER JOIN Bonus
        ON emp.EmpId = Bonus.EmpId
ORDER BY
    Emp.EmpName,
    Bonus.BonusYear,
    Bonus.BonusAmount
EMPNAME BONUSYEAR BONUSAMOUNT
Alex 2012 1210
Brad
Ed
Mike 2010 1000
Mike 2011 1100
Mike 2012 1200
Sally
Tom
Will 2010 2000
Will 2011 2100

Scenario #2:

Select the employees and their bonuses by year of those who had a bonus and the bonus was greater than 1,500.

Solution:

We know that we need to do an inner join but should we put the condition for BonusAmount in the JOIN or in the WHERE clause?

Use “BonusAmount > 1500” in the JOIN? Use “BonusAmount > 1500” in the WHERE CLAUSE?
{BIG WITH CLAUSE GOES HERE}
SELECT Emp.EmpName,
        Bonus.BonusYear,
        Bonus.BonusAmount
FROM Emp
    INNER JOIN Bonus
        ON emp.EmpId = Bonus.EmpId
         AND Bonus.BonusAmount > 1500
ORDER BY
    Emp.EmpName,
    Bonus.BonusYear,
    Bonus.BonusAmount
{BIG WITH CLAUSE GOES HERE}
SELECT Emp.EmpName,
        Bonus.BonusYear,
        Bonus.BonusAmount
FROM Emp
    INNER JOIN Bonus
        ON emp.EmpId = Bonus.EmpId
WHERE
     Bonus.BonusAmount > 1500
ORDER BY
    Emp.EmpName,
    Bonus.BonusYear,
    Bonus.BonusAmount

In this specific scenario (with INNER JOIN), where we choose to put the BonusAmount restriction does not matter. They will both produce the same results. The optimizer might choose to rewrite queries like these which may result in an execution plan difference between the two but that does not have any bearing on the results produced. Where you put the condition is a matter or preference and readability in this case.

EMPNAME BONUSYEAR BONUSAMOUNT
Will 2010 2000
Will 2011 2100

Scenario #3:

Select all employees and their bonuses by year irrespective of whether they had a bonus. Display bonus details only if the bonus was greater than 1,500 else NULLs.

Solution:

Again, we know that we need to do an outer join but should we put the condition for BonusAmount in the LEFT OUTER JOIN or in the WHERE clause?

The choice in this case does make a big difference in the results and only one of them will produce the intended result

Use “BonusAmount > 1500” in the JOIN? Use “BonusAmount > 1500” in the WHERE CLAUSE?
{BIG WITH CLAUSE GOES HERE}
SELECT Emp.EmpName,
        Bonus.BonusYear,
        Bonus.BonusAmount
FROM Emp
    LEFT OUTER JOIN Bonus
        ON emp.EmpId = Bonus.EmpId
         AND Bonus.BonusAmount > 1500
ORDER BY
    Emp.EmpName,
    Bonus.BonusYear,
    Bonus.BonusAmount

{BIG WITH CLAUSE GOES HERE}
SELECT Emp.EmpName,
        Bonus.BonusYear,
        Bonus.BonusAmount
FROM Emp
    LEFT OUTER JOIN Bonus
        ON emp.EmpId = Bonus.EmpId
WHERE
     Bonus.BonusAmount > 1500
ORDER BY
    Emp.EmpName,
    Bonus.BonusYear,
    Bonus.BonusAmount

These are the results produced by the above two respectively:

“BonusAmount > 1500” in the JOIN “BonusAmount > 1500” in the WHERE CLAUSE
EMPNAME BONUSYEAR BONUSAMOUNT
Alex
Brad
Ed
Mike
Sally
Tom
Will 2010 2000
Will 2011 2100
EMPNAME BONUSYEAR BONUSAMOUNT
Will 2010 2000
Will 2011 2100

Scenario #4:

Select employees and their bonuses by year of those who had a bonus and the bonus was greater than 1,500.

Solution:

Isn’t this the same as scenario #2?  We could use the WHERE clause restriction based option from scenario #3 above or simply use the INNER JOIN option of scenario #2 instead. This scenario was repeated to illustrate this.

Scenario #5:

Select all employees whose annual salary is greater than 75,000. For the qualifying employees, display bonus details only if the bonus was greater than 0 (zero) else NULL.

Solution:

We have learnt a lesson. So we know that we need to put the bonus amount restriction in the LEFT OUTER JOIN clause instead of the WHERE clause. However, the “AnnualSalary” is an attribute of the Employee SQL and not the Bonus SQL. Where should we put the annual salary restriction?

Use “AnnualSalary > 75000” in the JOIN? Use “AnnualSalary > 75000” in the WHERE CLAUSE?
{BIG WITH CLAUSE GOES HERE}
SELECT Emp.EmpName,
        Emp.AnnualSalary,
        Bonus.BonusYear,
        Bonus.BonusAmount
FROM Emp
    LEFT OUTER JOIN Bonus
        ON emp.EmpId = Bonus.EmpId
         AND Bonus.BonusAmount > 0
         AND Emp.AnnualSalary > 75000
ORDER BY
    Emp.EmpName,
    Bonus.BonusYear,
    Bonus.BonusAmount
{BIG WITH CLAUSE GOES HERE}
SELECT Emp.EmpName,
        Emp.AnnualSalary,
        Bonus.BonusYear,
        Bonus.BonusAmount
FROM Emp
    LEFT OUTER JOIN Bonus
        ON emp.EmpId = Bonus.EmpId
         AND Bonus.BonusAmount > 0
WHERE
    Emp.AnnualSalary > 75000
ORDER BY
    Emp.EmpName,
    Bonus.BonusYear,
    Bonus.BonusAmount

These are the results produced by the above two respectively and the results are again quite different.

“AnnualSalary > 75000” in the JOIN “AnnualSalary > 75000” in the WHERE CLAUSE
EMPNAME ANNUALSALARY BONUSYEAR BONUSAMOUNT
Alex 76890 2012 1210
Brad 45321
Ed
Mike 61211
Sally
Tom 100234
Will 80394 2010 2000
Will 80394 2011 2100
EMPNAME ANNUALSALARY BONUSYEAR BONUSAMOUNT
Alex 76890 2012 1210
Tom 100234
Will 80394 2010 2000
Will 80394 2011 2100

Some observations in this scenario:

  • The JOIN based restriction on salary gave us all employees but the “Bonus” columns became NULL on rows with employees making less than 75,000 per year even if they had a bonus.
  • The WHERE based restriction on salary gave us just the employees with salary greater than 75,000 whether or not they had a bonus. Notice Tom had no bonus but has a salary greater than 75,000 and was included

Lessons:

These were all very simple cases. The same cases above become a bit more complicated if the “by year” criteria was removed to list a DISTINCT set of employees for qualifying criteria. Although it is impulsive to just add a DISTINCT after removing the “BonusYear” column, in most cases, it is not the optimal approach for performance.  Correlated sub-queries should be used (EXISTS/NOT EXISTS). In some instances, IN and NOT IN may work well. However, in all cases, one needs to test thoroughly to make sure that the SQL produces the intended results.

The lessons:

  • When every join is INNER, where you place the restrictions (in the JOIN vs WHERE) does not matter
  • The WHERE clause limits what is produced by the JOINs.
    • If the join was OUTER, the WHERE clause has a much larger set on which it has to apply the WHERE
    • If the join was INNER, the WHERE clause has a much smaller list to further limit by what is in WHERE
  • It makes logical sense to filter as much as possible in the JOIN for your scenario especially if the join is an OUTER JOIN although all modern optimizers are smart enough to do that for you internally irrespective of where you placed the filter
  • If the WHERE clause has limitation based on a column of the OUTER JOINed table (the right side table in a LEFT OUTER JOIN and left side table in a RIGHT OUTER JOIN), the join is effectively turned into an INNER JOIN
  • ….read bullets in this section again 😉

Quick reference chart on JOIN’s

For those that are not very familiar with the “*OUTER*” syntax, I have included an article from CodeProject by C. L Moffatt that does a great job of explaining it and doing a visual representation of the same. Please read the complete article from its source

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s