Oracle (or) SQL Server: Find Row Level Data Differences Using MINUS/EXCEPT – 100 Level

Changes needed for SQL Server

The simple SQL illustrations mostly show Oracle code but the final SQL is also shown for SQL Server (simply remove the “FROM DUAL” references and replace “MINUS” with “EXCEPT”).

What this post is about?

When we need to compare two sets of data to find out which rows are different between the two quickly, this trick that you are about to see should help.

You might be trying to find the row level differences between

  • Two tables similar in structure
  • Two sets of data in the same table
  • Compare a few similar columns between two tables
  • Compare a few similar columns in the same table for two sets of rows

Exactly what are we trying to do?

Let DataSet_1 (BEFORE) be

SSN EMPNAME ADDRESS ANNUALSALARY
221021 Mike 61211
111221021 Ed 123 Main Street
512221021 Will 45 E.Lock Street 80394
872221021 Brad 1 32nd Street 45321
92221021 Alex 601 Broadway 76890
372221021 Sally
232911021 Tom 100234

Let DataSet_2 (AFTER) be

SSN EMPNAME ADDRESS ANNUALSALARY
221021 Mike (Changed) 61211
872221021 Brad 1 32nd Street 45321
92221021 Alex 601 Broadway 76890
372221021 Sally
232911021 Tom 100234
99921021 Brandy (Added) 45 E.Lock Street 80394

We need to identify

  • DELETE
    Two (deleted) rows in Dataset_1 that is not in DataSet_2 (Ed and Will)
  • UPDATE
    A changed row in DataSet_2 that has a different value than DataSet_1
  • INSERT
    A new row in DataSet_2

DataSet_1 (or BEFORE picture)

    SELECT '000221021' AS SSN, 'Mike' AS EmpName, NULL AS Address, 61211 AS AnnualSalary
    FROM DUAL
    UNION ALL
    SELECT '111221021' AS SSN, 'Ed' AS EmpName, '123 Main Street' AS Address, NULL AS AnnualSalary
    FROM DUAL
    UNION ALL
    SELECT '512221021' AS SSN, 'Will' AS EmpName, '45 E.Lock Street' AS Address, 80394 AS AnnualSalary
    FROM DUAL
    UNION ALL
    SELECT '872221021' AS SSN, 'Brad' AS EmpName, '1 32nd Street' AS Address, 45321 AS AnnualSalary
    FROM DUAL
    UNION ALL
    SELECT '092221021' AS SSN, 'Alex' AS EmpName, '601 Broadway' AS Address, 76890 AS AnnualSalary
    FROM DUAL
    UNION ALL
    SELECT '372221021' AS SSN, 'Sally' AS EmpName, NULL AS Address, NULL AS AnnualSalary
    FROM DUAL
    UNION ALL
    SELECT '232911021' AS SSN, 'Tom' AS EmpName, '' AS Address, 100234 AS AnnualSalary
    FROM DUAL

DataSet_1 (or AFTER picture)

    SELECT '000221021' AS SSN, 'Mike (Changed)' AS EmpName, NULL AS Address, 61211 AS AnnualSalary
    FROM DUAL
    UNION ALL
    SELECT '872221021' AS SSN, 'Brad' AS EmpName, '1 32nd Street' AS Address, 45321 AS AnnualSalary
    FROM DUAL
    UNION ALL
    SELECT '092221021' AS SSN, 'Alex' AS EmpName, '601 Broadway' AS Address, 76890 AS AnnualSalary
    FROM DUAL
    UNION ALL
    SELECT '372221021' AS SSN, 'Sally' AS EmpName, NULL AS Address, NULL AS AnnualSalary
    FROM DUAL
    UNION ALL
    SELECT '232911021' AS SSN, 'Tom' AS EmpName, '' AS Address, 100234 AS AnnualSalary
    FROM DUAL
    UNION ALL
    SELECT '99921021' AS SSN, 'Brandy (Added)' AS EmpName, '45 E.Lock Street' AS Address, 80394 AS AnnualSalary
    FROM DUAL

Oracle Code: Differences between DataSet_1 and DataSet_2 (BEFORE and AFTER)

WITH Old_Data
AS
(
    SELECT '000221021' AS SSN, 'Mike' AS EmpName, NULL AS Address, 61211 AS AnnualSalary
    FROM DUAL
    UNION ALL
    SELECT '111221021' AS SSN, 'Ed' AS EmpName, '123 Main Street' AS Address, NULL AS AnnualSalary
    FROM DUAL
    UNION ALL
    SELECT '512221021' AS SSN, 'Will' AS EmpName, '45 E.Lock Street' AS Address, 80394 AS AnnualSalary
    FROM DUAL
    UNION ALL
    SELECT '872221021' AS SSN, 'Brad' AS EmpName, '1 32nd Street' AS Address, 45321 AS AnnualSalary
    FROM DUAL
    UNION ALL
    SELECT '092221021' AS SSN, 'Alex' AS EmpName, '601 Broadway' AS Address, 76890 AS AnnualSalary
    FROM DUAL
    UNION ALL
    SELECT '372221021' AS SSN, 'Sally' AS EmpName, NULL AS Address, NULL AS AnnualSalary
    FROM DUAL
    UNION ALL
    SELECT '232911021' AS SSN, 'Tom' AS EmpName, '' AS Address, 100234 AS AnnualSalary
    FROM DUAL
),
New_Data
AS
(
    SELECT '000221021' AS SSN, 'Mike (Changed)' AS EmpName, NULL AS Address, 61211 AS AnnualSalary
    FROM DUAL
    UNION ALL
    SELECT '872221021' AS SSN, 'Brad' AS EmpName, '1 32nd Street' AS Address, 45321 AS AnnualSalary
    FROM DUAL
    UNION ALL
    SELECT '092221021' AS SSN, 'Alex' AS EmpName, '601 Broadway' AS Address, 76890 AS AnnualSalary
    FROM DUAL
    UNION ALL
    SELECT '372221021' AS SSN, 'Sally' AS EmpName, NULL AS Address, NULL AS AnnualSalary
    FROM DUAL
    UNION ALL
    SELECT '232911021' AS SSN, 'Tom' AS EmpName, '' AS Address, 100234 AS AnnualSalary
    FROM DUAL
    UNION ALL
    SELECT '99921021' AS SSN, 'Brandy (Added)' AS EmpName, '45 E.Lock Street' AS Address, 80394 AS AnnualSalary
    FROM DUAL
)
SELECT 'Changed or Removed from BEFORE' AS Reason, a.*
FROM
(
    SELECT * FROM Old_Data
    MINUS
    SELECT * FROM New_Data
) a
UNION ALL
SELECT 'Changed or Added in AFTER' AS Reason, b.*
FROM
(
    SELECT * FROM New_Data
    MINUS
    SELECT * FROM Old_Data
) b

SQL Server Code: Differences between DataSet_1 and DataSet_2 (BEFORE and AFTER)

I simply removed the “FROM DUAL” references and replaced “MINUS” with “EXCEPT”

WITH Old_Data
AS
(
    SELECT '000221021' AS SSN, 'Mike' AS EmpName, NULL AS Address, 61211 AS AnnualSalary
    UNION ALL
    SELECT '111221021' AS SSN, 'Ed' AS EmpName, '123 Main Street' AS Address, NULL AS AnnualSalary
    UNION ALL
    SELECT '512221021' AS SSN, 'Will' AS EmpName, '45 E.Lock Street' AS Address, 80394 AS AnnualSalary
    UNION ALL
    SELECT '872221021' AS SSN, 'Brad' AS EmpName, '1 32nd Street' AS Address, 45321 AS AnnualSalary
    UNION ALL
    SELECT '092221021' AS SSN, 'Alex' AS EmpName, '601 Broadway' AS Address, 76890 AS AnnualSalary
    UNION ALL
    SELECT '372221021' AS SSN, 'Sally' AS EmpName, NULL AS Address, NULL AS AnnualSalary
    UNION ALL
    SELECT '232911021' AS SSN, 'Tom' AS EmpName, '' AS Address, 100234 AS AnnualSalary
),
New_Data
AS
(
    SELECT '000221021' AS SSN, 'Mike (Changed)' AS EmpName, NULL AS Address, 61211 AS AnnualSalary
    UNION ALL
    SELECT '872221021' AS SSN, 'Brad' AS EmpName, '1 32nd Street' AS Address, 45321 AS AnnualSalary
    UNION ALL
    SELECT '092221021' AS SSN, 'Alex' AS EmpName, '601 Broadway' AS Address, 76890 AS AnnualSalary
    UNION ALL
    SELECT '372221021' AS SSN, 'Sally' AS EmpName, NULL AS Address, NULL AS AnnualSalary
    UNION ALL
    SELECT '232911021' AS SSN, 'Tom' AS EmpName, '' AS Address, 100234 AS AnnualSalary
    UNION ALL
    SELECT '99921021' AS SSN, 'Brandy (Added)' AS EmpName, '45 E.Lock Street' AS Address, 80394 AS AnnualSalary
)
SELECT 'Changed or Removed from BEFORE' AS Reason, a.*
FROM
(
    SELECT * FROM Old_Data
    EXCEPT
    SELECT * FROM New_Data
) a
UNION ALL
SELECT 'Changed or Added in AFTER' AS Reason, b.*
FROM
(
    SELECT * FROM New_Data
    EXCEPT
    SELECT * FROM Old_Data
) b

Results – with BEFORE/AFTER analogy

The following is the result of the SQL (See the first column)

REASON SSN EMPNAME ADDRESS ANNUALSALARY
Changed or Removed from BEFORE 221021 Mike 61211
Changed or Removed from BEFORE 111221021 Ed 123 Main Street
Changed or Removed from BEFORE 512221021 Will 45 E.Lock Street 80394
Changed or Added in AFTER 221021 Mike (Changed) 61211
Changed or Added in AFTER 99921021 Brandy (Added) 45 E.Lock Street 80394

…But wait, it could be done better

As I reviewed the post, I thought that it could be done better by exactly identifying INSERTS/UPDATES/DELETES instead of being vague about it and saying “Changed or Removed or Added”. So this is the new code for SQL Server.

WITH Old_Data
AS
(
    SELECT '000221021' AS SSN, 'Mike' AS EmpName, NULL AS Address, 61211 AS AnnualSalary
    UNION ALL
    SELECT '111221021' AS SSN, 'Ed' AS EmpName, '123 Main Street' AS Address, NULL AS AnnualSalary
    UNION ALL
    SELECT '512221021' AS SSN, 'Will' AS EmpName, '45 E.Lock Street' AS Address, 80394 AS AnnualSalary
    UNION ALL
    SELECT '872221021' AS SSN, 'Brad' AS EmpName, '1 32nd Street' AS Address, 45321 AS AnnualSalary
    UNION ALL
    SELECT '092221021' AS SSN, 'Alex' AS EmpName, '601 Broadway' AS Address, 76890 AS AnnualSalary
    UNION ALL
    SELECT '372221021' AS SSN, 'Sally' AS EmpName, NULL AS Address, NULL AS AnnualSalary
    UNION ALL
    SELECT '232911021' AS SSN, 'Tom' AS EmpName, '' AS Address, 100234 AS AnnualSalary
),
New_Data
AS
(
    SELECT '000221021' AS SSN, 'Mike (Changed)' AS EmpName, NULL AS Address, 61211 AS AnnualSalary
    UNION ALL
    SELECT '872221021' AS SSN, 'Brad' AS EmpName, '1 32nd Street' AS Address, 45321 AS AnnualSalary
    UNION ALL
    SELECT '092221021' AS SSN, 'Alex' AS EmpName, '601 Broadway' AS Address, 76890 AS AnnualSalary
    UNION ALL
    SELECT '372221021' AS SSN, 'Sally' AS EmpName, NULL AS Address, NULL AS AnnualSalary
    UNION ALL
    SELECT '232911021' AS SSN, 'Tom' AS EmpName, '' AS Address, 100234 AS AnnualSalary
    UNION ALL
    SELECT '99921021' AS SSN, 'Brandy (Added)' AS EmpName, '45 E.Lock Street' AS Address, 80394 AS AnnualSalary
)
------ DELETES -----------
SELECT 'Removed from BEFORE' AS Reason, Old_Data.*
FROM Old_Data
WHERE NOT EXISTS
	(SELECT 1
	FROM New_Data
	WHERE Old_Data.SSN = New_Data.SSN)
UNION ALL
------ INSERTS -----------
SELECT 'Added in AFTER' AS Reason, New_Data.*
FROM New_Data
WHERE NOT EXISTS
	(SELECT 1
	FROM Old_Data
	WHERE Old_Data.SSN = New_Data.SSN)
UNION ALL
------ UPDATES -----------
SELECT 'Changed' AS Reason, a.*
FROM
(
    SELECT * FROM Old_Data
    EXCEPT
    SELECT * FROM New_Data
) a
WHERE EXISTS
	(SELECT 1
	FROM Old_Data
		INNER JOIN New_Data
			ON Old_Data.SSN = New_Data.SSN
	)

…..better results!

Notice the first column, the results are much better as we originally intended

Reason SSN EmpName Address AnnualSalary
Removed from BEFORE 111221021 Ed 123 Main Street NULL
Removed from BEFORE 512221021 Will 45 E.Lock Street 80394
Added in AFTER 99921021 Brandy (Added) 45 E.Lock Street 80394
Changed 221021 Mike NULL 61211
Changed 111221021 Ed 123 Main Street NULL
Changed 512221021 Will 45 E.Lock Street 80394

Although I have shown it with sample data generated on the fly to illustrate the simple usage, you can just as easily apply this to two tables or two sets of data within a table.

How to find column level differences?

For a more sophisticated comparison, you could also take the above results and then do a self-join on the natural key and compare each column to display old and new values side by side. I would not type that SQL but rather “generate” it using meta-data in the system views.

In SQL Server, you could also consider using CHECKSUM or BINARY_CHECKSUM to get a single number that represents the whole row. If the number is different then the values are different. You could also do CHECKSUM or BINARY_CHECKSUM differences at the column level to find column level differences.

Generate INSERT/UPDATE/DELETE statements based on data comparison

Also, please check out this post that I did to show how to generate INSERT/UDPATE/DELETE’s using a free SQL Server tool called TableDiff.

Please feel free to leave your comments if you found other interesting ways to do this type of comparisons.

Advertisements

2 thoughts on “Oracle (or) SQL Server: Find Row Level Data Differences Using MINUS/EXCEPT – 100 Level

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s