Oracle – LAG & LEAD – Usage Examples + Interesting Use-case

LAG and LEAD are a couple of analytic functions that come in handy in some interesting situations. I am not really sure why I wrote this post but at least it gives you a a few queries to play with LEAD and LAG. I am not completely happy with how this post turned out. Nothing is wrong but just not very comprehensible!

Basically, LAG is used to access information from prior rows and make that available to current row and vice versa for LEAD with the assumption that the rows are virtually sorted by a certain order.

Create a table with dummy data:

CREATE TABLE Emp
AS
SELECT * FROM
(
    SELECT 'Employee 1' AS Employee, TO_CHAR(NULL) AS Manager, 10000 AS Salary FROM DUAL
    UNION ALL
    SELECT 'Employee 1.1' AS Employee, 'Employee 1' AS Manager, 9000 AS Salary FROM DUAL
    UNION ALL
    SELECT 'Employee 1.2' AS Employee, 'Employee 1' AS Manager, 9000 AS Salary FROM DUAL
    UNION ALL
    SELECT 'Employee 1.3' AS Employee, 'Employee 1' AS Manager, 8000 AS Salary FROM DUAL
    UNION ALL
    SELECT 'Employee 1.2.1' AS Employee, 'Employee 1.2' AS Manager, 7000 AS Salary FROM DUAL
    UNION ALL
    SELECT 'Employee 1.2.2' AS Employee, 'Employee 1.2' AS Manager, 6000 AS Salary FROM DUAL
    UNION ALL
    SELECT 'Employee 1.2.1.1' AS Employee, 'Employee 1.2.1' AS Manager, 5000 AS Salary FROM DUAL
    UNION ALL
    SELECT 'Employee 1.2.1.2' AS Employee, 'Employee 1.2.1' AS Manager, 4000 AS Salary FROM DUAL
    UNION ALL
    SELECT 'Employee 1.2.1.3' AS Employee, 'Employee 1.2.1' AS Manager, 4000 AS Salary FROM DUAL
);

This is the data:

SELECT *
FROM Emp;
EMPLOYEE MANAGER SALARY
Employee 1 10000
Employee 1.1 Employee 1 9000
Employee 1.2 Employee 1 9000
Employee 1.3 Employee 1 8000
Employee 1.2.1 Employee 1.2 7000
Employee 1.2.2 Employee 1.2 6000
Employee 1.2.1.1 Employee 1.2.1 5000
Employee 1.2.1.2 Employee 1.2.1 4000
Employee 1.2.1.3 Employee 1.2.1 4000

Here is a sample query that uses LAG and LEAD to access information from previous and next rows by a certain offset (1 in this case)

SELECT
    e.*,
    --When all rows are ordered by employee, get the manager in previous row
    LAG(manager, 1, null) OVER (ORDER BY employee) prev_row_manager,
    --When all rows are ordered by employee, get the manager in next row
    LEAD(manager, 1, null) OVER (ORDER BY employee) next_row_manager,
    --When all rows are ordered by employee, get the salary in previous row
    LAG(salary, 1, null) OVER (ORDER BY employee) prev_row_emp_sal,
    --When all rows are ordered by employee, get the salary in next row
    LEAD(salary, 1, null) OVER (ORDER BY employee) next_row_emp_sal,
    --When all rows are ordered by salary, get the salary in previous row
    LAG(salary, 1, null) OVER (ORDER BY salary) prev_smallest_sal,
    --When all rows are ordered by salary, get the salary in next row
    LEAD(salary, 1, null) OVER (ORDER BY salary) next_biggest_sal
FROM Emp e
ORDER BY employee

Results ordered by Employee (first column)

EMPLOYEE MANAGER SALARY PREV_ROW_MANAGER NEXT_ROW_MANAGER PREV_ROW_EMP_SAL NEXT_ROW_EMP_SAL PREV_SMALLEST_SAL NEXT_BIGGEST_SAL
Employee 1 10000 Employee 1 9000 9000
Employee 1.1 Employee 1 9000 Employee 1 10000 9000 8000 9000
Employee 1.2 Employee 1 9000 Employee 1 Employee 1.2 9000 7000 9000 10000
Employee 1.2.1 Employee 1.2 7000 Employee 1 Employee 1.2.1 9000 5000 6000 8000
Employee 1.2.1.1 Employee 1.2.1 5000 Employee 1.2 Employee 1.2.1 7000 4000 4000 6000
Employee 1.2.1.2 Employee 1.2.1 4000 Employee 1.2.1 Employee 1.2.1 5000 4000 4000
Employee 1.2.1.3 Employee 1.2.1 4000 Employee 1.2.1 Employee 1.2 4000 6000 4000 5000
Employee 1.2.2 Employee 1.2 6000 Employee 1.2.1 Employee 1 4000 8000 5000 7000
Employee 1.3 Employee 1 8000 Employee 1.2 6000 7000 9000

The above query does not have anything meaningful. Basically, after ordering the rows by one or more columns, get the value from previous or next row (can offset by more than 1 row in each direction) is what the above SQL does. Please pay close attention to the comments in the SQL. The first few columns order by “employee” whereas the last two columns order by “salary”.  So, although the results may not look right at first look, if this set of data was ordered by salary column they will look right. In fact, all that was done below is to sort the above results by the “salary” column (third column)

EMPLOYEE MANAGER SALARY PREV_ROW_MANAGER NEXT_ROW_MANAGER PREV_ROW_EMP_SAL NEXT_ROW_EMP_SAL PREV_SMALLEST_SAL NEXT_BIGGEST_SAL
Employee 1.2.1.2 Employee 1.2.1 4000 Employee 1.2.1 Employee 1.2.1 5000 4000 4000
Employee 1.2.1.3 Employee 1.2.1 4000 Employee 1.2.1 Employee 1.2 4000 6000 4000 5000
Employee 1.2.1.1 Employee 1.2.1 5000 Employee 1.2 Employee 1.2.1 7000 4000 4000 6000
Employee 1.2.2 Employee 1.2 6000 Employee 1.2.1 Employee 1 4000 8000 5000 7000
Employee 1.2.1 Employee 1.2 7000 Employee 1 Employee 1.2.1 9000 5000 6000 8000
Employee 1.3 Employee 1 8000 Employee 1.2 6000 7000 9000
Employee 1.1 Employee 1 9000 Employee 1 10000 9000 8000 9000
Employee 1.2 Employee 1 9000 Employee 1 Employee 1.2 9000 7000 9000 10000
Employee 1 10000 Employee 1 9000 9000

We could have easily introduced an additional layer to say “previous highest salary within department” with a PARTITION BY clause.

In fact, let us try to get the previous and next biggest salaries for each employee but only within the group of people reporting to the same manager:

SELECT
    e.*,
    --When all rows are ordered by salary, get the salary in previous row
    LAG(salary, 1, null) OVER (PARTITION BY manager ORDER BY salary) prev_smallest_sal_this_mgr,
    --When all rows are ordered by salary, get the salary in next row
    LEAD(salary, 1, null) OVER (PARTITION BY manager ORDER BY salary) next_biggest_sal_this_mgr
FROM Emp e
EMPLOYEE MANAGER SALARY PREV_SMALLEST_SAL_THIS_MGR NEXT_BIGGEST_SAL_THIS_MGR
Employee 1.3 Employee 1 8000 9000
Employee 1.1 Employee 1 9000 8000 9000
Employee 1.2 Employee 1 9000 9000
Employee 1.2.2 Employee 1.2 6000 7000
Employee 1.2.1 Employee 1.2 7000 6000
Employee 1.2.1.2 Employee 1.2.1 4000 4000
Employee 1.2.1.3 Employee 1.2.1 4000 4000 5000
Employee 1.2.1.1 Employee 1.2.1 5000 4000
Employee 1 10000

The interesting use-case – Employees whose managers have more than one report

Now, on to the interesting(?) use of LAG and LEAD :

If we were asked to “list all employees whose managers have more than one report”,  here is a way to do it. Please re-read carefully, we are not listing all managers with more than one report but all employees whose managers have more than one reports.

SELECT *
FROM Emp
WHERE manager IN
    (
    --Get the list of managers having more than one associated employee
    SELECT manager
    FROM Emp
    GROUP BY manager
    HAVING COUNT(employee) > 1
    )

The above is simple enough but let us take a look at the plan:

Plan
SELECT STATEMENT CHOOSE
8 MERGE JOIN
2 SORT JOIN
1 TABLE ACCESS FULL EMP
7 SORT JOIN
6 VIEW SYS.VW_NSO_1
5 FILTER
4 SORT GROUP BY
3 TABLE ACCESS FULL EMP

It has to full table scans followed by sorts and then a merge. If this were a million row table, this is very expensive.

Knowing what we know about LAG and LEAD, we can say, “after ordering by manager, if the manager value in current row equals that from previous row or the next row, the manager has more than one report”.

SELECT *
FROM (
        SELECT
            e.*,
            CASE WHEN manager = LEAD(manager, 1, 0) OVER (ORDER BY manager)
                        OR manager = LAG(manager, 1, 0) OVER (ORDER BY manager)
                    THEN 'Y'
                    ELSE 'N'
                END Has_Count_GT_1
        FROM Emp e
    )
WHERE Has_Count_GT_1 = 'Y'

..and the plan is much simpler (and certainly faster)

Plan
SELECT STATEMENT CHOOSE
3 VIEW SYSTEM.
2 WINDOW SORT
1 TABLE ACCESS FULL EMP

The example may be a bad one but there are very practical applications of the above technique whenever there is a need to get the detail level PK values (like Emp above) at the row level based on qualification criteria dictated by other non-unique columns (like manager).

In general, Oracle’s “analytic functions” (of which LEAD and LAG are a very small part) offer an easier and more efficient way answer complex questions that deal with a window of data within the broader resultset.

Something like this would be a nightmare query using normal methods but with analytic functions, it will not be too difficult.

  • Mid-level managers report – Get the list of all managers making between $ 125,000 and $150,000 with more than 2 levels of managers above them and have at least 10 employees reporting to them but no more than 22 employees
Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s