Oracle System & Session Wait Big Picture Analysis- Using RATIO_TO_REPORT Function

Numbers and meaning

it comes to numbers, the easiest way to get a point across to majority of the people is to express something in terms of percentage.

Let us consider an example where a query produces the “waits” Oracle experienced overall in terms of “Number of waits” and the “Time waited”.

SELECT wait_class,
   total_waits,
   time_waited
FROM v$system_wait_class
WHERE wait_class != 'Idle'

This produces:

WAIT_CLASS TOTAL_WAITS TIME_WAITED
Other 591004 11573
Application 38978 1341
Configuration 386 4208
Concurrency 13205 246683
Commit 555063 34619
Network 5761591 3315
User I/O 1444586 478017
System I/O 1459973 259161
Scheduler 2229 3709

Results as percentages please (the hard way)

Great. We have the numbers but they do not express the weight of each compared to the overall number.

Wouldn’t it be nice to look at the same table like this? Suddenly, every single number in the table is starting to make sense.

WAIT_CLASS TOTAL_WAITS_PCT TIME_WAITED_PCT
User I/O 14.63 % 45.84 %
System I/O 14.80 % 24.86 %
Concurrency 0.13 % 23.66 %
Commit 5.62 % 3.32 %
Other 6.00 % 1.11 %
Configuration 0 % 0.40 %
Scheduler 0.02 % 0.36 %
Network 58.40 % 0.32 %
Application 0.39 % 0.13 %

Now, it is quite obvious to me that “User I/O” is where almost half the total time is spent waiting.

Basic math skill with some query manipulation of the original query produced the above table.

SELECT  wait_class,
        ROUND(100 * (total_waits / sum_waits),2) total_waits_pct,
        ROUND(100 * (time_waited / sum_time),2) time_waited_pct
FROM
(   SELECT wait_class,
        total_waits,
        time_waited
    FROM    v$system_wait_class
    WHERE   wait_class != 'Idle'
),
(
    SELECT  SUM(total_waits) sum_waits,
        SUM(time_waited) sum_time
    FROM    v$system_wait_class
    WHERE   wait_class != 'Idle'
)
ORDER BY 3 DESC

Notice how awkward the query became. We repeated the same query twice. There is room for simple mistakes when we copy and paste the formula to calculate percentage for the different columns. In a larger query, one can also easily miss the proper joins. The above does not have any joins because it was structured that way.

Percentages the easy way using RATIO_TO_REPORT function:

The native RATIO_TO_REPORT Oracle function helps us get the percentage and yet keep it simple. The same query above can be simplified using RATIO_TO_REPORT function to produce the identical result.

SELECT  wait_class,
        ROUND(100* RATIO_TO_REPORT(total_waits) OVER (),2) total_waits_pct,
        ROUND(100* RATIO_TO_REPORT(time_waited) OVER (),2) time_waited_pct
FROM
(   SELECT wait_class,
        total_waits,
        time_waited
    FROM    v$system_wait_class
    WHERE   wait_class != 'Idle'
)
ORDER BY 3 DESC
WAIT_CLASS TOTAL_WAITS_PCT TIME_WAITED_PCT
User I/O 36.63 48.42
Network 25.88 39.22
System I/O 7.02 4.66
Scheduler 1.53 3.15
Other 28.04 2.61
Concurrency 0.73 1.75
Commit 0.17 0.18
Administrative 0 0.01
Configuration 0 0
Application 0 0

The above illustration is a simple use of RATIO_TO_REPORT. Let us see how we can use the OVER() clause to do percentages over subsets of data.

Analyzing System waits

Let us explore further to see how we can get a better sense of the overall system wait picture.

Each of the above wait_classes have sub-categories.

SELECT DISTINCT wait_class, name
FROM v$event_name
ORDER BY wait_class, name

This is a sample of the rows returned:

WAIT_CLASS NAME
Administrative ASM COD rollback operation completion
Administrative ASM mount : wait for heartbeat
Administrative Backup: MML command to channel
Administrative Backup: MML commit backup piece
Administrative Backup: MML create a backup piece


User I/O ASM Staleness File I/O
User I/O Archive Manager file transfer I/O
User I/O BFILE read
User I/O DG Broker configuration file I/O
User I/O Data file init write


Application External Procedure call
Application External Procedure initial connection
Application OLAP DML Sleep
Application SQL*Net break/reset to client
Application SQL*Net break/reset to dblink
Application Streams capture: filter callback waiting for ruleset


Network ARCH wait for flow-control
Network ARCH wait for net re-connect
Network ARCH wait for netserver detach
Network ARCH wait for netserver init 2
Network ARCH wait for netserver start
Network ARCH wait on ATTACH
Network ARCH wait on DETACH


System I/O Archiver slave I/O
System I/O DBWR slave I/O
System I/O LGWR slave I/O
System I/O Log archive I/O
System I/O Network file transfer
System I/O RFS random i/o


What would our ideal output look like?

Would it not be nice if we could get not only the overall percentages but also percentages within each WAIT_CLASS grouping as shown in the last column?

** This is only an illustration. Real wait numbers will not look like these.

WAIT_CLASS NAME % Time Waited Within WAIT_CLASS % Time Waited Overall
Administrative ASM COD rollback operation completion 5 % 1.5 %
Administrative ASM mount : wait for heartbeat 20 % 3 %
Administrative Backup: MML command to channel 35 % 3.5 %
Administrative Backup: MML commit backup piece 20 % 9 %
Administrative Backup: MML create a backup piece 20 % 1 %
100 % 18 %
User I/O ASM Staleness File I/O 70 % 12 %
User I/O Archive Manager file transfer I/O 0 % 0 %
User I/O BFILE read 0 % 0 %
User I/O DG Broker configuration file I/O 0 % 0 %
User I/O Data file init write 30 % 1 %
100 % 13 %
 

…. More WAIT_CLASS’es

 

 
  100 %

System waits for wait_calss “User I/O”

We know that User I/O was the problem from the overall system level wait picture and we need to get a wait breakup of the sub-categories within User I/O.

Let us get the breakup for just User I/O across sessions:

SELECT wait_class, event, SUM(total_waits) total_waits, SUM(total_timeouts) total_timeouts, SUM(time_waited) time_waited
FROM v$system_event a
WHERE wait_class = 'User I/O'
GROUP BY wait_class, event
ORDER BY time_waited DESC
WAIT_CLASS EVENT TOTAL_WAITS TOTAL_TIMEOUTS TIME_WAITED
User I/O db file sequential read 3667 0 9473
User I/O direct path read 1690 0 945
User I/O db file scattered read 389 0 834
User I/O db file parallel read 56 0 121
User I/O Disk file operations I/O 1158 0 73
User I/O direct path write 393 0 27
User I/O read by other session 2 0 1

Now we know that within User I/O, majority of the time is spent on “db file sequential read”

Breakup waits for wait_calss “User I/O” and event “db file sequential read”

We could now fetch the User I/O breakup by session now since we know that it is the problem area – more specifically the “db file sequential read” event within the “User I/O” wait class.

SELECT sid, wait_class, event, total_waits, total_timeouts, time_waited
FROM v$system_event a
WHERE wait_class = 'User I/O'
    AND event = 'db file sequential read'
ORDER BY time_waited desc
SID WAIT_CLASS EVENT TOTAL_WAITS TOTAL_TIMEOUTS TIME_WAITED
10 User I/O db file sequential read 1807 0 4429
390 User I/O db file sequential read 21 0 1178
403 User I/O db file sequential read 167 0 1029
60 User I/O db file sequential read 142 0 558
421 User I/O db file sequential read 718 0 555
456 User I/O db file sequential read 49 0 418
414 User I/O db file sequential read 39 0 334
11 User I/O db file sequential read 420 0 299
17 User I/O db file sequential read 244 0 265
429 User I/O db file sequential read 22 0 160
404 User I/O db file sequential read 18 0 158
397 User I/O db file sequential read 2 0 75
23 User I/O db file sequential read 6 0 9
52 User I/O db file sequential read 1 0 5
5 User I/O db file sequential read 11 0 1

SID 10  has waited the most on wait_class “User I/O” and event “db file sequential read”.

Wait state – Big picture using RATIO_TO_REPORT

Without RATIO_TO_REPORT, things are still hard to interpret.

We did a lot of the grunt work to get what we needed to get. Let us try to get the overall wait picture using RATIO_TO_REPORT and the PARTITION BY clause which GROUP’s rows by the columns specified. The grouping in turn is utilized by RATIO_TO_REPORT for its reporting.

SELECT wait_class, event,
    --------- WAITS ---------
    SUM(total_waits) total_waits,
    ROUND(100* RATIO_TO_REPORT(SUM(total_waits)) OVER (PARTITION BY wait_class),2) pct_total_waits_WC,
    ROUND(100* RATIO_TO_REPORT(SUM(total_waits)) OVER (),2) pct_total_waits_OVERALL,
    --------- TIMEOUTS ---------
    SUM(total_timeouts) total_timeouts,
    ROUND(100* RATIO_TO_REPORT(SUM(total_timeouts)) OVER (PARTITION BY wait_class),2) pct_total_timeouts_WC,
    ROUND(100* RATIO_TO_REPORT(SUM(total_timeouts)) OVER (),2) pct_total_timeouts_OVERALL,
    --------- TIME WAITED ---------
    SUM(time_waited) time_waited,
    ROUND(100* RATIO_TO_REPORT(SUM(time_waited)) OVER (PARTITION BY wait_class),2) pct_time_waited_WC,
    ROUND(100* RATIO_TO_REPORT(SUM(time_waited)) OVER (),2) pct_time_waited_OVERALL
FROM v$system_event a
WHERE wait_class != 'Idle'
GROUP BY wait_class, event
ORDER BY wait_class, event

Results

The results of the above query are too wide to display. So, I will display them in three parts with percentages within the WAIT_CLASS for each row and percentages for row for all WAIT_CLASSes

  1. WAITS
  2. TIMEOUTS
  3. TIME WAITED

Results – Part 1 (WAITS)

WAIT_CLASS EVENT TOTAL_WAITS PCT_TOTAL_WAITS_WC

(%’s within WAIT_CLASS)

PCT_TOTAL_WAITS_OVERALL

(%’s overall)

Administrative JS coord start wait 1 100 0
Application SQL*Net break/reset to client 311 99.68 0.01
Application enq: TX – row lock contention 1 0.32 0
Commit log file sync 100 100 0
Concurrency buffer busy waits 35 0.33 0
Concurrency cursor: pin S wait on X 3 0.03 0
Concurrency latch: cache buffers chains 207 1.95 0.01
Concurrency latch: row cache objects 3 0.03 0
Concurrency latch: shared pool 37 0.35 0
Concurrency library cache: mutex X 1 0.01 0
Concurrency os thread startup 10321 97.3 0.33
Configuration latch: redo writing 21 100 0
Network SQL*Net message to client 14425 72.01 0.47
Network SQL*Net more data from client 2382 11.89 0.08
Network SQL*Net more data to client 3224 16.1 0.1
Other events in waitclass Other 18170 100 0.59
System I/O control file parallel write 164263 5.4 5.3
System I/O control file sequential read 801821 26.37 25.89
System I/O db file parallel write 244779 8.05 7.9
System I/O log file parallel write 1829321 60.16 59.06
System I/O log file sequential read 307 0.01 0.01
System I/O log file single write 307 0.01 0.01
User I/O Disk file operations I/O 1158 15.74 0.04
User I/O db file parallel read 56 0.76 0
User I/O db file scattered read 389 5.29 0.01
User I/O db file sequential read 3667 49.86 0.12
User I/O direct path read 1690 22.98 0.05
User I/O direct path write 393 5.34 0.01
User I/O read by other session 2 0.03 0

Results – Part 2 (TIMEOUTS)

WAIT_CLASS EVENT TOTAL_TIMEOUTS PCT_TOTAL_TIMEOUTS_WC

(%’s within WAIT_CLASS)

PCT_TOTAL_TIMEOUTS_OVERALL

(%’s overall)

Administrative JS coord start wait 1 100 0.29
Application SQL*Net break/reset to client 0 NULL 0
Application enq: TX – row lock contention 0 NULL 0
Commit log file sync 0 NULL 0
Concurrency buffer busy waits 0 NULL 0
Concurrency cursor: pin S wait on X 0 NULL 0
Concurrency latch: cache buffers chains 0 NULL 0
Concurrency latch: row cache objects 0 NULL 0
Concurrency latch: shared pool 0 NULL 0
Concurrency library cache: mutex X 0 NULL 0
Concurrency os thread startup 0 NULL 0
Configuration latch: redo writing 0 NULL 0
Network SQL*Net message to client 0 NULL 0
Network SQL*Net more data from client 0 NULL 0
Network SQL*Net more data to client 0 NULL 0
Other events in waitclass Other 349 100 99.71
System I/O control file parallel write 0 NULL 0
System I/O control file sequential read 0 NULL 0
System I/O db file parallel write 0 NULL 0
System I/O log file parallel write 0 NULL 0
System I/O log file sequential read 0 NULL 0
System I/O log file single write 0 NULL 0
User I/O Disk file operations I/O 0 NULL 0
User I/O db file parallel read 0 NULL 0
User I/O db file scattered read 0 NULL 0
User I/O db file sequential read 0 NULL 0
User I/O direct path read 0 NULL 0
User I/O direct path write 0 NULL 0
User I/O read by other session 0 NULL 0

Results – Part 3 (TIME WAITED)

WAIT_CLASS EVENT TIME_WAITED PCT_TIME_WAITED_WC

(%’s within WAIT_CLASS)

PCT_TIME_WAITED_OVERALL

(%’s overall)

Administrative JS coord start wait 50 100 0.01
Application SQL*Net break/reset to client 23 0.01 0
Application enq: TX – row lock contention 259358 99.99 31.99
Commit log file sync 8 100 0
Concurrency buffer busy waits 0 0 0
Concurrency cursor: pin S wait on X 4 0.01 0
Concurrency latch: cache buffers chains 176 0.26 0.02
Concurrency latch: row cache objects 0 0 0
Concurrency latch: shared pool 0 0 0
Concurrency library cache: mutex X 0 0 0
Concurrency os thread startup 66943 99.73 8.26
Configuration latch: redo writing 7 100 0
Network SQL*Net message to client 2 16.67 0
Network SQL*Net more data from client 2 16.67 0
Network SQL*Net more data to client 8 66.67 0
Other events in waitclass Other 2886 100 0.36
System I/O control file parallel write 34088 7.26 4.2
System I/O control file sequential read 76612 16.31 9.45
System I/O db file parallel write 242823 51.69 29.95
System I/O log file parallel write 116165 24.73 14.33
System I/O log file sequential read 40 0.01 0
System I/O log file single write 28 0.01 0
User I/O Disk file operations I/O 73 0.64 0.01
User I/O db file parallel read 121 1.05 0.01
User I/O db file scattered read 834 7.27 0.1
User I/O db file sequential read 9473 82.56 1.17
User I/O direct path read 945 8.24 0.12
User I/O direct path write 27 0.24 0
User I/O read by other session 1 0.01 0

For the three things we are focusing on – Total_Waits, Total_Timeouts & Time_Waited, we were able to get the percentages

  • Overall – across all wait_classes
  • Within wait_class

To research all we have to do now is focus on the biggest number on the “Overall” columns.  One can imagine how complex the query would be without the RATIO_TO_REPORT function.

Analyzing Session waits

Then, we can get a session level picture just like the above and focus again on the “Overall” columns.

To get the session level picture, the query needs a few slight changes (do include a SID condition for a specific session)

SELECT SID, wait_class, event,
    total_waits,
    ROUND(100* RATIO_TO_REPORT(total_waits) OVER (PARTITION BY wait_class),2) pct_total_waits_WC,
    ROUND(100* RATIO_TO_REPORT(total_waits) OVER (),2) pct_total_waits_OVERALL,
    total_timeouts,
    ROUND(100* RATIO_TO_REPORT(total_timeouts) OVER (PARTITION BY wait_class),2) pct_total_timeouts_WC,
    ROUND(100* RATIO_TO_REPORT(total_timeouts) OVER (),2) pct_total_timeouts_OVERALL,
    time_waited,
    ROUND(100* RATIO_TO_REPORT(time_waited) OVER (PARTITION BY wait_class),2) pct_time_waited_WC,
    ROUND(100* RATIO_TO_REPORT(time_waited) OVER (),2) pct_time_waited_OVERALL
FROM v$session_event a
WHERE wait_class != 'Idle'
ORDER BY SID, wait_class, event

It is also easy to get additional percentage numbers within each session. We just have to add SID to the PARTITION BY or limit to just one session in the WHERE clause as noted before.

Summary

In summary, all the “PARTITION BY” does is to GROUP rows for the purposes of doing a RATIO_TO_REPORT.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s