Yes. Yesterday was my last day as an application/database Architect. I still love to architect solutions but will not be doing that (only enabling it instead) officially at my job anymore. I want to learn more and so I chose to be a DBA for Oracle and SQL Server as my primary job function.
You might be inclined to ask “Why would anybody do that?”. Here is why. As a DBA, a developer comes to you and says “I have tried everything and ‘it‘ just hangs”. To him/her, he/she as no clue and he/she is looking up to you as his/her last resort. At that point, you can say, I know exactly what is happening or that he/she is on his/her own. I do not believe in even implying the latter. There is excitement in solving problems that others are unwilling, uninterested or unable to solve. To me, I am also benefiting because I am learning new things along the way. Not only will I be solving problems but also, more importantly, the learning will enable me to architect better solutions at a whole new level!
Anyway, we have been having this issue with “latch free” and “buffer busy waits” with our Oracle 9i database (yes, 9i!). Things would run fine and suddenly just hang (relatively speaking) . The job that was “hanging” had a primary wait event: ‘buffer busy waits’. To get to the bottom of this I saw if any other session had the same wait
SELECT s.sid, s.sql_hash_value, sw.p1 file#, sw.p2 block#, sw.p3 reason FROM v$session_wait sw, v$session s WHERE sw.event = 'buffer busy waits' AND sw.sid = s.sid
Fortunately, “Yes”. There are a lot sessions with the same wait. Also, I saw a lot of sessions with the same sql_hash_value, the value that any SQL statement across sessions boils down to! I was seeing multiple sessions with the same file# and block# for this specific wait. This immediately tells me that all of these sessions are contending for the same block AKA “hot bock issue”.
Noting down the exact file and block, I can look at what exactly is causing the contention
SELECT owner , segment_name , segment_type FROM dba_extents WHERE file_id = &file# AND &block_id BETWEEN block_id AND block_id + blocks -1;
It could be due to several reasons but knowing the logic behind the problem in question, it turned out that there were multiple sessions running the same exact query that might have resulted in accessing the same information (possibly with different parameters). Mostly this is fine but if you know for sure that your queries across these parallel sessions are supposed to return rows that are exclusive of each other, you are in for a surprise. You believe that your query running across the different sessions with different parameters should return rows that are mutually exclusive…but guess what, the cost optimizer boiled your query to a plan that may access data blocks that may not be needed. This is the situation in our case.
As stupid as it may seem, even adding an extra condition that has no associated index changes the execution plan to alter it to access data may cure it (or) accessing the data differently by re-writing the query could do wonders for you. Quite simply if you just made sure that multiple parallel sessions are not accessing/altering the same data, you might be fine. Look at your explain plan and make sure that the query’s table associated with the file# noted above does not “over shoot” past the data it needs.
Essentially, the wait event “buffer busy wait” means that the data is already in the buffer cache and this session is either waiting to access it and cannot because of some reason or that it needs to bring it to the cache and it cannot, due to contention. Either way, the solution is to make sure that your parallel sessions not interact with the same exact physical piece of data. It could be due to a query where someone says “SELECT * FROM table WHERE non_indexed_column LIKE some_random_value” that does full table scan or it could be more sinister than that on a good looking explain plan on the surface.
….Anyway, as a fellow-DBA, Good Luck!