A blocker session showed STATUS=KILLED in V$SESSION. The application team was told the blocker was dead and the locks would clear shortly. They waited 40 minutes. The locks were still held. PMON was rolling back 2GB of uncommitted work from a crashed session — and until the rollback completed, every lock that session held remained active regardless of what STATUS showed.
The Alert
Application: order processing suspended. V$SESSION showed a session with STATUS=KILLED blocking 34 application sessions. The natural response: the session is dead, PMON will clean it up, wait. But 40 minutes later, the lock was still held and the application was still blocked.
First Hypothesis: Wait for PMON
Oracle's PMON process detects dead sessions and rolls back their uncommitted work, releasing locks. For small transactions this takes seconds. For a 2GB uncommitted transaction, it takes much longer. We needed to find out how much rollback remained.
-- Find the killed session and its recovery status: SELECT s.sid, s.serial#, s.status, s.username, t.used_ublk * 8192 / 1024 / 1024 AS undo_used_mb, t.used_urec AS undo_records_used FROM v$session s JOIN v$transaction t ON t.addr = s.taddr WHERE s.status = 'KILLED'; -- Monitor rollback progress every 30 seconds: -- used_ublk should decrease as PMON rolls back -- If used_ublk is decreasing: wait. If stable: escalate.
used_ublk: 256,000 blocks — 2GB of undo. It was decreasing, but slowly. At the current rate, complete rollback would take 85 more minutes. We could wait, or we could OS-kill the backend process to force immediate rollback acceleration.
Incident Timeline
| Time | Event |
|---|---|
| 14:22 | Application session crashes at OS level. 2GB uncommitted work. |
| 14:22 | PMON marks session KILLED. Begins rolling back. |
| 14:22-15:42 | 34 sessions blocked. Application order processing suspended. |
| 15:02 | Investigation begins. used_ublk monitored. 85 minutes remaining estimated. |
| 15:10 | Decision: kill the OS process to release locks sooner |
| 15:11 | OS-level kill of Oracle shadow process. Rollback accelerates. |
| 15:14 | Rollback completes. Locks released. 34 sessions unblock. |
Root Cause
Oracle maintains transactional integrity even for killed sessions. A session with STATUS=KILLED still holds all its locks until PMON completes the rollback of its uncommitted work. A 2GB transaction takes proportionally long to roll back. STATUS=KILLED does not mean the locks are released — it means Oracle has begun the cleanup process.
The Fix
-- Find the OS process ID for the killed session: SELECT s.sid, s.serial#, s.status, p.spid AS os_process_id FROM v$session s JOIN v$process p ON p.addr = s.paddr WHERE s.status = 'KILLED'; -- On Linux, as oracle OS user: -- kill -9 [spid] -- Verify rollback completed: SELECT COUNT(*) FROM v$session WHERE status = 'KILLED'; -- Should return 0
Prevention
idle_time and connect_time profiles are now applied to application user accounts — sessions that become idle for more than 30 minutes in an open transaction are killed automatically. The transaction size limit for the application user was also reviewed — any transaction over 100MB is flagged for architectural review.