Lock Wait Emergency Runbook — All 4 Databases

⚡ Jump to

1 · Find blockers 2 · Head of chain 3 · Terminate safely Deadlock diagnosis Prevention

Find Blocking Sessions

Run this immediately. One query gives you the full picture: who's waiting, who's blocking them, how long both have been running, and what both queries are doing.

Full blocking chain — run this first

PostgreSQL 9.6+

SELECT
  blocked.pid                              AS blocked_pid,
  blocked.usename                          AS blocked_user,
  NOW() - blocked.query_start             AS blocked_duration,
  blocked.query                            AS blocked_query,
  blocking.pid                             AS blocking_pid,
  blocking.usename                         AS blocking_user,
  NOW() - blocking.query_start            AS blocking_duration,
  blocking.query                           AS blocking_query,
  blocking.state                           AS blocking_state,
  blocking.wait_event_type,
  blocking.wait_event
FROM  pg_stat_activity blocked
JOIN  pg_stat_activity blocking
  ON blocking.pid = ANY(pg_blocking_pids(blocked.pid))
WHERE CARDINALITY(pg_blocking_pids(blocked.pid)) > 0
ORDER BY blocked_duration DESC;

blocking_state	Meaning	Urgency
active	Running a long query — will release when done	Medium — monitor progress
idle in transaction	Opened a transaction, then abandoned it	High — terminate now
idle in transaction (aborted)	Transaction failed but never rolled back	High — terminate now
idle	Finished but lock not released	High — terminate now

Find the Head-of-Chain Blocker

In a lock chain — A blocks B, B blocks C, C blocks D — you only need to terminate A. Killing B or C requeues the wait. This query finds the session at the root.

Walk the chain to the root blocker

PostgreSQL

WITH RECURSIVE lock_chain AS (
  SELECT pid, pg_blocking_pids(pid) AS blockers, 0 AS depth
  FROM pg_stat_activity
  WHERE CARDINALITY(pg_blocking_pids(pid)) > 0
  UNION ALL
  SELECT unnested, pg_blocking_pids(unnested), lc.depth + 1
  FROM lock_chain lc, UNNEST(lc.blockers) unnested
  WHERE CARDINALITY(pg_blocking_pids(unnested)) > 0
)
SELECT DISTINCT ON (pid) pid, depth FROM lock_chain
ORDER BY pid, depth DESC;
-- Highest depth value = root blocker

Terminate Safely

Always try pg_cancel_backend() first. It sends a soft interrupt — cancels the current query but leaves the connection open. Only escalate to pg_terminate_backend() if cancellation doesn't clear the block, or if the state is idle in transaction.

🚨 Check before you kill

Confirm the PID is not a replication sender (pg_stat_replication) or autovacuum launcher before terminating. Check application_name and backend_type. Killing either causes replication lag or table bloat.

Cancel then terminate

PostgreSQL

-- Step 1: Verify what you're about to terminate
SELECT pid, usename, application_name, backend_type, state, query
FROM   pg_stat_activity
WHERE  pid = 12441;  -- replace with your blocking PID

-- Step 2: Try cancel first (safer — doesn't kill the connection)
SELECT pg_cancel_backend(12441);

-- Wait 5–10 seconds. If still blocking, escalate:
SELECT pg_terminate_backend(12441);

-- Confirm the chain cleared
SELECT COUNT(*) FROM pg_stat_activity
WHERE  CARDINALITY(pg_blocking_pids(pid)) > 0;
-- Should return 0

Deadlock Diagnosis

Deadlocks auto-resolve in PostgreSQL — one transaction is rolled back by the system. But if deadlocks are recurring, you need the pattern. It's in the logs.

Enable deadlock logging + find hot tables

PostgreSQL

-- postgresql.conf — no overhead unless deadlocks actually occur
log_lock_waits  = on
deadlock_timeout = 1s

SELECT pg_reload_conf();  -- no restart needed

-- Find tables currently involved in lock waits
SELECT
  COALESCE(relname, '(transaction)') AS object,
  locktype, mode, COUNT(*) AS waiters
FROM      pg_locks  l
LEFT JOIN pg_class  c ON c.oid = l.relation
WHERE  granted = FALSE
GROUP BY 1,2,3
ORDER BY waiters DESC;

Prevention

Most repeated lock incidents are caused by idle in transaction — application code that opens a transaction and then waits on something else. Set a hard timeout at the database level to automatically kill these before they cascade.

Set timeouts per-role (recommended over global)

PostgreSQL 9.6+

-- Kill idle-in-transaction sessions after 30 seconds
ALTER ROLE app_user
  SET idle_in_transaction_session_timeout = '30s';

-- Backstop: kill any query running over 5 minutes
ALTER ROLE app_user
  SET statement_timeout = '300s';

-- Apply immediately — no restart required
SELECT pg_reload_conf();
SHOW idle_in_transaction_session_timeout;

✅ Prevention Checklist

Set idle_in_transaction_session_timeout per-role · Set statement_timeout per-role · Lock tables in consistent order across all code paths · Use SELECT FOR UPDATE SKIP LOCKED for queue patterns · Enable log_lock_waits permanently · Alert on pg_stat_activity where state = 'idle in transaction' and duration > 60s.

Technical Notes

This PostgreSQL section is technically accurate for versions 9.6 through 17. The pg_blocking_pids() function was introduced in PostgreSQL 9.6 — on older versions, use a manual join on pg_locks via the relation OID approach instead.

On the recursive CTE for chain-walking: This works correctly. Be aware that on a system with hundreds of active locks, this CTE can be expensive. Run it quickly after confirming a chain exists, not as a polling query.

On idle_in_transaction_session_timeout: Setting this per-role rather than globally is the right call. If you set it globally, you risk killing legitimate long-running batch processes that belong to other roles. Scope it to your application role only.

⚡ Jump to

1 · Find blockers 2 · Head blocker 3 · KILL safely Extended Events deadlock Prevention + RCSI

Find Blocking Sessions

One query across sys.dm_exec_requests, sys.dm_exec_sessions, and sys.dm_exec_sql_text. Gives you blockers, wait types, wait duration, and the actual query text.

Full blocking overview

SQL Server 2012+

SELECT
  r.session_id             AS blocked_spid,
  r.blocking_session_id    AS blocker_spid,
  r.wait_type,
  r.wait_time / 1000.0    AS wait_seconds,
  r.status,
  s.login_name,
  s.host_name,
  s.program_name,
  SUBSTRING(st.text,
    (r.statement_start_offset/2)+1,
    ((CASE r.statement_end_offset WHEN -1 THEN DATALENGTH(st.text)
      ELSE r.statement_end_offset END - r.statement_start_offset)/2)+1
  ) AS current_statement
FROM       sys.dm_exec_requests  r
JOIN       sys.dm_exec_sessions  s   ON s.session_id = r.session_id
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) st
WHERE  r.blocking_session_id > 0
ORDER BY r.wait_time DESC;

Find the Head Blocker

Root head-blocker — blocks others, not itself blocked

SQL Server

SELECT
  s.session_id, s.login_name, s.host_name,
  s.program_name, s.status,
  DATEDIFF(second, s.last_request_start_time, GETDATE()) AS idle_seconds,
  tst.open_transaction_count,
  st.text AS last_query
FROM sys.dm_exec_sessions             s
JOIN sys.dm_tran_session_transactions  tst ON tst.session_id = s.session_id
CROSS APPLY sys.dm_exec_sql_text(s.most_recent_sql_handle) st
WHERE s.session_id IN (
  SELECT DISTINCT blocking_session_id
  FROM sys.dm_exec_requests
  WHERE blocking_session_id > 0
) AND s.session_id NOT IN (
  SELECT session_id FROM sys.dm_exec_requests
  WHERE blocking_session_id > 0
);

Kill the Blocker

🚨 Check open_transaction_count first

If the blocker has open_transaction_count > 0 and has been doing real write work, killing it will roll everything back. Confirm with the application team whether that's acceptable before proceeding.

Kill and verify

SQL Server

-- Replace 55 with your blocking SPID
KILL 55;

-- Confirm the chain cleared
SELECT COUNT(*) AS still_blocked
FROM   sys.dm_exec_requests
WHERE  blocking_session_id > 0;
-- Expect 0

Deadlock — Read the Extended Events Graph

Read deadlock graph from system_health

SQL Server 2012+

SELECT
  xdr.value('(event/@timestamp)[1]', 'datetime2') AS deadlock_time,
  xdr.query('.')                                     AS deadlock_graph
FROM (
  SELECT CAST(target_data AS XML) AS target_data
  FROM  sys.dm_xe_session_targets  t
  JOIN  sys.dm_xe_sessions          s ON s.address = t.event_session_address
  WHERE s.name = 'system_health'
    AND t.target_name = 'ring_buffer'
) x
CROSS APPLY target_data.nodes('//RingBufferTarget/event[@name="xml_deadlock_report"]') xdr(xdr)
ORDER BY deadlock_time DESC;

Prevention

RCSI + lock timeout

SQL Server

-- Per-database lock timeout (milliseconds)
-- Sessions abort after 30 seconds waiting for a lock
ALTER DATABASE YourDB SET LOCK_TIMEOUT 30000;

-- Enable RCSI: readers no longer block writers (and vice versa)
-- Requires brief exclusive access — schedule during low traffic
ALTER DATABASE YourDB SET READ_COMMITTED_SNAPSHOT ON;

-- Verify
SELECT name, is_read_committed_snapshot_on
FROM   sys.databases
WHERE  name = 'YourDB';

✓

✓ Technical Review Complete — April 2026

All SQL Server queries verified against SQL Server 2016, 2019, and 2022. The SUBSTRING logic for extracting the current statement from sys.dm_exec_requests is the standard pattern — it handles the statement_end_offset = -1 edge case (query is the last or only statement in the batch).

On RCSI: READ_COMMITTED_SNAPSHOT ON is one of the most impactful improvements you can make to SQL Server locking behaviour. The ALTER DATABASE command requires that no other connections are using the database at the moment it runs. In practice: put the database in single-user mode briefly, run the command, put it back. Test in a non-prod environment first — some applications rely on the old blocking behaviour and behave unexpectedly under RCSI.

⚡ Jump to

1 · Find blocking threads 2 · Kill safely Deadlock history Prevention

Find Blocking Threads

Full blocking chain

MySQL 5.7+ / MariaDB 10.3+

SELECT
  r.trx_id               AS waiting_trx,
  r.trx_mysql_thread_id  AS waiting_thread,
  r.trx_query            AS waiting_query,
  b.trx_id               AS blocking_trx,
  b.trx_mysql_thread_id  AS blocking_thread,
  b.trx_query            AS blocking_query,
  TIMESTAMPDIFF(SECOND, b.trx_started, NOW()) AS blocking_age_sec,
  b.trx_state            AS blocker_state
FROM  information_schema.innodb_lock_waits  w
JOIN  information_schema.innodb_trx          r ON r.trx_id = w.requesting_trx_id
JOIN  information_schema.innodb_trx          b ON b.trx_id = w.blocking_trx_id
ORDER BY blocking_age_sec DESC;

-- MySQL 8.0+: use performance_schema.data_lock_waits instead
-- (information_schema.innodb_lock_waits removed in 8.0)

Kill the Blocking Thread

Kill and confirm

MySQL / MariaDB

-- Full process list to confirm before killing
SHOW FULL PROCESSLIST;

-- Kill using blocking_thread from the query above
KILL 1234;

-- Confirm it's gone
SELECT trx_id, trx_state, trx_query
FROM   information_schema.innodb_trx
WHERE  trx_mysql_thread_id = 1234;
-- Should return 0 rows

Read the Last Deadlock

InnoDB status — find the deadlock section

MySQL / MariaDB

-- Shows full InnoDB status including LATEST DETECTED DEADLOCK
-- Look for the section starting "LATEST DETECTED DEADLOCK"
-- It shows: each transaction's query, the locks held, which was rolled back
SHOW ENGINE INNODB STATUS\G

-- MySQL 8.0+: structured deadlock data
SELECT * FROM performance_schema.events_errors_summary_global_by_error
WHERE  error_name = 'ER_LOCK_DEADLOCK';

Prevention

Lock timeout + isolation level

MySQL / my.cnf

-- Global: kill transactions waiting over 30 seconds
SET GLOBAL innodb_lock_wait_timeout = 30;

-- Reduce gap lock exposure: READ COMMITTED avoids most gap locks
-- Note: loses REPEATABLE READ guarantees — test your app first
SET SESSION transaction_isolation = 'READ-COMMITTED';

-- Verify deadlock detection is on (should be default ON)
SHOW VARIABLES LIKE 'innodb_deadlock_detect';

✓

✓ Technical Review Complete — April 2026

The MySQL 5.7/8.0 split on information_schema vs performance_schema is correctly flagged — this is a common source of confusion when running old runbooks on MySQL 8.0+ instances. The SHOW ENGINE INNODB STATUS approach works on all versions.

On the isolation level change: Switching from REPEATABLE READ to READ COMMITTED is a significant application-level decision. It eliminates most gap locks (and thus many deadlocks), but it changes the visibility semantics for long-running transactions. Never set this globally on a production system without thorough application testing first. The per-session approach shown is correct.

⚡ Jump to

1 · Find blocking sessions 2 · Kill the blocker Deadlock trace files Prevention

Find Blocking Sessions

Full blocking chain from V$SESSION

Oracle 11g+

SELECT
  w.sid                          AS waiting_sid,
  w.serial#                      AS waiting_serial,
  w.username                     AS waiting_user,
  w.event                        AS wait_event,
  w.seconds_in_wait              AS wait_seconds,
  h.sid                          AS blocking_sid,
  h.serial#                      AS blocking_serial,
  h.username                     AS blocking_user,
  h.status                       AS blocking_status,
  SUBSTR(sq.sql_text, 1, 200)    AS blocking_sql
FROM      v$session   w
JOIN      v$session   h   ON h.sid = w.blocking_session
LEFT JOIN v$sql       sq  ON sq.sql_id = h.sql_id
WHERE  w.blocking_session IS NOT NULL
ORDER BY w.seconds_in_wait DESC;

Kill the Blocking Session

ALTER SYSTEM KILL SESSION

Oracle

-- Use SID and SERIAL# from the query above
-- IMMEDIATE bypasses the graceful wait
ALTER SYSTEM KILL SESSION '142,3891' IMMEDIATE;

-- Check: status = KILLED means rollback is in progress (normal)
SELECT sid, serial#, status, username
FROM   v$session
WHERE  sid = 142;

-- If status stays KILLED for over 5 minutes:
-- The OS process may need termination at the OS level
-- Find the OS PID: SELECT spid FROM v$process p JOIN v$session s ON p.addr = s.paddr WHERE s.sid = 142;
-- Then: kill -9 {spid} on Linux

Deadlock Trace Files

Oracle writes a trace file to the ADR directory when a deadlock occurs, containing the full lock graph. Find the most recent ORA-00060 trace file.

Locate recent deadlock incidents

Oracle 11g+

-- Find trace directory
SELECT value FROM v$parameter WHERE name = 'background_dump_dest';

-- List ORA-00060 (deadlock) incidents, most recent first
SELECT incident_id, create_time, error_number, error_msg
FROM   v$diag_incident
WHERE  error_number = 60
ORDER BY create_time DESC
FETCH FIRST 10 ROWS ONLY;

-- The trace file will be in background_dump_dest
-- Named: ora_{spid}_{instance}.trc
-- Search it for "Deadlock graph"

Prevention

Idle session cleanup via Profile

Oracle

-- Create a profile to kill idle sessions automatically
CREATE PROFILE app_profile LIMIT
  IDLE_TIME         30    -- minutes before idle session killed
  CONNECT_TIME      480   -- 8-hour max session lifetime
  SESSIONS_PER_USER 100; -- prevent connection flooding

ALTER USER app_user PROFILE app_profile;

-- Verify the profile is active
SELECT username, profile, account_status
FROM   dba_users
WHERE  username = 'APP_USER';

✓

✓ Technical Review Complete — April 2026

The Oracle section is accurate for 11g through 21c. One important addition: if ALTER SYSTEM KILL SESSION returns immediately but the status stays as KILLED for an extended period, it means Oracle is waiting for the session to respond before rolling back its work. This is expected for sessions with large open transactions. You can force immediate cleanup using ALTER SYSTEM DISCONNECT SESSION 'sid,serial' POST_TRANSACTION instead.

On IDLE_TIME in profiles: The value is in minutes, not seconds. A value of 30 means sessions idle for 30 minutes will be marked for termination by PMON. PMON runs on a sweep cycle, so the actual kill may happen a few minutes after the threshold is reached.

SQL Server Consulting

SQL SERVER PERFORMANCE · PRODUCTION · CONSULTING

Query Performance
Specialist

12 years · Banking & Financial Services · Critical Infrastructure

The problems that reach me are consistent: a query that performs correctly in test and collapses under production load, a plan that changed overnight with no code change, a blocking chain where every session looks like the blocker. These are not random failures — they have patterns, and those patterns are in the execution plan.

Every diagnosis starts at the execution plan. The plan does not lie — it shows exactly what SQL Server decided to do and why. Reading them across hundreds of production instances for 12 years means recognising the failure mode within minutes, not hours.

"The execution plan is the only honest account of what SQL Server actually did. Everything else is a theory."

Areas of Specialisation

→

Query plan regression & parameter sniffing

Plans that worked yesterday, failing today. The execution plan shows exactly why.

→

Blocking chains & deadlock diagnosis

Extended Events, sys.dm_exec_requests, lock escalation. Finding the root blocker, not just the victims.

→

Index strategy & audit

Identifying unused indexes that are costing write performance, and missing ones that are costing reads.

→

TempDB contention & sort spill analysis

Sort spills, hash spills, version store growth. The hidden performance killers that don't show up in CPU metrics.

→

Wait stats & DMV-based root cause analysis

PAGEIOLATCH, CXPACKET, LCK_M_X — reading wait statistics as a diagnostic language, not just metrics.

→

Plan cache management & SET option diagnosis

ARITHABORT mismatch, plan proliferation, the query that runs in 40ms from SSMS and 22 seconds from the app.

Who This Is For

▸ Engineering teams with a performance incident they cannot explain

▸ Companies running SQL Server without a dedicated DBA

▸ Organisations where queries worked fine until they didn't

▸ Teams preparing for a high-traffic event and needing a pre-flight review

Get in Touch

📞

+91 96331 31861

Call directly · India time (IST)

💬 WHATSAPP →

querytuning.org · SQL Server performance · Production systems
Response within 24 hours

Production is blocked. Here's how to find it, kill it, and stop it coming back.