QueryTuning.org
Database Query Performance Reference
SQL SERVERPOST-MORTEM

An Orphaned DTC Transaction Held a Lock for 11 Days

JK
James Kowalski
Senior DBA · QueryTuning
Jun 10, 2026
11 min read
SQL Server 2014+

For three days, the afternoon batch run slowed to a crawl. Every day at 14:00 when the reconciliation process kicked in, 200+ sessions piled up behind a blocker. By 14:30 the blocking cleared on its own as sessions timed out and retried. Nobody could find the root cause because by the time anyone investigated, the chain had dissolved.

The Alert

On day four, we set up a monitoring loop to capture the blocking chain the moment it formed.

Capture blocking chain in real-time
SQL Server
SELECT
  r.session_id AS blocked_spid,
  r.blocking_session_id AS blocker_spid,
  r.wait_type, r.wait_time / 1000 AS wait_sec,
  s.login_name, s.host_name, s.program_name, s.status,
  DATEDIFF(DAY, s.login_time, GETDATE()) AS session_age_days
FROM sys.dm_exec_requests r
JOIN sys.dm_exec_sessions s ON s.session_id = r.blocking_session_id
WHERE r.blocking_session_id > 0
ORDER BY r.wait_time DESC;

Caught it. Every blocked session pointed to SPID 87 as the root blocker. SPID 87's status: sleeping. Session age: 11 days. Login: a service account for the linked-server integration. We ran the query three more times over 60 seconds: SPID 87 was always the root, always sleeping, always 11 days old. This session had been sitting there since the previous Tuesday.

First Hypothesis: Idle Session with Open Transaction

A sleeping session blocking others usually means an uncommitted transaction. We checked.

Check for open transactions on the sleeping SPID
SQL Server
DBCC OPENTRAN;

SELECT
  s.session_id, s.status, s.login_time,
  t.transaction_id, t.name AS txn_name,
  t.transaction_begin_time,
  DATEDIFF(DAY, t.transaction_begin_time, GETDATE()) AS txn_age_days,
  dt.database_transaction_log_bytes_reserved
FROM sys.dm_tran_session_transactions st
JOIN sys.dm_exec_sessions s ON s.session_id = st.session_id
JOIN sys.dm_tran_active_transactions t ON t.transaction_id = st.transaction_id
LEFT JOIN sys.dm_tran_database_transactions dt ON dt.transaction_id = t.transaction_id
WHERE s.session_id = 87;

Transaction name: user_transaction. Transaction begin time: 11 days ago. Log bytes reserved: 84KB. The transaction had started 11 days ago, done almost nothing, and never committed. We verified three times — same transaction, same timestamp, same 11-day age. But then we saw the transaction type: it was enlisted in a distributed transaction. This was a DTC (Distributed Transaction Coordinator) transaction.

The Discovery

The linked-server integration ran a distributed query every night that pulled data from a remote Oracle instance. Eleven days ago, the Oracle side had thrown an ORA-12571 (TNS:packet writer failure) mid-transaction. The remote side rolled back. But the local SQL Server side never received the abort notification — the DTC transaction remained open, holding an exclusive IX lock on the target table. The lock escalated to a table-level IX lock because the original query had touched more than 5,000 rows.

We confirmed by checking sys.dm_tran_active_transactions — the transaction type was 2 (distributed). We also checked the DTC admin console, which showed the transaction in "In Doubt" state. Verified all three findings three times: distributed type, In Doubt status, 11-day age.

Incident Timeline

TimeEvent
Day 1, 02:00Linked server query starts distributed transaction
Day 1, 02:03Oracle side throws ORA-12571. Remote rollback.
Day 1, 02:03SQL Server DTC transaction enters "In Doubt" state. Lock held.
Day 2–4Afternoon blocking appears and self-resolves via timeouts
Day 4, 14:05Monitoring captures SPID 87 as root blocker. 11 days old.
Day 4, 14:10KILL 87. DTC transaction resolved. Blocking clears.

Root Cause

A network failure between SQL Server and a linked Oracle instance left a distributed transaction in "In Doubt" state. The DTC protocol requires both sides to agree on commit or rollback — when the Oracle side became unreachable, the local transaction stayed open indefinitely. SQL Server's default behavior is to hold locks on In Doubt transactions until manual resolution, because committing or rolling back unilaterally could cause data inconsistency.

The Fix

Kill the orphan and configure DTC timeout
SQL Server
-- Kill the orphaned session
KILL 87;

-- Configure DTC to auto-resolve after timeout
-- In Component Services > DTC > Properties > Transaction Manager:
-- Set "Transaction Timeout" to 300 seconds (5 minutes)

-- Monitor for In Doubt transactions
SELECT transaction_id, name, transaction_begin_time,
  DATEDIFF(HOUR, transaction_begin_time, GETDATE()) AS age_hours
FROM sys.dm_tran_active_transactions
WHERE transaction_type = 2
  AND DATEDIFF(HOUR, transaction_begin_time, GETDATE()) > 1;

Prevention

DTC transaction timeout is now set to 300 seconds. A nightly monitoring job queries sys.dm_tran_active_transactions for distributed transactions older than 1 hour and alerts immediately. The linked-server integration now uses explicit TRY/CATCH with a rollback on failure, and the linked server timeout is set to 60 seconds rather than the default of 0 (infinite).

We also removed the distributed transaction entirely. The integration now uses a message queue pattern: the SQL Server side writes to a staging table, a service reads from the staging table and writes to Oracle separately, and a reconciliation check verifies both sides match. This eliminates the two-phase commit dependency and makes the integration tolerant of Oracle outages. If Oracle is unreachable, the staging table queues rows until connectivity returns — no orphaned transactions, no locks held across an unreliable network link.

The 11-day lock went undetected because the blocking only manifested during the afternoon batch — the rest of the day, queries that needed the locked table happened to use different partitions or took optimistic read paths. Intermittent, time-of-day-dependent blocking is the hardest pattern to diagnose because it self-resolves before anyone looks. The monitoring loop we deployed — capturing blocking chains every 30 seconds into a persistent table — would have caught this on day one instead of day four. A nightly monitoring job queries sys.dm_tran_active_transactions for distributed transactions older than 1 hour and alerts immediately. The linked-server integration now uses explicit TRY/CATCH with a rollback on failure, and the linked server timeout is set to 60 seconds rather than the default of 0 (infinite).