The request came from the application team: migrate a 2TB OLTP database from READ COMMITTED to READ COMMITTED SNAPSHOT ISOLATION to eliminate reader-writer blocking. No maintenance window available — the database served customers 24 hours. The migration had to happen live. What followed was four hours of careful, reversible steps and one moment where the version store grew faster than anticipated.
The Alert
Not a traditional on-call incident — this was a planned migration that required real-time monitoring because it can produce its own alerts. Enabling RCSI causes SQL Server to begin version-stamping every modified row, writing before-images to the version store in tempdb. On a high-write database, the version store can grow unexpectedly fast during the transition period before all existing readers close their transactions.
First Hypothesis: Safe to Enable During Business Hours
Initial assumption: RCSI could be enabled with a simple ALTER DATABASE and would take effect immediately with no disruption. Testing on a 50GB copy had shown instant enable with minimal impact. A 2TB live database proved different.
The Discovery: Version Store Growth
The enable command was issued at 14:00. By 14:22 the version store had grown to 38GB and was expanding at 2GB per minute. Long-running report queries — some with 45-minute execution times — were pinning the version store because SQL Server must retain all versions of every row touched since those transactions started.
SELECT GETDATE() AS sample_time, SUM(version_store_reserved_page_count) * 8 / 1024 AS version_store_mb, SUM(version_store_reserved_page_count) * 8.0 / SUM(total_page_count) * 100 AS pct_of_tempdb, oldest_active_transaction.transaction_begin_time AS oldest_trx FROM sys.dm_db_file_space_usage CROSS JOIN ( SELECT TOP 1 transaction_begin_time FROM sys.dm_tran_active_transactions ORDER BY transaction_begin_time ) oldest_active_transaction;
The oldest active transaction had started at 13:47 — 13 minutes before the RCSI enable. It was a reporting query expected to run for 40 more minutes. SQL Server would retain versions of every changed row for the entire duration of that transaction.
Incident Timeline
| Time | Event |
|---|---|
| 13:47 | Long-running report starts. Expected completion: 55 minutes. |
| 14:00 | RCSI enable issued: ALTER DATABASE SET READ_COMMITTED_SNAPSHOT ON |
| 14:00 | SQL Server begins version-stamping all modified rows |
| 14:22 | Version store at 38GB, growing at 2GB/min. Alert fires. |
| 14:35 | Decision: terminate the long-running report or wait |
| 14:38 | Report owner agrees to terminate. KILL issued. |
| 14:39 | Version store cleanup begins. Growth stops. |
| 14:52 | Version store reduces to 4GB. Stable. |
| 18:00 | All sessions confirmed using RCSI. Reader-writer blocking eliminated. |
Root Cause
RCSI version store growth is bounded by the oldest active transaction. Any transaction that starts before or during the enable phase pins all row versions generated since its start time. Long-running reporting queries in the same database as the OLTP workload create a version store accumulation risk that does not appear on test databases with short transactions.
The Fix
-- Step 1: Confirm no long-running transactions before enabling SELECT session_id, transaction_begin_time, DATEDIFF(minute, transaction_begin_time, GETDATE()) AS age_minutes FROM sys.dm_tran_active_transactions WHERE DATEDIFF(minute, transaction_begin_time, GETDATE()) > 5; -- Step 2: Enable RCSI (this is online — no lock taken on the database) ALTER DATABASE [YourDatabase] SET READ_COMMITTED_SNAPSHOT ON WITH ROLLBACK IMMEDIATE; -- Step 3: Verify it is enabled SELECT name, is_read_committed_snapshot_on FROM sys.databases WHERE name = 'YourDatabase';
Prevention
RCSI migrations on high-write databases now follow a pre-flight checklist: kill or defer all transactions over 5 minutes old before enabling, allocate 100GB free TempDB space as buffer, monitor version store every 30 seconds for the first hour after enable, and have a rollback procedure ready. RCSI can be disabled instantly if version store growth becomes unmanageable.