QueryTuning.org
Database Query Performance Reference
SQL SERVERPOST-MORTEM

Ghost Cleanup Consumed All Disk IOPS After a 90M Row Delete

JK
James Kowalski
Senior DBA · QueryTuning
Jun 10, 2026
11 min read
SQL Server 2014+

Saturday, 11:40. Every query on the production database was running at 3–10x normal duration. Read latency: 200ms across every data file. The storage array was at saturation. But no large queries were running. Active session count was normal. CPU at 8%.

The operations team had completed a scheduled data purge at 11:00 — deleting 90 million rows from a transaction history table. The delete finished cleanly. Forty minutes later, the I/O storm began.

The Alert

Monitoring showed sustained 15,000 IOPS — the exact rated maximum of the storage array — but sys.dm_exec_requests showed only 40 active sessions, none doing anything heavy. The I/O was coming from somewhere inside the engine, not from user queries.

First Hypothesis: Checkpoint Flooding

After a mass delete, dirty pages need flushing. We assumed checkpoint was the source. Checked three times, thirty seconds apart.

Check for active checkpoint and ghost cleanup
SQL Server
SELECT r.session_id, r.command, r.status,
  r.wait_type, r.reads, r.writes
FROM sys.dm_exec_requests r
WHERE r.command LIKE '%CHECKPOINT%'
   OR r.command LIKE '%GHOST%'
   OR r.command LIKE '%LAZY%';

No checkpoint running. No lazy writer. But one entry: command = GHOST CLEANUP. Reads climbing. We ran it again — ghost cleanup still there, reads up by 400,000. Third check: same process, reads up another 400,000. The ghost cleanup task was consuming every available IOP.

The Discovery

When SQL Server deletes rows from an index, it marks them as ghost records rather than immediately reclaiming page space. A background task — ghost cleanup — revisits those pages later to physically remove the records. On 90 million deleted rows across a clustered index with 7 nonclustered indexes, the ghost count was staggering.

Measure ghost record volume
SQL Server
SELECT
  OBJECT_NAME(ips.object_id) AS table_name,
  i.name AS index_name,
  SUM(ips.ghost_record_count) AS ghost_records,
  SUM(ips.page_count) AS total_pages
FROM sys.dm_db_index_physical_stats(
  DB_ID(), OBJECT_ID('dbo.txn_history'), NULL, NULL, 'DETAILED'
) ips
JOIN sys.indexes i ON i.object_id = ips.object_id AND i.index_id = ips.index_id
GROUP BY ips.object_id, i.name ORDER BY ghost_records DESC;

To understand the scale: ghost cleanup is a single-threaded background process that walks each index's B-tree, finds pages containing ghost records, acquires a brief latch on each page, removes the ghost entries, and deallocates empty pages. On a well-indexed table with 7 nonclustered indexes, every row deletion creates one ghost per index — so 90 million deletes produced 90 million × 8 = 720 million ghost records initially. By the time we checked, cleanup had already processed 77 million, leaving 643 million.

The I/O pattern was particularly destructive. Ghost cleanup does not read pages sequentially — it walks the B-tree in logical order, which maps to random physical I/O on disk. Random reads on spinning disks are limited by seek time; on SSDs, they are limited by queue depth. Either way, ghost cleanup's pattern saturates the device's random IOPS capacity long before bandwidth is exhausted. Our storage showed 15,000 IOPS at only 180 MB/s — well below its sequential throughput of 2 GB/s.

We ran the count three times, five minutes apart. First: 643M. Second: 598M. Third: 551M. Ghost cleanup was working — removing 45 million every five minutes — but at the cost of saturating the storage array.

Incident Timeline

TimeEvent
10:00Scheduled purge begins — DELETE WHERE created_date < '2023-01-01'
11:02Delete completes. 90M rows removed.
11:05Ghost cleanup activates. Begins visiting pages.
11:40Read latency alert: 200ms across all data files
12:00Ghost cleanup disabled via TF 661. Latency drops to 4ms in 90 seconds.
12:30Ghost cleanup re-enabled. Completes over 6 hours at low priority.

Root Cause

Mass delete of 90M rows created 643M ghost records across 8 indexes. Ghost cleanup is single-threaded and unthrottled by default. It performed random I/O across all index B-trees simultaneously, consuming the storage array's full 15,000 random IOPS. Every user query was starved of I/O.

The Fix

Disable ghost cleanup, let it recover later
SQL Server
-- Disable ghost cleanup immediately
DBCC TRACEON(661, -1);

-- Verify latency dropping
SELECT DB_NAME(database_id) AS db,
  io_stall_read_ms / NULLIF(num_of_reads, 0) AS avg_read_ms
FROM sys.dm_io_virtual_file_stats(NULL, NULL);

-- Re-enable during off-peak
DBCC TRACEOFF(661, -1);

Prevention

Mass deletes now batched: 500,000 rows per transaction with a 10-second WAITFOR between batches. This gives ghost cleanup time to process each batch before the next arrives, spreading IOPS over hours instead of spiking in one burst. The batch approach also keeps the transaction log under control — a single 90-million-row delete generates a massive log record that can fill the transaction log; batches of 500,000 each generate manageable log entries that get truncated between batches.

Purges run in the 02:00–05:00 window. A monitoring query checks ghost record counts after every purge and alerts if the total exceeds 50 million. We also added a pre-purge checklist: verify the maintenance window is long enough, confirm no other I/O-heavy jobs are scheduled, and ensure the storage array has at least 40% IOPS headroom before starting the delete. with a 10-second WAITFOR between batches. This gives ghost cleanup time to process each batch, spreading IOPS over hours. Purges run in the 02:00–05:00 window. A monitoring query checks ghost record counts after every purge and alerts if the total exceeds 50 million.