CPU at 100% on SQL Server has six common causes. This runbook walks them in order of likelihood — most incidents resolve at step 1 or 2. Run each query set in sequence. Do not jump to a later step until you have confirmed the earlier ones do not apply.
Check Task Manager or perfmon: confirm sqlservr.exe is responsible for the CPU spike, not a backup agent, antivirus scan, or another process on the same host. If another process is responsible, this is a host problem, not a SQL Server problem.
Establish Baseline: Is This Normal?
-- CPU utilisation history from SQL Server ring buffer (last 256 measurements) SELECT TOP 30 DATEADD(ms, -1 * (rc.cpu_ticks / CONVERT(FLOAT, rc.ms_ticks)) * (SELECT MAX(timestamp) - timestamp FROM sys.dm_os_ring_buffers WHERE ring_buffer_type = 'RING_BUFFER_SCHEDULER_MONITOR'), GETDATE()) AS sample_time, rr.record.value('(./Record/SchedulerMonitorEvent/SystemHealth/ProcessUtilization)[1]', 'int') AS sql_cpu_pct, rr.record.value('(./Record/SchedulerMonitorEvent/SystemHealth/SystemIdle)[1]', 'int') AS idle_pct FROM ( SELECT timestamp, CONVERT(XML, record) AS record FROM sys.dm_os_ring_buffers WHERE ring_buffer_type = 'RING_BUFFER_SCHEDULER_MONITOR' ) rr CROSS APPLY rr.record.nodes('//Record') AS rr(record) CROSS JOIN sys.dm_os_sys_info rc ORDER BY sample_time DESC; -- sql_cpu_pct = SQL Server CPU usage in that 1-second sample -- Sustained > 90% = genuine saturation
Step 1 — Find the Rogue Query (Resolves 60% of Incidents)
The most common cause: one query consuming the majority of CPU in a tight loop or processing an enormous result set without the right index. Run this first — it usually identifies the problem in under 30 seconds.
SELECT TOP 10 r.session_id, r.cpu_time AS cpu_ms, r.total_elapsed_time / 1000 AS elapsed_seconds, r.logical_reads, r.reads AS physical_reads, s.login_name, s.host_name, s.program_name, DB_NAME(r.database_id) AS database_name, SUBSTRING(st.text, (r.statement_start_offset/2)+1, ((CASE r.statement_end_offset WHEN -1 THEN DATALENGTH(st.text) ELSE r.statement_end_offset END - r.statement_start_offset)/2)+1) AS current_statement FROM sys.dm_exec_requests r JOIN sys.dm_exec_sessions s ON s.session_id = r.session_id CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) st WHERE r.session_id != @@SPID AND r.cpu_time > 0 ORDER BY r.cpu_time DESC; -- One session with cpu_time 10-100x higher than others = rogue query -- High logical_reads alongside high cpu_time = missing index (step 2)
Step 2 — Missing Index Causing CPU-Intensive Table Scans
SELECT TOP 10 mid.statement AS table_name, ROUND(migs.avg_total_user_cost * migs.avg_user_impact * (migs.user_seeks + migs.user_scans), 0) AS impact_score, migs.user_seeks + migs.user_scans AS total_uses, ROUND(migs.avg_total_user_cost, 1) AS avg_cost, mid.equality_columns, mid.inequality_columns, mid.included_columns, 'CREATE INDEX IX_Missing ON ' + mid.statement + ' (' + ISNULL(mid.equality_columns, '') + CASE WHEN mid.inequality_columns IS NOT NULL THEN CASE WHEN mid.equality_columns IS NOT NULL THEN ',' ELSE '' END + mid.inequality_columns ELSE '' END + ')' AS suggested_index FROM sys.dm_db_missing_index_groups mig JOIN sys.dm_db_missing_index_group_stats migs ON mig.index_group_handle = migs.group_handle JOIN sys.dm_db_missing_index_details mid ON mid.index_handle = mig.index_handle WHERE mid.database_id = DB_ID() ORDER BY impact_score DESC;
Step 3 — CPU from Blocking Chain Overhead
Large blocking chains — hundreds of sessions waiting for locks — create CPU overhead from connection scheduling and retry logic even though individual blocked sessions are idle. If step 1 shows many sessions with low individual CPU but collectively high, check for blocking:
SELECT COUNT(*) AS blocked_sessions, MAX(wait_time/1000) AS max_wait_seconds, MIN(blocking_session_id) AS head_blocker FROM sys.dm_exec_requests WHERE blocking_session_id > 0; -- > 50 blocked sessions can create measurable CPU overhead from scheduling -- Resolve by killing head_blocker (after checking what it is doing)
Step 4 — Recompilation Storm
If a stored procedure or query is compiled thousands of times per minute, the compilation overhead itself drives CPU. This happens after statistics updates, schema changes, or when OPTION(RECOMPILE) is used on a high-frequency query.
-- Check recompilation rate from performance counters SELECT object_name, counter_name, cntr_value AS per_second FROM sys.dm_os_performance_counters WHERE counter_name IN ('SQL Re-Compilations/sec', 'SQL Compilations/sec'); -- Re-Compilations/sec > 100 while CPU is high = recompilation storm
Step 5 — Runaway SQL Agent Job
SELECT j.name AS job_name, ja.start_execution_date, DATEDIFF(minute, ja.start_execution_date, GETDATE()) AS running_minutes, ja.last_executed_step_id, r.cpu_time AS step_cpu_ms FROM msdb.dbo.sysjobactivity ja JOIN msdb.dbo.sysjobs j ON j.job_id = ja.job_id LEFT JOIN sys.dm_exec_requests r ON r.session_id IN ( SELECT session_id FROM sys.dm_exec_sessions WHERE program_name LIKE '%JobRunner%') WHERE ja.start_execution_date IS NOT NULL AND ja.stop_execution_date IS NULL ORDER BY running_minutes DESC; -- Any job running 3x longer than its normal duration warrants investigation
Step 6 — Connection Flood
SELECT program_name, login_name, status, COUNT(*) AS sessions FROM sys.dm_exec_sessions WHERE is_user_process = 1 GROUP BY program_name, login_name, status ORDER BY sessions DESC; -- Total vs normal baseline SELECT COUNT(*) AS total_user_sessions FROM sys.dm_exec_sessions WHERE is_user_process = 1; -- 2-5x above your normal peak = connection pool misconfiguration or leak
After the Incident: Set Up Baseline Monitoring
CPU incidents are much easier to diagnose with before-and-after comparisons. After resolving this incident, capture a baseline snapshot of the DMVs above during normal operation. The next time CPU spikes, comparing against baseline immediately narrows the cause from six options to one.
On Query Store (SQL Server 2016+): If you are on SQL Server 2016 or later, enable Query Store on production databases. Query Store captures plan and performance history continuously. When CPU spikes, Query Store's "Top Resource Consuming Queries" report shows you exactly which queries changed behaviour and when — without needing to have been actively investigating during the incident.