QueryTuning.org
Database Query Performance Reference
PostgreSQLPerformancedetailed analysis

The Autovacuum That Quietly Ate Our Entire IOPS Budget for Six Weeks

SR
Sanjay Rao
Staff DBA · PostgreSQL · 11 years production
April 2, 2026
14 min read
PostgreSQL 10+

Six weeks of phantom slowdowns. Every night between 02:00 and 06:00, query latency across all tables would climb 3-5x. CPU was flat. Connections were normal. Disk I/O was pegged at 100% — but no query in pg_stat_activity was doing anything obviously expensive. The cause was autovacuum, running with no I/O budget, consuming every disk operation the server had.

The Alert

Overnight monitoring showed disk read/write latency spiking every night in the same window. The pattern was too regular to be random queries. Something scheduled was running. The autovacuum process does not appear in pg_stat_activity with a query — it shows as a maintenance process — and for six weeks it had not been on the investigation list.

First Hypothesis: Overnight Batch Jobs

Cron jobs, maintenance scripts, or analytics queries running overnight were the first assumption. We pulled pg_stat_statements for the high-I/O window.

Find high I/O queries from pg_stat_statements
PostgreSQL
SELECT
  substring(query, 1, 80) AS query_preview,
  calls,
  round(total_exec_time::numeric, 2) AS total_ms,
  shared_blks_read,
  shared_blks_written,
  local_blks_read + temp_blks_read AS temp_reads
FROM   pg_stat_statements
ORDER BY shared_blks_read + shared_blks_written DESC
LIMIT 20;

No smoking gun. The I/O was not from user queries. It had to be a background process. We looked at pg_stat_progress_vacuum.

The Discovery

Monitor active autovacuum workers
PostgreSQL
SELECT
  schemaname,
  relname,
  phase,
  heap_blks_total,
  heap_blks_scanned,
  round(100.0 * heap_blks_scanned / nullif(heap_blks_total, 0), 2) AS pct_done,
  index_vacuum_count,
  num_dead_tuples
FROM   pg_stat_progress_vacuum;

-- Also check the cost settings:
SHOW autovacuum_vacuum_cost_delay;   -- default: 2ms
SHOW autovacuum_vacuum_cost_limit;   -- default: 200

autovacuum_vacuum_cost_delay was 2ms. autovacuum_vacuum_cost_limit was 200. These are the default settings. They allow autovacuum to consume approximately 100MB/second of I/O with minimal throttling. The events table — 800GB, 120 million rows, with 30% dead tuple ratio — was being vacuumed every night for 4 hours with no I/O cap, saturating the disk while every other query waited for I/O.

Incident Timeline

TimeEvent
6 weeks agoEvents table grows past autovacuum trigger threshold. Nightly vacuums begin.
Every night 02:00Autovacuum starts on events table. I/O saturates immediately.
Every night 02:00-06:00All queries competing for disk I/O. 3-5x latency increase.
Week 6Investigation begins. pg_stat_progress_vacuum checked for first time.
Investigation dayautovacuum_vacuum_cost_delay=2ms identified as the unrestricted setting
+30 minutesPer-table cost delay set to 20ms on events table
Next nightVacuum runs 10x slower. I/O remains within budget. No latency spike.

Root Cause

Default autovacuum cost settings allow vacuum to run with minimal I/O throttling. On small tables with low dead tuple rates this is fine. On an 800GB table with 30% dead tuples, a 4-hour vacuum at 100MB/second saturates any shared disk subsystem. The default settings are correct for tables up to a few GB. They require tuning on large tables.

The Fix

Set per-table autovacuum cost limits for large tables
PostgreSQL
-- Apply a strict I/O budget to the large table
-- cost_delay=20ms + cost_limit=200 = roughly 10MB/s max I/O from vacuum
ALTER TABLE events SET (
  autovacuum_vacuum_cost_delay = 20,
  autovacuum_vacuum_cost_limit = 200
);

-- For the global setting, a more conservative default:
-- autovacuum_vacuum_cost_delay = 10ms   # in postgresql.conf
-- autovacuum_vacuum_cost_limit = 200

-- Monitor vacuum progress and I/O impact:
SELECT relname, phase, heap_blks_scanned, heap_blks_total,
       round(100.0*heap_blks_scanned/nullif(heap_blks_total,0),1) AS pct
FROM   pg_stat_progress_vacuum;

Prevention

Any table over 50GB now has explicit autovacuum cost settings as part of its DDL definition. A weekly report checks all tables over 10GB that are using default cost settings and flags them for review.

SR
Sanjay Rao
Staff DBA · PostgreSQL · 11 years production · QueryTuning
Written by a working DBA from direct production experience. All queries in this article have been run on live database instances.