The change looked harmless: increase work_mem from 64MB to 1GB to speed up analytics queries. The change went into postgresql.conf on Friday afternoon. By Friday evening, reporting queries were being killed by the Linux OOM killer every time they ran. The application team reported that the reporting module had stopped working.
The Alert
PostgreSQL log at 18:44: Out of memory: Kill process 23847 (postgres). Then again at 18:46. And 18:49. Every reporting query was being killed partway through execution. The server had 64GB of RAM. PostgreSQL was configured with shared_buffers at 16GB. 47GB should have been available for queries. The math seemed fine.
First Hypothesis: Memory Leak
A memory leak in a reporting query or a PL/pgSQL function that was not releasing allocations. We checked pg_stat_activity for long-running sessions with high memory usage.
SELECT pid, usename, application_name, state, pg_size_pretty(memory_usage) AS mem, query FROM ( SELECT pid, usename, application_name, state, -- memory_usage available in pg 16+, use pg_backend_memory_contexts for earlier pg_size_pretty(sum(allocated_size)) AS memory_usage, query FROM pg_stat_activity psa JOIN pg_backend_memory_contexts() ON true GROUP BY pid, usename, application_name, state, query ) x;
The Discovery
The reporting query was not leaking. It was using exactly what PostgreSQL had granted it. The problem was the grant itself. A complex reporting query can use work_mem multiple times — once per sort node, once per hash operation, once per hash join. The query had 12 sort nodes. work_mem × 12 nodes × 4 concurrent sessions = 1GB × 12 × 4 = 48GB. The server had 47GB available.
EXPLAIN (ANALYZE, BUFFERS) [the reporting query]; -- Count sort and hash operations in the plan: -- Every "Sort" node can use up to work_mem -- Every "Hash" node can use up to work_mem -- Every "Hash Join" node can use up to work_mem -- Formula: risk = work_mem × node_count × max_concurrent_sessions -- Safe work_mem calculation: -- available_memory = total_ram - shared_buffers - os_overhead -- max_concurrent = max_connections × average_active_fraction -- max_nodes = worst_case_sort_nodes_per_query -- safe_work_mem = available_memory / (max_concurrent × max_nodes) -- Our numbers: 47GB / (4 × 12) = 980MB — right at the edge -- Actual granted: 1024MB × 12 × 4 = 49.2GB. OOM.
Incident Timeline
| Time | Event |
|---|---|
| 17:30 | postgresql.conf: work_mem changed from 64MB to 1GB. pg_reload_conf() called. |
| 18:44 | First OOM kill. Reporting query killed mid-execution. |
| 18:46 | Second OOM kill. Retry of same query, same result. |
| 19:15 | On-call engineer connects. Identifies work_mem change at 17:30. |
| 19:18 | ALTER ROLE reporting_user SET work_mem = '256MB' |
| 19:19 | Reporting queries complete successfully. p99: 4.2 seconds. |
| 19:20 | Global work_mem reverted to 64MB in postgresql.conf |
Root Cause
work_mem is allocated per sort/hash operation, not per query or per session. A query with 12 sort nodes can use up to 12 × work_mem simultaneously. Four concurrent reporting sessions × 12 nodes × 1GB = 48GB, exceeding available RAM. The Linux OOM killer killed the PostgreSQL backends to reclaim memory.
The Fix
-- Never set high work_mem globally. -- Set it per role for the users who need it: ALTER ROLE reporting_user SET work_mem = '256MB'; -- Or per database: ALTER DATABASE analytics SET work_mem = '256MB'; -- Global setting stays conservative: -- work_mem = 64MB # in postgresql.conf -- Calculate safe value first: -- SELECT (47 * 1024) / (4 * 12) AS safe_work_mem_mb; -- 980MB/12 nodes worst case
Prevention
Any postgresql.conf change that affects memory allocation now requires a calculation: total available RAM ÷ (max concurrent sessions × max sort nodes per query). The result must be above zero with a 20% safety margin. Changes go through a 24-hour monitoring window on a staging server before production.