The application had scaled to 12 instances, each with its own connection pool configured to hold 200 connections. That was 2,400 connections to PostgreSQL. Most were idle. PostgreSQL was spending more CPU time managing idle connection state than executing queries. A single CPU core was permanently occupied doing nothing useful.
The Alert
CPU at 15% on a lightly loaded server. Database response times were slow despite the low CPU and no active workload. The symptom: CPU was busy but queries were not running. Something was consuming CPU that was not query execution.
The Discovery
SELECT state, count(*) AS count, max(now() - state_change) AS max_idle_duration, client_addr FROM pg_stat_activity WHERE pid <> pg_backend_pid() GROUP BY state, client_addr ORDER BY count DESC;
2,047 connections in state idle. Some had been idle for over 4 hours. Each idle PostgreSQL connection is a forked backend process maintaining session state, checking for client disconnects, and processing keepalive packets. 2,000 backends doing nothing consumed a full CPU core in aggregate overhead.
Incident Timeline
| Time | Event |
|---|---|
| 3 months ago | Application scaled from 4 to 12 instances. Pool size not recalculated. |
| Ongoing | 2,400 connections held open. Most idle at any given time. |
| Investigation day | 15% CPU on idle server investigated |
| +1 hour | 2,047 idle connections identified as the source |
| +2 hours | PgBouncer deployed in transaction pooling mode |
| +3 hours | Application connections reduced to 25 real PostgreSQL connections. CPU: 2% |
Root Cause
PostgreSQL uses one backend process per connection. Each backend consumes memory and CPU regardless of whether it is executing queries. At 2,000 idle connections, the aggregate overhead of connection management exceeded the cost of the actual query workload.
The Fix
# pgbouncer.ini — transaction pooling for OLTP workloads
[databases]
mydb = host=127.0.0.1 port=5432 dbname=mydb
[pgbouncer]
pool_mode = transaction
max_client_conn = 2000 # accept 2000 app connections
default_pool_size = 25 # maintain only 25 real PostgreSQL connections
server_idle_timeout = 600 # close idle server connections after 10 min
client_idle_timeout = 0 # keep app connections open (app manages this)Prevention
The rule: max PostgreSQL connections = (CPU cores × 2) + disk spindles, or roughly 50-100 for most servers. Application connection pool size is calculated as total_postgres_connections ÷ application_instances. PgBouncer is now the standard entry point for all application connections.
Before You Kill Connections: Confirm the Scope
Killing connections without understanding the source is temporary relief. The pool fills again within minutes if the root cause — idle connections held by application pool configuration — is not fixed. Before taking action, measure the full scope.
SELECT state, client_addr, application_name, count(*) AS connection_count, max(extract(epoch from (now() - state_change)))::int AS max_idle_sec, avg(extract(epoch from (now() - state_change)))::int AS avg_idle_sec FROM pg_stat_activity WHERE pid <> pg_backend_pid() GROUP BY state, client_addr, application_name ORDER BY connection_count DESC;
SELECT current_setting('max_connections') AS max_connections, count(*) AS current_connections, current_setting('max_connections')::int - count(*) AS slots_remaining, current_setting('superuser_reserved_connections') AS superuser_reserved FROM pg_stat_activity; -- superuser_reserved connections (default: 3) are always available to superusers -- even if regular connections are exhausted. -- Connect as superuser if regular login is failing due to exhaustion.
-- Emergency: terminate idle connections held more than 10 minutes -- This is temporary relief — the pool refills without a config change SELECT pg_terminate_backend(pid), client_addr, application_name, state, extract(epoch from (now() - state_change))::int AS idle_sec FROM pg_stat_activity WHERE state = 'idle' AND state_change < now() - INTERVAL '10 minutes' AND pid <> pg_backend_pid();