Too Many Connections: Raising max_connections Made It W

Monday morning. The application started throwing "Too many connections" errors at 09:12. The on-call engineer raised max_connections from 500 to 1,000. The errors continued. At 09:45 they raised it to 2,000. By 10:00 the server was OOM-killed. More connections had made the problem worse.

The Alert

Application error: ERROR 1040 (HY000): Too many connections. At peak, 500 connections were exhausted within 2 minutes of application startup. The application connection pool had been configured without a wait timeout — connections that could not establish would retry indefinitely, creating a flood of connection attempts against the already-exhausted pool.

First Hypothesis: Not Enough Connections

The immediate response — raise max_connections — was wrong. We found out why when the server ran out of memory. Each MySQL connection reserves memory: thread stack, sort buffer, join buffer, read buffer. At 2,000 connections, the theoretical maximum memory was 2,000 × ~8MB = 16GB on a 16GB server. The OOM killer ran before any queries did.

The Discovery

Find what all 500 connections were doing

MySQL

SELECT
  user,
  host,
  db,
  command,
  time                                AS seconds,
  state,
  LEFT(info, 80)                      AS query_preview
FROM   information_schema.processlist
ORDER BY command, time DESC
LIMIT  50;

Of the 500 connections: 487 were in state Sleep — idle connections held by the application's connection pool across 24 application instances, each with pool_size=25. Of the 13 active connections, 11 were waiting on the same lock. One connection had held an exclusive lock on the products table for 340 seconds — an admin script running a bulk UPDATE had been left running in a transaction from an overnight job that had been forgotten.

Incident Timeline

Time	Event
Sat 23:00	Overnight admin script starts bulk UPDATE on products table in a transaction
Sat 23:15	Admin closes laptop. Transaction left open. Lock held.
Mon 09:00	Monday morning traffic begins. Application instances start. Pool fills.
09:12	New queries queue behind the lock. Connection pool exhausted waiting.
09:12	"Too many connections" errors begin.
09:14	max_connections raised to 1,000. More connections accepted, all queue behind lock.
09:45	max_connections raised to 2,000. OOM kill at 10:00.
10:15	Server restarts. Admin script transaction rolled back automatically.
10:16	Lock released. Connections serve normally.

Root Cause

A forgotten open transaction held an exclusive lock on a high-contention table for 34 hours. Application connections queued behind the lock, exhausting the connection pool. Raising max_connections allowed more connections to queue — consuming more memory — without resolving the underlying lock. The correct response was to identify and kill the blocking transaction, not increase the connection limit.

The Fix

Find and kill the blocking transaction

MySQL

-- Find long-running transactions
SELECT
  trx_id,
  trx_state,
  trx_started,
  TIMESTAMPDIFF(SECOND, trx_started, NOW()) AS age_sec,
  trx_mysql_thread_id                                  AS connection_id,
  trx_rows_locked,
  trx_rows_modified
FROM   information_schema.innodb_trx
ORDER BY trx_started
LIMIT  10;

-- Kill the offender:
KILL [connection_id];

-- Set wait_timeout to auto-kill idle transactions:
-- SET GLOBAL wait_timeout = 3600;  -- 1 hour max idle

Prevention

wait_timeout is now set to 3600 seconds globally — no connection can idle for more than an hour. Admin scripts that use transactions run in a wrapper that sets a session-level lock timeout. The connection pool dashboard was added to the main monitoring board — connection count by state, not just total count.

Ananya Menon

MySQL DBA · 9 years production · QueryTuning

Ananya has spent 9 years managing MySQL and MariaDB for SaaS companies. She specialises in connection architecture, ProxySQL, and capacity planning.

Too Many Connections: Why Raising max_connections Made Our Outage Worse