ALTER TABLE Took the Table Offline for 4 Hours Until We

The migration script had run successfully in staging. It ran for 4 hours in production before a junior engineer noticed the entire orders table was locked and terminated it. 4 hours of queued reads and writes, every order placement failing with a lock wait timeout. The script used the wrong ALGORITHM — one that required a full table rebuild with an exclusive metadata lock instead of the online path that would have taken 40 seconds.

The Alert

09:15. Application error rate climbs immediately after a deployment step runs ALTER TABLE orders ADD COLUMN notes TEXT. Every insert and select on orders returns lock wait timeout. The ALTER was still running. It had been running for 3 minutes before the first alert fired.

First Hypothesis: Large Table, Slow Operation

The orders table had 200 million rows. An ALTER TABLE on 200M rows takes time. The assumption was that this was expected and would complete. At the 2-hour mark, with no end in sight and orders failing, the assumption changed.

The Discovery

Check what the ALTER TABLE is doing and what is waiting

MySQL

-- Find the ALTER TABLE process:
SELECT id, user, host, db, command, time, state, info
FROM   information_schema.processlist
WHERE  info LIKE 'ALTER%';

-- State: "copy to tmp table" = ALGORITHM=COPY, full table rebuild with lock
-- State: "Waiting for table metadata lock" = stuck waiting on the ALTER
-- Count waiting connections:
SELECT COUNT(*) AS waiting_count, state
FROM   information_schema.processlist
WHERE  state LIKE 'Waiting for table%'
GROUP BY state;

State: copy to tmp table. ALGORITHM=COPY. MySQL was copying every row from orders to a new tmp table with the new column, then renaming. For 200 million rows this took hours. During the copy, a shared metadata lock prevents concurrent writes. Every INSERT and UPDATE to orders was queuing behind the ALTER.

Incident Timeline

Time	Event
09:15	ALTER TABLE orders ADD COLUMN notes TEXT begins. ALGORITHM=COPY (default).
09:15	Shared metadata lock acquired. Writes begin queuing.
09:17	Application timeouts begin. Error rate climbs.
11:15	Decision to kill the ALTER TABLE. KILL [process_id].
11:20	ALTER killed. Rollback begins — another 15 minutes to clean tmp table.
11:35	Table available. Orders resume. Lock wait queue drains.
12:00	Same ALTER run with ALGORITHM=INSTANT. Completes in under 1 second.

Root Cause

MySQL's default ALTER TABLE algorithm on older versions is COPY — which rebuilds the entire table. Adding a nullable column with no default in MySQL 8.0 supports ALGORITHM=INSTANT, which completes in milliseconds by modifying only the table metadata. The script had not specified ALGORITHM, using the slowest path on a 200M row table.

The Fix

Always specify ALGORITHM and LOCK for schema changes

MySQL

-- Test which algorithms are available first:
ALTER TABLE orders
  ADD COLUMN notes TEXT
  ALGORITHM = INSTANT;
-- If ERROR 1846: INSTANT not supported, try INPLACE
-- If INPLACE fails, use gh-ost or pt-online-schema-change

-- ALGORITHM=INSTANT: metadata only, zero lock, zero data copy (MySQL 8.0)
-- ALGORITHM=INPLACE: modifies in-place, allows concurrent DML
-- ALGORITHM=COPY:    full table copy, blocks writes — avoid on large tables

-- Fail fast if the operation would lock:
ALTER TABLE orders
  ADD COLUMN notes TEXT
  ALGORITHM = INSTANT,
  LOCK = NONE;  -- fails immediately if a lock would be required

Prevention

All schema changes on tables over 1M rows require explicit ALGORITHM=INSTANT, LOCK=NONE or use gh-ost (GitHub's online schema change tool). Any ALTER that cannot run with LOCK=NONE goes through the gh-ost process which uses a ghost table, triggers, and final atomic swap with no application downtime. The migration script checklist now includes algorithm verification on staging with production row counts simulated via EXPLAIN.

Ananya Menon

MySQL DBA · 9 years production · QueryTuning

Ananya has spent 9 years managing MySQL and MariaDB for SaaS companies. She specialises in online schema changes, DDL safety, and zero-downtime migrations.

ALTER TABLE Took the Table Offline for 4 Hours Until We Used the Right Syntax