The server had 64GB of RAM. MySQL was allocated 4GB for the InnoDB buffer pool. The database was 38GB. Every query that touched a page not in the 4GB buffer pool went to disk. 90% of queries were going to disk. The fix took 30 seconds. The investigation took three hours because nobody had looked at the buffer pool hit ratio.
The Alert
Escalating query latency over three weeks. No schema changes, no data growth anomalies, no unusual queries. The server was showing high disk I/O consistently. CPU was low. The pattern: many queries waiting on disk reads, not on CPU or locks.
The Discovery
SELECT variable_name, variable_value FROM performance_schema.global_status WHERE variable_name IN ( 'Innodb_buffer_pool_read_requests', -- logical reads 'Innodb_buffer_pool_reads', -- physical reads (cache miss) 'Innodb_buffer_pool_size' ); -- Hit ratio = (read_requests - reads) / read_requests * 100 -- Target: > 99% for OLTP workloads SELECT ROUND((1 - Innodb_buffer_pool_reads / Innodb_buffer_pool_read_requests) * 100, 4) AS buffer_pool_hit_pct FROM ( SELECT SUM(CASE WHEN variable_name='Innodb_buffer_pool_reads' THEN variable_value END) AS Innodb_buffer_pool_reads, SUM(CASE WHEN variable_name='Innodb_buffer_pool_read_requests' THEN variable_value END) AS Innodb_buffer_pool_read_requests FROM performance_schema.global_status WHERE variable_name IN ('Innodb_buffer_pool_reads','Innodb_buffer_pool_read_requests') ) t;
Buffer pool hit ratio: 72%. Industry target is 99%+. At 72%, nearly 3 in 10 page reads were going to disk. With a 38GB database and 4GB buffer pool, only the most frequently accessed 10% of data was staying warm. The server had 60GB of unused RAM.
Incident Timeline
| Time | Event |
|---|---|
| Database creation | innodb_buffer_pool_size left at default (128MB). Manually set to 4GB later. |
| 3 weeks ago | Database grows to 38GB. Buffer pool ratio begins declining. |
| Investigation day | Buffer pool hit ratio: 72%. 4GB pool confirmed. |
| +30 minutes | innodb_buffer_pool_size = 48G set dynamically (MySQL 8.0 supports online resize) |
| +45 minutes | Buffer pool warms up. Hit ratio climbs to 98%. |
| +60 minutes | Query latency returns to baseline. Disk I/O drops 85%. |
Root Cause
The buffer pool had not been sized when the database was small. The default setting persisted as the database grew to 38GB. The server had 64GB of RAM sitting idle while MySQL performed disk I/O for 28% of page reads.
The Fix
-- MySQL 8.0: online resize without restart SET GLOBAL innodb_buffer_pool_size = 48 * 1024 * 1024 * 1024; -- 48GB -- Sizing rule: -- If database fits in RAM: set to database_size × 1.1 -- If database is larger than RAM: set to 70-80% of total RAM -- This server: 64GB RAM × 0.75 = 48GB -- Persist in my.cnf: -- innodb_buffer_pool_size = 48G -- Monitor warming progress: SELECT pool_id, pages_data, pages_free, pages_total FROM information_schema.innodb_buffer_pool_stats;
Prevention
Buffer pool hit ratio is now on the primary monitoring dashboard. Any reading below 98% triggers an alert for investigation. The buffer pool size is reviewed whenever the database grows by more than 20% — the review is automated via a weekly check of database size vs buffer pool allocation.