One real incident. One deep post-mortem. One set of queries you can run right now. No padding. No sponsored content. No "have you checked the docs?"
No spam. Unsubscribe in one click. Always free.
24,000+ DBAs subscribed
Issue #141 published this week
All four databases covered on rotation
Recent Issues
What lands in your inbox every week
Real incidents, real queries. Four recent Tuesdays.
ISSUE #141 · PostgreSQL
The Autovacuum That Ate Our IOPS Budget — Fixed in 8 Minutes
A high-write OLTP table had dead tuple bloat exceeding 40%. Autovacuum was running, but cost_delay was throttling it to uselessness. The pg_stat_user_tables query that exposed it and the three-line config fix.
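The check described for Issue #141 might look roughly like the sketch below. This is not the newsletter's own query, which does not appear on this page. The column names (n_live_tup, n_dead_tup, last_autovacuum) are real pg_stat_user_tables columns, but the helper names, the LIMIT, and the definition of "bloat" as dead tuples over all tuples are assumptions made for illustration.

```python
# Hypothetical sketch: the issue's actual query is not reproduced on this page.
# The SQL uses real pg_stat_user_tables columns; everything else is illustrative.
DEAD_TUPLE_QUERY = """
SELECT relname, n_live_tup, n_dead_tup, last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 20;
"""


def dead_tuple_pct(n_live_tup: int, n_dead_tup: int) -> float:
    """Dead tuples as a percentage of all tuples (assumed definition of bloat)."""
    total = n_live_tup + n_dead_tup
    if total == 0:
        return 0.0
    return 100.0 * n_dead_tup / total


def bloated_tables(rows, threshold_pct: float = 40.0):
    """Return table names whose dead-tuple share exceeds threshold_pct.

    rows: iterable of (relname, n_live_tup, n_dead_tup) tuples, for example
    the first three columns returned by DEAD_TUPLE_QUERY.
    """
    return [
        name
        for name, live, dead in rows
        if dead_tuple_pct(live, dead) > threshold_pct
    ]
```

Feeding the query results into bloated_tables gives a quick shortlist of tables where autovacuum is not keeping up; the 40% default simply mirrors the figure quoted in the blurb.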
A stored procedure ran in 40ms on dev and 22 seconds on prod. The culprit wasn't the query or the index — it was a cached plan compiled for a rare parameter value. The full diagnosis and when to reach for OPTION(RECOMPILE).
⏱ 11 min read · Mar 25, 2026
ISSUE #139 · MySQL
InnoDB Gap Locks: The Invisible Deadlock Cause Nobody Checks First
Two transactions, two different rows — still deadlocked. This is the gap lock scenario that bites teams using REPEATABLE READ with range queries. Full lock graph and what we actually did.
⏱ 10 min read · Mar 18, 2026
ISSUE #138 · Oracle
Undo Contention Diagnosed in V$WAITSTAT — ORA-01555 Finally Explained
ORA-01555 Snapshot Too Old was firing on long-running reports during peak hours. Reading V$UNDOSTAT, sizing UNDO_RETENTION correctly, and the index-organized table trick that cut undo generation by 60%.
⏱ 8 min read · Mar 11, 2026
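For the undo sizing mentioned in Issue #138, a commonly cited rule of thumb is: undo space ≈ UNDO_RETENTION × undo blocks generated per second × database block size. The sketch below applies that rule to samples shaped like V$UNDOSTAT rows (the UNDOBLKS column over each reporting interval). It is an assumption that the issue used this formula; the function names are hypothetical and the estimate ignores tablespace overhead.

```python
# Hypothetical sketch of the common undo-sizing rule of thumb.
# It is not taken from the issue itself and ignores tablespace overhead.

def peak_undo_blocks_per_s(samples) -> float:
    """Peak undo generation rate in blocks per second.

    samples: iterable of (undoblks, interval_seconds) pairs, e.g. UNDOBLKS and
    (END_TIME - BEGIN_TIME) in seconds for each V$UNDOSTAT row.
    """
    rates = [blks / secs for blks, secs in samples if secs > 0]
    return max(rates, default=0.0)


def required_undo_bytes(undo_retention_s: float,
                        undo_blocks_per_s: float,
                        db_block_size: int) -> float:
    """Estimated undo space needed to honour UNDO_RETENTION at the given rate."""
    return undo_retention_s * undo_blocks_per_s * db_block_size
```

Using the peak rate rather than the average is a deliberately conservative choice, since ORA-01555 tends to appear when undo generation is at its highest.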
What's Inside Every Issue
Same six sections, every week.
Consistent format so you know what you're getting — and can skip straight to what you need.
🔥
Incident of the Week
One real post-mortem. What broke, what the timeline looked like, what the first wrong diagnosis was, and what actually fixed it.
🔍
Query of the Week
One diagnostic or maintenance query — fully annotated. Copy it straight into your runbook. All four databases on rotation.
📖
Detailed Analysis
A longer explainer on one internal concept. MVCC, WAL internals, cost-based optimizer mechanics — the stuff that makes you dangerous in an incident.
⚙️
Config Corner
One configuration parameter, explained properly. What it does, what happens when it's wrong, and the sensible default vs production-tuned value.
🗞️
Community Picks
The three best things published in the Postgres, MySQL, and SQL Server communities that week. Curated — not scraped.
✅
Config Corner
One configuration parameter explained properly. What it controls, what breaks when it's wrong, and the gap between the default value and what you actually want in production.
What Subscribers Say
This is the one newsletter I actually read end-to-end. The incident breakdowns are exactly the kind of thing that makes you a better DBA — not because it happened to you, but because now you know what to look for when it does.
Principal DBA, fintech company · PostgreSQL · 11 years
The Query of the Week alone is worth the subscription. I've added at least 20 of them to our internal runbooks. My team calls them "TQ queries" without even realising where they came from.
Database Platform Lead, e-commerce · MySQL / Aurora · 8 years
I forwarded Issue #127 to three colleagues. Our on-call runbook now quotes it verbatim. The depth of the incident breakdowns is what convinced me to copy these queries straight into our runbooks.
Senior DBA, healthcare SaaS · SQL Server · 14 years
Join 24,000+ DBAs who read it every Tuesday.
Free. Written by DBAs, for the engineers on call at 2 AM.
SQL Server Consulting
SQL SERVER PERFORMANCE · PRODUCTION · CONSULTING
Query Performance Specialist
12 years · Banking & Financial Services · Critical Infrastructure
The problems that reach me are consistent: a query that performs correctly in test and collapses under production load, a plan that changed overnight with no code change, a blocking chain where every session looks like the blocker. These are not random failures — they have patterns, and those patterns are in the execution plan.
Every diagnosis starts at the execution plan. The plan does not lie — it shows exactly what SQL Server decided to do and why. Reading plans across hundreds of production instances for 12 years means recognising the failure mode within minutes, not hours.
"The execution plan is the only honest account of what SQL Server actually did. Everything else is a theory."
Areas of Specialisation
→ Query plan regression & parameter sniffing
Plans that worked yesterday, failing today. The execution plan shows exactly why.
→ Blocking chains & deadlock diagnosis
Extended Events, sys.dm_exec_requests, lock escalation. Finding the root blocker, not just the victims.
→ Index strategy & audit
Identifying unused indexes that are costing write performance, and missing ones that are costing reads.
→ TempDB contention & sort spill analysis
Sort spills, hash spills, version store growth. The hidden performance killers that don't show up in CPU metrics.
→ Wait stats & DMV-based root cause analysis
PAGEIOLATCH, CXPACKET, LCK_M_X — reading wait statistics as a diagnostic language, not just metrics.
→ Plan cache management & SET option diagnosis
ARITHABORT mismatch, plan proliferation, the query that runs in 40ms from SSMS and 22 seconds from the app.
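"Finding the root blocker, not just the victims" in the blocking-chain item above can be illustrated with a small sketch. It assumes you have already pulled session_id and blocking_session_id from sys.dm_exec_requests into a dictionary; the function name and data shape are hypothetical, and this is one simple way to walk the chain rather than a description of any particular consulting method.

```python
# Hypothetical sketch: walk blocking chains to their head.
# Input mirrors sys.dm_exec_requests: session_id -> blocking_session_id,
# where 0 means the request is not blocked.

def root_blockers(blocked_by: dict) -> list:
    """Return session ids that block others but are not blocked themselves.

    A blocker missing from the dict (for example an idle session holding an
    open transaction, which has no active request) counts as unblocked.
    Pure cycles, i.e. deadlocks, have no root and are skipped.
    """
    roots = set()
    for session in blocked_by:
        seen = set()
        current = session
        # Follow the chain until an unblocked session or a repeat is reached.
        while blocked_by.get(current, 0) != 0 and current not in seen:
            seen.add(current)
            current = blocked_by[current]
        if current != session and blocked_by.get(current, 0) == 0:
            roots.add(current)
    return sorted(roots)
```

For example, root_blockers({52: 0, 60: 52, 61: 60}) points at session 52, even though session 60 also appears as a blocker in the raw output.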
Who This Is For
▸ Engineering teams with a performance incident they cannot explain
▸ Companies running SQL Server without a dedicated DBA
▸ Organisations where queries worked fine until they didn't
▸ Teams preparing for a high-traffic event and needing a pre-flight review