Result Cache Caused a 45-Minute Latch Storm

Wednesday, 10:15. CPU on the production Oracle 12c RAC node hit 100% and did not come down. The load average climbed to 340 on a 64-core machine. No new deployments. No schema changes. The only change: a developer had added /*+ RESULT_CACHE */ to a frequently-executed dashboard query the previous evening.

The Alert

Find the top wait events

Oracle

SELECT event, total_waits, time_waited_micro / 1000000 AS time_waited_sec
FROM v$system_event
WHERE wait_class != 'Idle'
ORDER BY time_waited_micro DESC
FETCH FIRST 10 ROWS ONLY;

Top wait event: latch: shared pool. But the second entry caught our attention: latch free with a child latch name of Result Cache: RC Latch. Time waited: climbing by thousands of seconds per minute. We checked three times in 30-second intervals — the RC Latch time_waited increased by 2,400 seconds each interval. Every CPU core was spinning on this single latch.

First Hypothesis: Shared Pool Contention

The shared pool latch appeared first, so we investigated shared pool sizing and fragmentation.

Check shared pool and result cache stats

Oracle

SELECT name, bytes / 1024 / 1024 AS mb
FROM v$sgastat
WHERE pool = 'shared pool' AND name IN ('free memory', 'Result Cache');

SELECT id, type, status, name, cache_id,
  ROUND(space / 1024) AS space_kb
FROM v$result_cache_objects
ORDER BY space DESC
FETCH FIRST 10 ROWS ONLY;

Shared pool free memory: 1.2GB — plenty. Result Cache: 384MB allocated, 380MB used. The result cache was nearly full, but that should cause cache evictions, not latch storms. We ran both queries three times: shared pool free was stable, result cache usage was stable. No growth, no fragmentation. The shared pool was not the problem.

The Discovery

The dashboard query with the RESULT_CACHE hint was executed 200 times per second — once per dashboard refresh across 200 concurrent users. Every execution acquired the RC Latch to look up or store the cached result. In Oracle 12c, the Result Cache uses a single latch for all operations — lookups, inserts, invalidations. Under 200 concurrent accesses, every session waited for the latch, spinning CPU while waiting.

Find sessions waiting on the RC Latch

Oracle

SELECT
  s.sid, s.serial#, s.username, s.program,
  s.event, s.seconds_in_wait,
  SUBSTR(sq.sql_text, 1, 100) AS sql_text
FROM v$session s
LEFT JOIN v$sql sq ON sq.sql_id = s.sql_id
WHERE s.event LIKE '%Result Cache%'
ORDER BY s.seconds_in_wait DESC;

197 sessions. All waiting on the RC Latch. All executing the same dashboard query. We verified three times over 90 seconds — 195, 201, 198 sessions each time. The latch was completely serializing what should have been parallel cache lookups.

Incident Timeline

Time	Event
Tue 21:00	Developer adds RESULT_CACHE hint to dashboard query
Wed 10:00	Dashboard usage ramps up — 200 concurrent sessions
Wed 10:15	CPU hits 100%. Load average: 340.
Wed 10:40	RC Latch identified as root cause
Wed 10:42	Result cache disabled. CPU drops to 18% in 30 seconds.

Root Cause — Product Bug

This was a known architectural limitation in Oracle 12c's Result Cache implementation, documented in multiple Oracle support notes. The RC Latch is a single exclusive latch that serializes all Result Cache operations — lookups included, not just writes. Under high concurrency (50+ sessions hitting the same cached result), the latch becomes a bottleneck that consumes all available CPU in spin-wait loops. Oracle addressed this partially in 19c with finer-grained latching, but 12c has no fix — only the workaround of disabling or limiting the Result Cache.

The Fix

Disable result cache and remove the hint

Oracle

-- Disable result cache immediately
ALTER SYSTEM SET result_cache_max_size = 0 SCOPE = BOTH;

-- Remove the hint from the query via SQL Plan Baseline
-- (avoids waiting for a code deployment)
DECLARE
  l_plans PLS_INTEGER;
BEGIN
  l_plans := DBMS_SPM.LOAD_PLANS_FROM_CURSOR_CACHE(
    sql_id => 'abc123def', fixed => 'YES');
END;
/

Prevention

The RESULT_CACHE hint is now on the prohibited list for any query executed more than 10 times per second. Application-level caching (Redis) handles high-frequency repeated queries. On Oracle 12c, result_cache_max_size is set to 0 in production. If we move to 19c, we will re-evaluate with testing under production concurrency levels before enabling.

The root failure here was a well-intentioned optimization applied without concurrency testing. The developer ran the query with RESULT_CACHE in a test environment with one user — response time dropped from 3 seconds to 0.02 seconds. A 150x improvement. The test was technically correct. What was missing was a concurrent load test with 200 sessions hitting the same cached result. Under single-user access, the RC Latch is never contended. Under 200-way concurrency, it becomes the bottleneck.

We now require that any optimizer hint or parameter change passes a concurrency stress test before production deployment. The test runs 100 concurrent sessions executing the modified query for 60 seconds and measures both throughput and latch/wait event profiles. A new latch appearing in the top-5 wait events is an automatic rejection, regardless of single-session performance improvement. This test would have caught the RC Latch issue in minutes rather than discovering it in production at 10:15 AM with 200 users affected. for any query executed more than 10 times per second. Application-level caching (Redis) handles high-frequency repeated queries. On Oracle 12c, result_cache_max_size is set to 0 in production. If we move to 19c, we will re-evaluate with testing under production concurrency levels before enabling.

Result Cache Turned a 3-Second Query into a 45-Minute Latch Storm

The Alert

First Hypothesis: Shared Pool Contention

The Discovery

Incident Timeline

Root Cause — Product Bug

The Fix

Prevention

The Same Problem on Other Engines