Wednesday, 10:15. CPU on the production Oracle 12c RAC node hit 100% and did not come down. The load average climbed to 340 on a 64-core machine. No new deployments. No schema changes. The only change: a developer had added /*+ RESULT_CACHE */ to a frequently-executed dashboard query the previous evening.
The Alert
SELECT event, total_waits, time_waited_micro / 1000000 AS time_waited_sec FROM v$system_event WHERE wait_class != 'Idle' ORDER BY time_waited_micro DESC FETCH FIRST 10 ROWS ONLY;
Top wait event: latch: shared pool. But the second entry caught our attention: latch free with a child latch name of Result Cache: RC Latch. Time waited: climbing by thousands of seconds per minute. We checked three times in 30-second intervals — the RC Latch time_waited increased by 2,400 seconds each interval. Every CPU core was spinning on this single latch.
First Hypothesis: Shared Pool Contention
The shared pool latch appeared first, so we investigated shared pool sizing and fragmentation.
SELECT name, bytes / 1024 / 1024 AS mb FROM v$sgastat WHERE pool = 'shared pool' AND name IN ('free memory', 'Result Cache'); SELECT id, type, status, name, cache_id, ROUND(space / 1024) AS space_kb FROM v$result_cache_objects ORDER BY space DESC FETCH FIRST 10 ROWS ONLY;
Shared pool free memory: 1.2GB — plenty. Result Cache: 384MB allocated, 380MB used. The result cache was nearly full, but that should cause cache evictions, not latch storms. We ran both queries three times: shared pool free was stable, result cache usage was stable. No growth, no fragmentation. The shared pool was not the problem.
The Discovery
The dashboard query with the RESULT_CACHE hint was executed 200 times per second — once per dashboard refresh across 200 concurrent users. Every execution acquired the RC Latch to look up or store the cached result. In Oracle 12c, the Result Cache uses a single latch for all operations — lookups, inserts, invalidations. Under 200 concurrent accesses, every session waited for the latch, spinning CPU while waiting.
SELECT s.sid, s.serial#, s.username, s.program, s.event, s.seconds_in_wait, SUBSTR(sq.sql_text, 1, 100) AS sql_text FROM v$session s LEFT JOIN v$sql sq ON sq.sql_id = s.sql_id WHERE s.event LIKE '%Result Cache%' ORDER BY s.seconds_in_wait DESC;
197 sessions. All waiting on the RC Latch. All executing the same dashboard query. We verified three times over 90 seconds — 195, 201, 198 sessions each time. The latch was completely serializing what should have been parallel cache lookups.
Incident Timeline
| Time | Event |
|---|---|
| Tue 21:00 | Developer adds RESULT_CACHE hint to dashboard query |
| Wed 10:00 | Dashboard usage ramps up — 200 concurrent sessions |
| Wed 10:15 | CPU hits 100%. Load average: 340. |
| Wed 10:40 | RC Latch identified as root cause |
| Wed 10:42 | Result cache disabled. CPU drops to 18% in 30 seconds. |
Root Cause — Product Bug
This was a known architectural limitation in Oracle 12c's Result Cache implementation, documented in multiple Oracle support notes. The RC Latch is a single exclusive latch that serializes all Result Cache operations — lookups included, not just writes. Under high concurrency (50+ sessions hitting the same cached result), the latch becomes a bottleneck that consumes all available CPU in spin-wait loops. Oracle addressed this partially in 19c with finer-grained latching, but 12c has no fix — only the workaround of disabling or limiting the Result Cache.
The Fix
-- Disable result cache immediately ALTER SYSTEM SET result_cache_max_size = 0 SCOPE = BOTH; -- Remove the hint from the query via SQL Plan Baseline -- (avoids waiting for a code deployment) DECLARE l_plans PLS_INTEGER; BEGIN l_plans := DBMS_SPM.LOAD_PLANS_FROM_CURSOR_CACHE( sql_id => 'abc123def', fixed => 'YES'); END; /
Prevention
The RESULT_CACHE hint is now on the prohibited list for any query executed more than 10 times per second. Application-level caching (Redis) handles high-frequency repeated queries. On Oracle 12c, result_cache_max_size is set to 0 in production. If we move to 19c, we will re-evaluate with testing under production concurrency levels before enabling.
The root failure here was a well-intentioned optimization applied without concurrency testing. The developer ran the query with RESULT_CACHE in a test environment with one user — response time dropped from 3 seconds to 0.02 seconds. A 150x improvement. The test was technically correct. What was missing was a concurrent load test with 200 sessions hitting the same cached result. Under single-user access, the RC Latch is never contended. Under 200-way concurrency, it becomes the bottleneck.
We now require that any optimizer hint or parameter change passes a concurrency stress test before production deployment. The test runs 100 concurrent sessions executing the modified query for 60 seconds and measures both throughput and latch/wait event profiles. A new latch appearing in the top-5 wait events is an automatic rejection, regardless of single-session performance improvement. This test would have caught the RC Latch issue in minutes rather than discovering it in production at 10:15 AM with 200 users affected. for any query executed more than 10 times per second. Application-level caching (Redis) handles high-frequency repeated queries. On Oracle 12c, result_cache_max_size is set to 0 in production. If we move to 19c, we will re-evaluate with testing under production concurrency levels before enabling.