An analyst ran a report against a 2TB fact table. The table had DEGREE DEFAULT set on it. Oracle decided the optimal degree of parallelism was 64 — one parallel slave per CPU core. The analyst had 64 CPU cores. Within 90 seconds, every OLTP query on the system was waiting. CPU was at 100% across all 64 cores. Every new connection took 40+ seconds to get a response.
The Alert
Application monitoring: response times across all services at 40+ seconds. CPU at 100% on all 64 cores. The DBA connected and found 65 Oracle processes consuming CPU — one coordinator and 64 parallel query slaves, all from one user's session running one report.
The Discovery
-- Find the coordinator session (the one that started the parallel query): SELECT s.sid, s.serial#, s.username, s.sql_id, pq.server_set, pq.degree, pq.server_cnt FROM v$session s JOIN v$px_session pq ON pq.sid = s.sid AND pq.qcinst_id = s.inst_id WHERE pq.qcsid = pq.sid -- coordinator only, not slaves ORDER BY pq.degree DESC; -- Kill the coordinator (also kills all slaves automatically): ALTER SYSTEM KILL SESSION '[sid],[serial#]' IMMEDIATE;
The coordinator was killed. All 64 parallel slaves terminated within 2 seconds. CPU dropped from 100% to 8%. OLTP response times returned to normal within 30 seconds.
Incident Timeline
| Time | Event |
|---|---|
| 14:33 | Analyst runs report against fact_sales table (DEGREE DEFAULT) |
| 14:33 | Oracle chooses DOP 64. 64 parallel slaves start. All 64 CPU cores consumed. |
| 14:34:30 | OLTP queries begin queuing for CPU. Response times: 40+ seconds. |
| 14:35 | Application monitoring fires. Multiple alerts simultaneously. |
| 14:36 | DBA connects. Parallel coordinator identified. |
| 14:37 | ALTER SYSTEM KILL SESSION. 64 slaves terminate. |
| 14:38 | OLTP response times return to normal. |
| 14:40 | DEGREE DEFAULT removed from fact_sales. Resource plan activated. |
Root Cause
DEGREE DEFAULT tells Oracle to use its calculated optimal degree of parallelism for the table, which scales with CPU count. On a 64-core server, DEFAULT means 64. A single query consuming all CPU cores starves every other session of execution time. There was no Resource Manager plan limiting parallel query CPU usage for non-privileged users.
The Fix
-- Remove DEFAULT degree from the table: ALTER TABLE fact_sales PARALLEL 4; -- explicit max DOP -- Create a Resource Manager plan to cap analyst CPU at 20%: BEGIN dbms_resource_manager.create_plan( plan => 'ANALYST_CAP', comment => 'Cap analyst parallel queries at 20pct CPU'); dbms_resource_manager.create_plan_directive( plan => 'ANALYST_CAP', group_or_subplan => 'ANALYST_GROUP', comment => 'Analysts limited to 20pct CPU', cpu_p1 => 20, parallel_degree_limit_p1 => 8); -- max 8 parallel slaves END; ALTER SYSTEM SET resource_manager_plan = 'ANALYST_CAP';
Prevention
DEGREE DEFAULT is prohibited on any table accessed by analytical queries. All fact tables have explicit parallel degree set to 4 or 8. A Resource Manager plan is active that caps any non-DBA user to 20% CPU and a maximum DOP of 8. Any query requesting DOP above 8 is automatically downgraded.
Identifying and Assessing Active Parallel Queries
Before killing any session, measure the scope. On a busy system, multiple parallel queries may be running. Kill the coordinator — not the slaves individually. Killing individual slaves without killing the coordinator causes the coordinator to restart them.
SELECT qc.sid AS coordinator_sid, qc.serial# AS coordinator_serial, qc.username, qc.sql_id, COUNT(ps.sid) AS slave_count, MAX(qc.last_call_et) AS elapsed_seconds, qc.status FROM v$session qc JOIN v$px_session ps ON ps.qcsid = qc.sid WHERE qc.sid = ps.qcsid -- coordinators only GROUP BY qc.sid, qc.serial#, qc.username, qc.sql_id, qc.last_call_et, qc.status ORDER BY elapsed_seconds DESC;
SELECT owner, table_name, degree, cache, row_movement FROM dba_tables WHERE degree <> '1' ORDER BY owner, table_name; -- degree = 'DEFAULT' means Oracle chooses based on CPU count -- degree = '0' means NOPARALLEL -- degree = '64' means explicitly set to 64 -- Any value > 8 on a shared system is a risk unless Resource Manager limits it