The Partial Index That Fixed a Reporting Query Nobody

A partial index indexes a subset of rows — those satisfying a WHERE clause. For a table with 50 million rows where 99.5% are in a terminal state and 0.5% are in active processing, a conventional index on the status column is almost useless: the planner avoids it because the selectivity of "active" rows is too low to justify the seek overhead for most queries. A partial index on status = 'active' is 200x smaller and the planner can use it for every query against the active subset.

When a Partial Index Is the Right Tool

A partial index is worth considering when all of the following are true: the table is large (millions of rows), a significant fraction of rows will never appear in queries after they reach a certain state (completed orders, processed events, archived records), and the active minority is the performance-critical subset — the rows that every application query actually reads and writes.

In these situations, a conventional index on the state column includes every historical row. The index is large, the buffer pool fills with index pages for rows nobody queries, and cache efficiency for the active rows drops.

Diagnosing Whether a Partial Index Will Help

Measure the distribution of your indexed column values

PostgreSQL

-- How many rows are in each state?
SELECT
  status,
  count(*) AS row_count,
  round(100.0 * count(*) / sum(count(*)) OVER (), 2) AS pct
FROM   job_queue
GROUP BY status
ORDER BY row_count DESC;

-- Result that calls for a partial index:
-- status      | row_count | pct
-- ------------+-----------+--------
-- completed   | 49700000  | 99.40
-- failed      | 245000    |  0.49
-- active      | 55000     |  0.11
-- pending     | 1200      |  0.00

Check how large a conventional index on this column would be

PostgreSQL

-- Compare existing index size vs what a partial index would be:
SELECT
  indexrelname AS index_name,
  pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
  idx_scan,
  idx_tup_read,
  idx_tup_fetch
FROM   pg_stat_user_indexes
WHERE  relname = 'job_queue'
ORDER BY pg_relation_size(indexrelid) DESC;

Create the partial index — only indexes the rows your queries actually touch

PostgreSQL

-- Partial index: only active and pending rows
-- The WHERE clause matches your most common query filter
CREATE INDEX CONCURRENTLY idx_job_queue_active
  ON job_queue (created_at, job_id)
  WHERE status IN ('active', 'pending');

-- CONCURRENTLY builds without taking an exclusive lock
-- Safe to run on a live table — no application downtime

-- How small is this index?
SELECT pg_size_pretty(pg_relation_size('idx_job_queue_active'));

Verifying the Planner Uses It

EXPLAIN ANALYZE confirms index usage on active rows

PostgreSQL

EXPLAIN (ANALYZE, BUFFERS)
SELECT job_id, payload, created_at
FROM   job_queue
WHERE  status = 'active'
ORDER BY created_at
LIMIT  100;

-- Expected output:
-- Index Scan using idx_job_queue_active on job_queue
--   (actual rows=87 loops=1)
--   Buffers: shared hit=12 read=3

-- The "shared hit=12 read=3" is the signal:
-- Only 15 buffer accesses to return 87 rows from a 50M row table
-- Without the partial index: Sequential Scan, 50M row reads

The partial index must match your query's WHERE clause exactly

PostgreSQL

-- This query WILL use the partial index:
SELECT * FROM job_queue WHERE status = 'active';
SELECT * FROM job_queue WHERE status = 'pending';

-- This query will NOT use it (OR condition not in partial index predicate):
SELECT * FROM job_queue WHERE status = 'active' OR status = 'completed';

-- This query will NOT use it (no status filter — can't restrict to partial index rows):
SELECT * FROM job_queue WHERE created_at > now() - INTERVAL '1 hour';

The Unique Partial Index Pattern

One of the most powerful uses of partial indexes is enforcing uniqueness only on the active rows — not the entire table. A soft-delete pattern (is_deleted = false) needs unique emails among active users, but allows multiple deleted records with the same email.

Unique constraint on active rows only

PostgreSQL

-- Unique email among non-deleted users only:
CREATE UNIQUE INDEX idx_users_email_active
  ON users (email)
  WHERE is_deleted = false;

-- This allows:
-- INSERT INTO users (email, is_deleted) VALUES ('a@b.com', true);
-- INSERT INTO users (email, is_deleted) VALUES ('a@b.com', true);
-- (two deleted records with same email — allowed)
-- 
-- But prevents:
-- INSERT INTO users (email, is_deleted) VALUES ('a@b.com', false);
-- INSERT INTO users (email, is_deleted) VALUES ('a@b.com', false);
-- (duplicate active email — ERROR: duplicate key)

Maintenance Considerations

Partial indexes maintain themselves automatically as rows are inserted and deleted. When a row's status changes from 'active' to 'completed', PostgreSQL removes it from the partial index automatically. There is no manual maintenance required.

Autovacuum handles dead tuple cleanup in partial indexes the same as regular indexes. REINDEX CONCURRENTLY is available if the index becomes bloated. The main consideration is that ANALYZE needs to see enough rows matching the partial index predicate to build an accurate histogram — on tables where the active set is very small (under 1,000 rows), the planner may occasionally make poor decisions due to statistics imprecision.

Marcus Reid

Staff DBA · PostgreSQL · 14 years · QueryTuning

Marcus has been managing large PostgreSQL deployments for 14 years. He writes about indexing strategy, query optimisation, and the solutions that make dramatic differences with minimal complexity.

The Partial Index That Fixed a Reporting Query Nobody Could Crack for Two Years

When a Partial Index Is the Right Tool

Diagnosing Whether a Partial Index Will Help

Verifying the Planner Uses It

The Unique Partial Index Pattern

Maintenance Considerations

The Same Problem on Other Engines