Database Indexing

Indexing is the single most impactful concept in database performance. An index is a precomputed shortcut that lets the database find matching rows without scanning every row in the table.

The library analogy

Imagine a library with a million books stacked in no order.

  • No index → to find “all books by Dostoevsky”, you read the title page of every book. One million lookups. This is a table scan.
  • With an index → there’s a separate card catalog sorted by author. You flip to “D”, find Dostoevsky’s cards, each tells you exactly which shelf to go to. Maybe a dozen lookups total.

That’s what a database index is: a separate, sorted data structure that points back to the actual rows.

What an index actually is

Most databases use B-tree (balanced tree) indexes by default. A B-tree keeps keys sorted in a tree structure where each node has many children (hundreds is typical).

                     [M]
                  /        \
             [F]            [T]
           /     \        /     \
        [B,D]  [H,J,L] [O,R]  [U,W,Y]

Properties:

  • O(log n) search, insert, delete — a billion-row table takes about 30 key comparisons (log2 of 10^9 ≈ 30), and with a fanout in the hundreds only three or four node visits
  • Leaves are linked, so range queries (WHERE age BETWEEN 20 AND 30) walk the leaves sequentially
  • Naturally sorted, so ORDER BY can sometimes skip sorting entirely
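The range-query property is easy to observe. A minimal sketch using Python's built-in sqlite3 (SQLite's ordinary indexes are B-trees; the table and index names here are invented for the example):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, age INTEGER)")
con.executemany("INSERT INTO people (age) VALUES (?)",
                [(i % 90,) for i in range(10_000)])
con.execute("CREATE INDEX idx_age ON people (age)")

# The range predicate is answered by walking one contiguous slice of the
# sorted index rather than scanning the whole table.
plan = [row[3] for row in con.execute(
    "EXPLAIN QUERY PLAN SELECT age FROM people WHERE age BETWEEN 20 AND 30")]
print(plan)  # e.g. ['SEARCH people USING COVERING INDEX idx_age (age>? AND age<?)']
```

The exact plan wording varies between SQLite versions, but a SEARCH on idx_age (rather than a SCAN of the table) is the point.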

Other index types exist for special workloads:

  • Hash — exact match only (no range queries); faster than B-tree for = lookups
  • GIN (PostgreSQL) — inverted index for arrays, JSON, full-text search
  • GiST — geospatial, range types
  • BRIN — block-range, great for huge time-series tables with natural ordering

What gets indexed (and what doesn’t)

By default:

  • Primary key → always indexed (automatically)
  • Unique constraints → automatically indexed
  • Foreign keys → often not automatically indexed (depends on the DB) — a common source of slow queries

You add indexes manually for:

  • Columns you filter by in WHERE clauses
  • Columns you join on
  • Columns you sort by in ORDER BY
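A sketch of the payoff for the last two cases, again with Python's sqlite3 on a hypothetical orders table: adding the index turns the WHERE filter into an index search and lets ORDER BY skip its explicit sort step.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")

def plan(sql):
    return [row[3] for row in con.execute("EXPLAIN QUERY PLAN " + sql)]

before_where = plan("SELECT * FROM orders WHERE customer_id = 42")
before_sort  = plan("SELECT * FROM orders ORDER BY customer_id")

con.execute("CREATE INDEX idx_customer ON orders (customer_id)")

after_where = plan("SELECT * FROM orders WHERE customer_id = 42")
after_sort  = plan("SELECT * FROM orders ORDER BY customer_id")

print(before_where)  # full table scan
print(before_sort)   # full scan plus a 'USE TEMP B-TREE FOR ORDER BY' step
print(after_where)   # index search
print(after_sort)    # ordered index walk, no sort step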

The four levels of query efficiency

Learning to read query plans (EXPLAIN in most databases) is the key skill. From worst to best:

  • Sequential scan — read every row. Fine for small tables, or when most rows match.
  • Index scan — use the index to find matching rows, then fetch each row from the table. Good for selective queries.
  • Index skip/range scan — walk only a small portion of the index. Typical for range queries on indexed columns.
  • Index-only scan — every column the query needs is in the index, so the table is never touched. Requires carefully designed covering indexes.

A sequential scan over a billion-row table is the classic “my query is slow” root cause.
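Three of these levels are easy to reproduce with Python's sqlite3 (table and index names invented): SQLite prints SCAN for a sequential scan, SEARCH ... USING INDEX for an index scan, and SEARCH ... USING COVERING INDEX for an index-only scan.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, a INTEGER, b TEXT)")

def plan(sql):
    return "; ".join(row[3] for row in con.execute("EXPLAIN QUERY PLAN " + sql))

seq_scan = plan("SELECT * FROM t WHERE a = 1")    # no index yet: read every row
con.execute("CREATE INDEX idx_a ON t (a)")
index_scan = plan("SELECT * FROM t WHERE a = 1")  # find rowids in index, fetch rows
index_only = plan("SELECT a FROM t WHERE a = 1")  # answered from the index alone

print(seq_scan)    # e.g. SCAN t
print(index_scan)  # e.g. SEARCH t USING INDEX idx_a (a=?)
print(index_only)  # e.g. SEARCH t USING COVERING INDEX idx_a (a=?)
```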

The cost of indexes

Indexes aren’t free:

  1. Disk space — a B-tree index on a 1 TB table can easily be 100 GB. Multiply by N indexes.
  2. Write slowdown — every INSERT / UPDATE / DELETE has to update every index on the table.
  3. Cache pressure — indexes compete with data for RAM.

Rule of thumb: every index helps some queries and hurts all writes. Only add an index if you can point to the query it accelerates.
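The disk-space cost is directly measurable. A sketch with Python's sqlite3, using the page_count pragma to watch the database grow when an index is added (table name and row sizes are arbitrary):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, kind TEXT, payload TEXT)")
con.executemany("INSERT INTO events (kind, payload) VALUES (?, ?)",
                [("click", "x" * 50) for _ in range(5_000)])

pages_before = con.execute("PRAGMA page_count").fetchone()[0]
con.execute("CREATE INDEX idx_kind ON events (kind)")
pages_after = con.execute("PRAGMA page_count").fetchone()[0]

# The index is a real on-disk structure: the database got bigger.
print(pages_before, pages_after)
```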

Composite indexes

An index on (country, city, age) — the order matters.

This index accelerates:

  • WHERE country = 'Georgia'
  • WHERE country = 'Georgia' AND city = 'Tbilisi'
  • WHERE country = 'Georgia' AND city = 'Tbilisi' AND age = 30

It does not accelerate:

  • WHERE city = 'Tbilisi' ✗ (can’t use the index; wrong leading column)
  • WHERE age = 30 ✗ (same reason: the leading column, country, is missing)

Leftmost-prefix rule: a composite index on (a, b, c) helps queries that filter on a, or a+b, or a+b+c — not on b alone or c alone.
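The leftmost-prefix rule can be checked directly (a sqlite3 sketch with invented names; the extra name column keeps the index from accidentally covering the whole query):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE users (name TEXT, country TEXT, city TEXT, age INTEGER)")
con.execute("CREATE INDEX idx_loc ON users (country, city, age)")

def plan(sql):
    return "; ".join(row[3] for row in con.execute("EXPLAIN QUERY PLAN " + sql))

# Leading column present: the composite index is used.
with_prefix = plan("SELECT * FROM users WHERE country = 'Georgia' AND city = 'Tbilisi'")
# Leading column missing: full scan, despite city being in the index.
without_prefix = plan("SELECT * FROM users WHERE city = 'Tbilisi'")

print(with_prefix)     # e.g. SEARCH users USING INDEX idx_loc (country=? AND city=?)
print(without_prefix)  # e.g. SCAN users
```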

Covering indexes / “include” columns

An index that contains all the columns a query needs is a covering index → the database answers the query entirely from the index, never touching the table. Dramatic speedup.

CREATE INDEX ON orders (customer_id) INCLUDE (total, order_date);  -- INCLUDE is PostgreSQL 11+ syntax
-- Query: SELECT total, order_date FROM orders WHERE customer_id = 42;
-- Reads just the index. The table is untouched.
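SQLite has no INCLUDE clause, but the same index-only effect can be sketched by putting the extra columns into the index key itself (Python sqlite3; the plan says COVERING INDEX when the table is never touched):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE orders (
    id INTEGER PRIMARY KEY, customer_id INTEGER,
    total REAL, order_date TEXT, note TEXT)""")
# No INCLUDE in SQLite: appending total and order_date to the key
# gives the same covering behaviour, at the cost of a wider key.
con.execute("CREATE INDEX idx_cover ON orders (customer_id, total, order_date)")

plan = "; ".join(row[3] for row in con.execute(
    "EXPLAIN QUERY PLAN SELECT total, order_date FROM orders WHERE customer_id = 42"))
print(plan)  # e.g. SEARCH orders USING COVERING INDEX idx_cover (customer_id=?)
```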

When indexes don’t help

  • Low-selectivity columns — an index on is_active (boolean) is usually worse than a table scan, because roughly every row matches anyway
  • Functions on indexed columns — WHERE LOWER(email) = '...' can’t use an index on email unless you create a functional index on LOWER(email)
  • Leading wildcards — WHERE name LIKE '%foo%' can’t use a standard index; needs full-text search
  • Implicit type conversion — WHERE id = '42' where id is numeric can silently skip the index
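The function-call pitfall and its fix are both visible in a plan (a sqlite3 sketch; SQLite calls a functional index an expression index, and the extra name column keeps the plain index from covering the query):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")
con.execute("CREATE INDEX idx_email ON accounts (email)")

def plan(sql):
    return "; ".join(row[3] for row in con.execute("EXPLAIN QUERY PLAN " + sql))

# The function call hides the column from the plain index: full scan.
masked = plan("SELECT * FROM accounts WHERE lower(email) = 'a@example.com'")

# An expression (functional) index on lower(email) makes it searchable again.
con.execute("CREATE INDEX idx_email_lower ON accounts (lower(email))")
fixed = plan("SELECT * FROM accounts WHERE lower(email) = 'a@example.com'")

print(masked)  # e.g. SCAN accounts
print(fixed)   # e.g. SEARCH accounts USING INDEX idx_email_lower (<expr>=?)
```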

How to find missing indexes

Three practical techniques:

  1. Slow query log — enable it, watch for queries that take >100 ms (or whatever your threshold is). Look at the top 10 offenders.
  2. EXPLAIN / EXPLAIN ANALYZE — run on suspect queries. Find the sequential scans.
  3. Index advisors — pg_stat_statements + extensions in Postgres, Query Store in SQL Server, Performance Schema in MySQL — all can surface the queries that would benefit most.

How to find useless indexes

  • pg_stat_user_indexes in Postgres shows “how many times this index was actually used.” Zero uses over weeks → probably a candidate for removal.
  • Duplicate indexes (same columns, same order) exist surprisingly often in legacy systems. Drop them.
  • Indexes on columns that are never queried. Drop them.

See also