Indexing

Row-Level Indexes for Delta Tables

An optional accelerator that lets a query jump straight to the rows it needs instead of scanning the whole table. An index never changes query results; it only reduces how much data is read to produce them. Indexes are read by the DeltaForge planner; the parent data stays in standard Delta format.

PGM learned index and B+ tree algorithms
Bloom filter indexes for file pruning
Optional auto-update on parent commits
Parent table stays in standard Delta format
[Figure: index tree over key ranges. Index lifecycle: building (initial scan), current (in sync, used by planner), stale (ignored).]

What It Is

A managed companion to a Delta table

The Index

For each indexed column, the index records where the matching rows live so the engine can read just those rows rather than the whole table. The index is itself a child Delta table.

  • Optional. Tables work without indexes.
  • Same answers either way, just less data read.
  • A stale index is silently ignored.
  • Parent data stays in standard Delta format.
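
The behavior described above can be illustrated with a small in-memory model. This is a minimal sketch, not DeltaForge's implementation: `ParentTable`, `RowIndex`, `build_index`, and `lookup` are hypothetical names, and staleness is assumed to mean "the index was built at a different parent version". It shows that the indexed path and the fallback scan return the same values and differ only in how many rows are read.

```python
from dataclasses import dataclass

@dataclass
class ParentTable:
    version: int
    files: dict  # file name -> list of (key, value) rows

@dataclass
class RowIndex:
    built_at_version: int
    locations: dict  # key -> list of (file name, row offset)

def build_index(table):
    """One-time scan of the parent: record where each key's rows live."""
    locations = {}
    for name, rows in table.files.items():
        for offset, (key, _value) in enumerate(rows):
            locations.setdefault(key, []).append((name, offset))
    return RowIndex(table.version, locations)

def full_scan(table, key):
    """Standard path: read every row of every file."""
    return sorted(v for rows in table.files.values() for k, v in rows if k == key)

def lookup(table, index, key):
    """Return (matching values, rows read). A missing or stale index is ignored."""
    if index is None or index.built_at_version != table.version:
        rows_read = sum(len(rows) for rows in table.files.values())
        return full_scan(table, key), rows_read
    hits = index.locations.get(key, [])
    values = sorted(table.files[name][offset][1] for name, offset in hits)
    return values, len(hits)

table = ParentTable(version=1, files={
    "part-0": [(5, "a"), (18, "b")],
    "part-1": [(22, "c"), (5, "d")],
})
index = build_index(table)
fresh = lookup(table, index, 5)        # (['a', 'd'], 2): only the two matching rows are read

# A parent commit the index has not caught up with makes it stale.
table.files["part-1"].append((5, "e"))
table.version = 2
stale = lookup(table, index, 5)        # (['a', 'd', 'e'], 5): correct answer via full scan
```

In this model the stale index never produces a wrong answer, because it is skipped entirely rather than consulted.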

Reader Scope

Indexes are consumed by the DeltaForge query planner. Other engines reading the parent Delta table will not pick them up; they fall back to the standard scan path and return the same results.

  • DeltaForge planner uses the index.
  • External readers ignore it.
  • Correctness preserved across both paths.
  • No fork, no proprietary table format.

When To Reach For One

Indexes complement built-in data skipping; they don't replace it

Helps

  • Selective reads on a hot column: point lookups, narrow ranges, IN lists, prefix matches.
  • UPDATE, DELETE, MERGE on a specific row or small set; the locate step becomes a direct read.
  • Repeated equality joins on the same key when one side carries an index on that column.

Does Not Help

  • The predicate matches most of the table anyway.
  • The engine already prunes the column well from its built-in min/max statistics.
  • The table is small enough that scanning everything is already cheap.
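
To make the min/max point concrete, here is a minimal sketch of statistics-based data skipping. The per-file `(min, max)` pairs and the `files_to_read` helper are illustrative assumptions, not the engine's actual structures. When a column is clustered, the statistics already narrow a lookup to one file; when its values are scattered across files, every file's range overlaps the predicate and nothing is pruned, which is the case where a row-level index has something to add.

```python
def files_to_read(file_stats, lo, hi):
    """Keep files whose [min, max] range overlaps the predicate range [lo, hi]."""
    return [name for name, (fmin, fmax) in file_stats.items()
            if fmax >= lo and fmin <= hi]

# Clustered column: each file covers a disjoint key range.
clustered = {"f0": (0, 99), "f1": (100, 199), "f2": (200, 299)}

# Scattered column: every file spans almost the whole key range.
scattered = {"f0": (0, 299), "f1": (1, 298), "f2": (2, 297)}

pruned_well = files_to_read(clustered, 150, 150)    # ['f1']
pruned_badly = files_to_read(scattered, 150, 150)   # ['f0', 'f1', 'f2']
```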

Cost

An index is a running expense, not a free upgrade

Storage: A small fraction of the parent table's size
Write overhead: Every parent write also updates the index when auto-update is on
Build time: One-time scan of the parent at index creation

Index Structures

Pick by access pattern

PGM (default)

A learned index built from a piecewise geometric model over the key distribution. Compact on disk; suited to clustered or monotonic keys typical of analytical workloads.
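
The idea behind a learned index can be sketched in a few lines. The code below is a simplified single-level illustration, not the DeltaForge implementation and not the full PGM construction: it uses a greedy one-pass segmentation, and `build_segments`, `lookup`, and the integer error bound `eps` are names chosen for this example. Each segment is a line that predicts a key's position in the sorted key array to within `eps`, so a lookup only needs a short binary search around the prediction.

```python
from bisect import bisect_left, bisect_right

def build_segments(keys, eps):
    """Greedy segmentation of sorted, distinct, non-empty keys.

    Each segment (first_key, first_pos, slope) predicts positions within eps.
    """
    segments = []
    k0, p0 = keys[0], 0
    lo, hi = 0.0, float("inf")          # range of slopes still valid for this segment
    for p in range(1, len(keys)):
        dx = keys[p] - k0
        new_lo = max(lo, (p - p0 - eps) / dx)
        new_hi = min(hi, (p - p0 + eps) / dx)
        if new_lo > new_hi:             # no single line fits: close segment, start a new one
            segments.append((k0, p0, (lo + hi) / 2))
            k0, p0 = keys[p], p
            lo, hi = 0.0, float("inf")
        else:
            lo, hi = new_lo, new_hi
    slope = lo if hi == float("inf") else (lo + hi) / 2
    segments.append((k0, p0, slope))
    return segments

def lookup(keys, segments, eps, key):
    """Return the position of key in keys, or None if it is absent."""
    starts = [s[0] for s in segments]   # would be precomputed in practice
    i = bisect_right(starts, key) - 1
    if i < 0:
        return None                     # key is smaller than every indexed key
    k0, p0, slope = segments[i]
    pred = round(p0 + slope * (key - k0))
    lo = max(0, pred - eps - 1)         # one extra slot absorbs rounding
    hi = min(len(keys), pred + eps + 2)
    j = bisect_left(keys, key, lo, hi)
    return j if j < len(keys) and keys[j] == key else None

keys = [i * i for i in range(500)]      # sorted, distinct, non-linear distribution
eps = 4
segments = build_segments(keys, eps)
```

The closer the keys are to piecewise-linear (as with clustered or monotonic data), the fewer segments are needed, which is where the compactness comes from.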

B+ Tree

Classic balanced tree with predictable behavior across any key distribution. Select it with USING btree when the data is unsorted or highly random.

Bloom Filter

File-level probabilistic test. Each Parquet file carries a bloom filter for the indexed columns; the planner skips files whose filter rejects the predicate. The false-positive probability (fpp) and expected item count (num_items) are tunable.
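
A minimal sketch of how the two tunables typically drive a bloom filter, and how per-file filters prune a scan. This uses the textbook sizing formulas and SHA-256 based double hashing as assumptions; the `BloomFilter` class and `prune_files` helper are hypothetical and do not reflect DeltaForge's on-disk format. Only `fpp` and `num_items` are taken from the text above.

```python
import hashlib
import math

class BloomFilter:
    def __init__(self, num_items, fpp):
        # Textbook sizing: m bits and k hash functions for the target false-positive rate.
        self.m = max(1, math.ceil(-num_items * math.log(fpp) / (math.log(2) ** 2)))
        self.k = max(1, round(self.m / num_items * math.log(2)))
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, value):
        digest = hashlib.sha256(repr(value).encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1   # odd step for double hashing
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, value):
        for pos in self._positions(value):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, value):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(value))

def prune_files(filters, value):
    """Keep only the files whose filter cannot rule the value out."""
    return [name for name, bf in filters.items() if bf.might_contain(value)]

# One filter per file, sized for 1000 items at a 1% false-positive rate.
file_values = {"part-0": range(0, 1000), "part-1": range(1000, 2000)}
filters = {}
for name, values in file_values.items():
    bf = BloomFilter(num_items=1000, fpp=0.01)
    for v in values:
        bf.add(v)
    filters[name] = bf

candidates = prune_files(filters, 1234)   # always includes 'part-1'
```

A filter can report a value as possibly present when it is not, so a surviving file may still contain no match, but it never rejects a file that does contain the value. Lowering `fpp` reduces wasted reads at the price of a larger filter.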

Use them where they pay off

Documented in detail in the architecture reference.