An index is an optional accelerator that lets a query jump straight to the rows it needs instead of scanning the whole table. An index never changes query results; it only makes them faster to compute. Indexes are read by the DeltaForge planner; the parent data stays in standard Delta format.
A managed companion to a Delta table
For each indexed column, the index records where the matching rows live so the engine can read just those rows rather than the whole table. The index is itself a child Delta table.
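The idea of recording where matching rows live can be sketched in a few lines of Python. This is a toy illustration only, not DeltaForge's on-disk format: the file names, the `RowLocation` type, and the helper functions are all hypothetical, and the "table" is a dictionary of in-memory rows.

```python
from collections import defaultdict
from typing import NamedTuple


class RowLocation(NamedTuple):
    file: str  # hypothetical data file name
    row: int   # row offset inside that file


# Toy "parent table": file name -> list of rows.
parent = {
    "part-0.parquet": [{"id": 7, "v": "a"}, {"id": 3, "v": "b"}],
    "part-1.parquet": [{"id": 9, "v": "c"}, {"id": 3, "v": "d"}],
}


def build_index(table, column):
    """One full pass over the parent, recording where each key value lives."""
    index = defaultdict(list)
    for file, rows in table.items():
        for offset, row in enumerate(rows):
            index[row[column]].append(RowLocation(file, offset))
    return dict(index)


def lookup_with_index(table, index, key):
    """Read only the rows the index points at."""
    return [table[loc.file][loc.row] for loc in index.get(key, [])]


def lookup_with_scan(table, column, key):
    """Standard scan path: inspect every row of every file."""
    return [row for rows in table.values() for row in rows if row[column] == key]


id_index = build_index(parent, "id")
# Both paths return the same rows; only the amount of data touched differs.
assert lookup_with_index(parent, id_index, 3) == lookup_with_scan(parent, "id", 3)
```

The final assertion mirrors the guarantee described above: the index changes how much data is read, not what is returned.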
Indexes are consumed by the DeltaForge query planner. Other engines reading the parent Delta table will not pick them up; they fall back to the standard scan path and return the same results.
Indexes complement built-in data skipping; they don't replace it
An index is a running expense, not a free upgrade
Storage: a small fraction of the parent table's size.
Write overhead: every parent write also updates the index when auto-update is on.
Build time: a one-time scan of the parent at index creation.
Pick by access pattern
The PGM learned index is built from a piecewise geometric model over the key distribution. It is compact on disk and suited to the clustered or monotonic keys typical of analytical workloads.
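To make the "learned index" idea concrete, here is a minimal Python sketch of a piecewise linear model over sorted keys. It is an assumption-laden illustration, not DeltaForge's implementation: it uses a simple greedy "shrinking cone" fit rather than the optimal segmentation a real PGM index computes, it assumes unique, sorted integer keys, and `build_segments`, `lookup`, and `eps` are hypothetical names.

```python
from bisect import bisect_left, bisect_right


def build_segments(keys, eps):
    """Greedy fit: each segment is (first_key, first_pos, slope) and predicts
    the position of every key it covers to within +/- eps."""
    segments = []
    i, n = 0, len(keys)
    while i < n:
        k0, p0 = keys[i], i
        lo, hi = 0.0, float("inf")  # slopes still valid for this segment
        j = i + 1
        while j < n:
            dx = keys[j] - k0
            new_lo = max(lo, (j - p0 - eps) / dx)
            new_hi = min(hi, (j - p0 + eps) / dx)
            if new_lo > new_hi:  # no single slope fits; close the segment
                break
            lo, hi = new_lo, new_hi
            j += 1
        slope = lo if hi == float("inf") else (lo + hi) / 2
        segments.append((k0, p0, slope))
        i = j
    return segments


def lookup(keys, segments, eps, key):
    """Predict a position, then binary-search only a small window around it."""
    starts = [seg[0] for seg in segments]  # rebuilt per call to keep the sketch short
    s = bisect_right(starts, key) - 1
    if s < 0:
        return None
    k0, p0, slope = segments[s]
    guess = round(p0 + slope * (key - k0))
    n = len(keys)
    lo = min(n, max(0, guess - eps - 1))  # one extra slot absorbs float rounding
    hi = min(n, max(lo, guess + eps + 2))
    pos = bisect_left(keys, key, lo, hi)
    return pos if pos < n and keys[pos] == key else None


# Two monotonic runs of keys, the shape the text calls clustered or monotonic.
keys = list(range(0, 3000, 3)) + list(range(5000, 6000))
eps = 4
segments = build_segments(keys, eps)
assert lookup(keys, segments, eps, 5123) == keys.index(5123)
```

The index stores only a few numbers per segment instead of one entry per key, which is why this family of structures stays compact when keys follow a regular pattern. Highly random keys would force many short segments and erode that advantage.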
The B+ tree is a classic balanced tree with predictable behavior across any key distribution. Choose it with USING btree when the data is unsorted or highly random.
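The bloom filter index is a file-level probabilistic test. Each Parquet file carries a bloom filter for the indexed columns, and the planner skips files whose filter rejects the predicate. Both fpp (the target false-positive probability) and num_items (the expected number of items) are tunable.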
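The file-skipping behavior and the role of the two tunables can be sketched as follows. This is a generic bloom filter in Python, sized with the standard textbook formulas from `num_items` and `fpp`; how DeltaForge actually sizes, hashes, and stores its filters is not stated here, so the class, the SHA-256 based hashing, and the file names are all assumptions made for illustration.

```python
import hashlib
import math


class BloomFilter:
    def __init__(self, num_items, fpp):
        # Standard sizing: bits m = -n * ln(p) / (ln 2)^2, hashes k = (m / n) * ln 2.
        self.num_bits = max(1, math.ceil(-num_items * math.log(fpp) / math.log(2) ** 2))
        self.num_hashes = max(1, round(self.num_bits / num_items * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, value):
        # Double hashing: derive k bit positions from two 64-bit halves of one digest.
        digest = hashlib.sha256(repr(value).encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, value):
        for p in self._positions(value):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, value):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(value))


# Hypothetical per-file key sets standing in for Parquet files.
files = {
    "part-0.parquet": range(0, 1000),
    "part-1.parquet": range(1000, 2000),
    "part-2.parquet": range(2000, 3000),
}


def build_filters(files, num_items, fpp):
    filters = {}
    for name, values in files.items():
        bf = BloomFilter(num_items, fpp)
        for v in values:
            bf.add(v)
        filters[name] = bf
    return filters


def files_to_read(filters, value):
    """Files whose filter does not reject `value`; all other files are skipped."""
    return [name for name, bf in filters.items() if bf.might_contain(value)]


filters = build_filters(files, num_items=1000, fpp=0.01)
candidates = files_to_read(filters, 1500)
assert "part-1.parquet" in candidates  # a bloom filter never rejects a present value
```

A lower `fpp` buys fewer wasted file reads at the cost of a larger filter, and `num_items` should roughly match the number of distinct values per file. A filter sized for too few items fills up and stops rejecting anything.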
Quick answers on indexing Delta Lake tables
Yes, Delta Lake tables can be indexed. DeltaForge creates PGM learned, B+ tree, and bloom filter indexes on Delta Lake tables. The index lives as a child Delta table next to the parent, and the parent stays in standard Delta format.
An index speeds up a write statement such as MERGE when the statement targets a specific row or a small set of rows. The locate step becomes a direct read instead of a scan, and that scan is where slow MERGE workloads spend most of their time.
When an index has fallen out of sync with its parent, the planner silently ignores it and falls back to the standard scan path, so results stay correct. Optional auto-update keeps the index in sync on parent commits.
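The fallback behavior can be illustrated with a small Python model. It assumes, purely for the sake of the example, that freshness is decided by comparing the parent version the index last reflected with the parent's current version; the source does not say how DeltaForge detects an out-of-sync index, and `ToyTable` and its fields are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTable:
    rows: list = field(default_factory=list)
    version: int = 0
    index: dict = field(default_factory=dict)  # key -> row positions
    index_version: int = 0                     # parent version the index reflects
    auto_update: bool = True

    def commit(self, new_rows):
        index_was_fresh = self.index_version == self.version
        start = len(self.rows)
        self.rows.extend(new_rows)
        self.version += 1
        # Only a fresh index can be kept in sync incrementally.
        if self.auto_update and index_was_fresh:
            for pos, row in enumerate(new_rows, start):
                self.index.setdefault(row["id"], []).append(pos)
            self.index_version = self.version

    def find(self, key):
        if self.index_version == self.version:
            return "index", [self.rows[p] for p in self.index.get(key, [])]
        # Out-of-sync index: ignore it and scan, so the answer is still correct.
        return "scan", [r for r in self.rows if r["id"] == key]


t = ToyTable()
t.commit([{"id": 1}, {"id": 2}])
fresh_plan, fresh_hits = t.find(2)

t.auto_update = False
t.commit([{"id": 2}])  # the index is not updated by this commit
stale_plan, stale_hits = t.find(2)
```

In this model the second lookup takes the scan path and still sees the row added by the unindexed commit, which is the property the answer above describes: an out-of-sync index costs speed, never correctness.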
Indexes pay off most on write-heavy Delta workloads
The write commands an index accelerates, end to end in plain SQL.
The maintenance runbook that keeps file layout, and therefore pruning, healthy.
A MERGE-heavy pattern where fast row location matters on every load.
Documented in detail in the architecture reference.