DeltaForge reads and writes Delta Lake tables directly: ACID transactions, deletion vectors, change data feed, time travel via Delta versions, and schema evolution, all without a Spark cluster.
The features most production workloads depend on, implemented natively
Every write commits atomically via the transaction log. Concurrent writers use optimistic concurrency: conflicts are detected at commit time, not upfront. Readers never see partial writes.
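Commit-time conflict detection can be pictured with a small in-memory model. This is a minimal sketch, not DeltaForge's implementation: `InMemoryLog`, `commit`, and `commit_with_retry` are hypothetical names, and the lock stands in for whatever atomic put-if-absent the storage layer provides.

```python
import threading


class CommitConflict(Exception):
    """Raised when another writer has already claimed the target version."""


class InMemoryLog:
    """Toy transaction log: version number -> list of actions (hypothetical)."""

    def __init__(self):
        self._commits = {}
        self._lock = threading.Lock()  # stands in for an atomic put-if-absent

    def latest_version(self):
        with self._lock:
            return max(self._commits, default=-1)

    def commit(self, read_version, actions):
        """Write version read_version + 1 only if nobody else has written it."""
        target = read_version + 1
        with self._lock:
            if target in self._commits:
                raise CommitConflict(f"version {target} already exists")
            self._commits[target] = list(actions)
        return target


def commit_with_retry(log, make_actions, max_attempts=5):
    """Re-read the latest version and retry whenever a commit loses the race."""
    for _ in range(max_attempts):
        read_version = log.latest_version()
        try:
            return log.commit(read_version, make_actions(read_version))
        except CommitConflict:
            continue  # a real writer would also check whether the changes overlap
    raise CommitConflict("gave up after repeated conflicts")
```

A version either exists in full or not at all, which is why readers in this model can never observe a partial write.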
DELETE and MERGE record deleted row positions in compact bitmaps rather than rewriting Parquet files. Readers apply the bitmap during scan. VACUUM permanently removes superseded files once the retention period allows.
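The bitmap idea can be shown with a plain Python integer used as a bit set. This is an illustrative sketch only: the function names are hypothetical, and a production engine would use a compressed bitmap format rather than a single integer.

```python
def mark_deleted(bitmap, position):
    """Return a new bitmap with the bit for `position` set."""
    return bitmap | (1 << position)


def is_deleted(bitmap, position):
    """True when the row at `position` has been marked as deleted."""
    return (bitmap >> position) & 1 == 1


def delete_where(rows, bitmap, predicate):
    """Mark matching rows as deleted; the data file itself is never rewritten."""
    for position, row in enumerate(rows):
        if not is_deleted(bitmap, position) and predicate(row):
            bitmap = mark_deleted(bitmap, position)
    return bitmap


def scan(rows, bitmap):
    """Read the file and drop rows whose position is set in the bitmap."""
    return [row for position, row in enumerate(rows) if not is_deleted(bitmap, position)]
```

Deleting one row changes only the bitmap, so the cost of the write scales with the number of deleted rows, not with the size of the file.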
Query any committed version by number or timestamp. Useful for audit reconstruction, incremental pipelines, and reverting accidental changes.
SELECT * FROM events VERSION AS OF 42
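Conceptually, a query at version N sees the set of data files produced by replaying the transaction log up to and including commit N. The sketch below assumes a simplified log where each commit is a list of `("add", path)` or `("remove", path)` actions; `snapshot_at` is a hypothetical helper, and a real reader would typically start from a checkpoint instead of version 0.

```python
def snapshot_at(commits, version):
    """Return the set of live data files as of `version` (inclusive)."""
    if not 0 <= version < len(commits):
        raise ValueError(f"version {version} has not been committed")
    live = set()
    for actions in commits[: version + 1]:
        for kind, path in actions:
            if kind == "add":
                live.add(path)
            elif kind == "remove":
                live.discard(path)
    return live
```

Because older commits are never modified, replaying a shorter prefix of the log is enough to reconstruct any earlier state.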
Track every row-level change (insert, update pre/post image, delete) across a version or timestamp range. Useful for incremental ETL, cache invalidation, and audit trails.
SELECT * FROM table_changes('customers', 100, 150)
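For cache invalidation, a consumer can fold the change rows into a keyed cache. The sketch below assumes each row carries `_change_type` and `_commit_version` fields with the values Delta Lake's change data feed uses (`insert`, `update_preimage`, `update_postimage`, `delete`); whether DeltaForge exposes exactly these column names is an assumption, and `apply_changes` is a hypothetical helper.

```python
def apply_changes(cache, changes, key="id"):
    """Apply change rows to `cache` (key -> row) in commit order."""
    # sorted() is stable, so rows within one commit keep their original order.
    for change in sorted(changes, key=lambda c: c["_commit_version"]):
        kind = change["_change_type"]
        row = {k: v for k, v in change.items() if not k.startswith("_")}
        if kind in ("insert", "update_postimage"):
            cache[row[key]] = row
        elif kind == "delete":
            cache.pop(row[key], None)
        # update_preimage rows hold the old values, so there is nothing to apply
    return cache
```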
Change table structure without rewriting data files or breaking downstream readers
Add new columns at any position. Existing files return NULL for the new columns automatically.
ALTER TABLE t ADD COLUMN region STRING
Rename columns using column ID tracking. Zero data movement: existing files are not touched.
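Column ID tracking can be sketched as a schema that maps each visible name to a stable ID, while stored rows are keyed by ID. The helpers below are hypothetical and simplified: a real table would persist the highest assigned ID in its metadata so that IDs are never reused.

```python
def read_row(stored_row, schema):
    """Project a stored row (column ID -> value) onto the schema (name -> column ID)."""
    # IDs missing from an older file come back as None, i.e. NULL.
    return {name: stored_row.get(col_id) for name, col_id in schema.items()}


def rename_column(schema, old, new):
    """Return a schema where `old` is exposed as `new`; the column ID is unchanged."""
    return {(new if name == old else name): col_id for name, col_id in schema.items()}


def add_column(schema, name):
    """Return a schema with `name` appended under a fresh column ID (simplified)."""
    next_id = max(schema.values(), default=0) + 1
    return {**schema, name: next_id}
```

Both operations touch only the schema mapping, which is why files written before the change remain readable without modification.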
ALTER TABLE t RENAME COLUMN old_name TO new_name
Widen column types (int to bigint, float to double, decimal precision increase) without rewriting data.
ALTER TABLE t ALTER COLUMN amount TYPE DECIMAL(20,4)
Add fields to struct types and evolve map and array schemas at any nesting depth.
ALTER TABLE t ADD COLUMN addr STRUCT<zip: STRING, city: STRING>
Keep tables fast and storage efficient with built-in maintenance operations
Compact small files into larger ones to reduce per-query file-open overhead. Supports predicate-scoped compaction, which limits write amplification by rewriting only the files that match the predicate.
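One simple way to plan compaction is greedy bin-packing: sort the small files by size and fill groups up to a target size. This is a sketch under assumed inputs (`file_sizes` maps path to size, `target_size` is the desired output size); `plan_compaction` is a hypothetical name and DeltaForge's actual planner may differ.

```python
def plan_compaction(file_sizes, target_size):
    """Group files smaller than `target_size` into bins of at most `target_size`."""
    small = sorted((size, path) for path, size in file_sizes.items() if size < target_size)
    bins, current, current_size = [], [], 0
    for size, path in small:
        if current and current_size + size > target_size:
            bins.append(current)
            current, current_size = [], 0
        current.append(path)
        current_size += size
    if current:
        bins.append(current)
    # Rewriting a bin that holds a single file would not reduce the file count.
    return [group for group in bins if len(group) > 1]
```

Applying a predicate first simply shrinks `file_sizes` to the matching files, so fewer bytes are rewritten.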
OPTIMIZE events WHERE date > '2024-01-01'
Co-locate related data across multiple columns using space-filling curves. Improves data skipping for multi-dimensional filter predicates.
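The Z-order curve is one such space-filling curve: it interleaves the bits of each column value into a single sort key, so rows that are close in every column end up close on disk. The sketch below handles two columns and assumes small non-negative integers; a real engine would first map values of arbitrary types onto integers. `interleave_bits` is a hypothetical helper.

```python
def interleave_bits(x, y, bits=16):
    """Morton code: bit i of x lands at position 2i, bit i of y at position 2i + 1."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z


def z_order_sort(points):
    """Sort (x, y) pairs by their interleaved key."""
    return sorted(points, key=lambda p: interleave_bits(p[0], p[1]))
```

Writing rows in this order keeps each file's min/max range narrow on both columns at once, which is what lets filters on either column skip files.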
OPTIMIZE events ZORDER BY (user_id, event_type)
Remove Parquet files no longer referenced by any live version. A DRY RUN mode shows which files would be deleted before committing the cleanup.
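A dry run amounts to computing the deletion list without acting on it. The sketch below assumes `live_files` already holds every path referenced by a version that must stay queryable, and it uses file modification time as the retention check; both are simplifying assumptions, and `vacuum_candidates` is a hypothetical name.

```python
from datetime import datetime, timedelta, timezone


def vacuum_candidates(files_on_disk, live_files, now, retain_hours=168):
    """List unreferenced files older than the retention window (nothing is deleted)."""
    cutoff = now - timedelta(hours=retain_hours)
    return sorted(
        path
        for path, modified in files_on_disk.items()
        if path not in live_files and modified < cutoff
    )
```

The age check protects recently written files that no commit references yet, such as output from a write still in progress.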
VACUUM events RETAIN 168 HOURS
Compute column-level statistics (min, max, null count, histograms) used by the cost-based optimizer to skip files and reorder joins.
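File skipping with min/max statistics reduces to a range-overlap test. The sketch below assumes a hypothetical `file_stats` layout (path -> column -> `min`/`max`) and a closed filter range; files without statistics for the column must always be scanned.

```python
def files_to_scan(file_stats, column, lower, upper):
    """Keep files whose [min, max] range for `column` overlaps [lower, upper]."""
    selected = []
    for path, stats in file_stats.items():
        col = stats.get(column)
        if col is None:
            selected.append(path)  # no statistics, so the file cannot be ruled out
        elif col["max"] >= lower and col["min"] <= upper:
            selected.append(path)
    return selected
```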
ANALYZE TABLE events COMPUTE STATISTICS FOR ALL COLUMNS
Write once as Delta, expose as Iceberg with no data duplication
Enable UniForm and DeltaForge generates Iceberg metadata (metadata.json, manifest list, manifests) alongside the Delta transaction log on every commit. The same physical Parquet files are referenced by both metadata layers.
Delta readers see the Delta log. Iceberg-compatible engines see the Iceberg metadata. No ETL pipeline, no second copy of data, no synchronization lag.
ALTER TABLE events SET TBLPROPERTIES ('delta.universalFormat.enabledFormats' = 'iceberg')
ACID transactions, deletion vectors, change data feed, time travel, and Iceberg interoperability on open Delta Lake tables.