Read and write Delta and Iceberg in place. Serve BI directly from the lake, run graph workloads on the same data, and give agents typed catalog access.
All on your hardware. No second warehouse. No copy pipeline. No cluster tax.
Delta Lake and Apache Iceberg in one native engine, with first-class grammar for MERGE, time travel, deletion vectors, UniForm interop, and change data feed. Correctness is backed by two independent verification layers: 258 downloadable use cases with 10,500+ assertions you run on your own install, and 7,137 scenarios verified bi-directionally against Apache Spark. Expected values are derived outside the engine under test. Every result publishes schema, sample rows, and verification arithmetic.
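To make "expected values derived outside the engine" concrete, here is a minimal Python sketch of an independent oracle for an upsert-style `MERGE` (update on key match, insert otherwise). The oracle is computed in plain Python and then compared against whatever rows the engine returns. The function names, the row-as-dict shape, and the single-key assumption are hypothetical illustrations, not DeltaForge's use-case format.

```python
from typing import Any

Row = dict[str, Any]


def expected_upsert(target: list[Row], source: list[Row], key: str) -> list[Row]:
    """Plain-Python oracle for an upsert MERGE on `key`:
    WHEN MATCHED THEN UPDATE SET *, WHEN NOT MATCHED THEN INSERT *."""
    source_keys = [row[key] for row in source]
    if len(source_keys) != len(set(source_keys)):
        # Standard MERGE rejects a target row matched by several source rows;
        # this sketch conservatively rejects any duplicate source key.
        raise ValueError("duplicate keys in MERGE source")
    merged = {row[key]: dict(row) for row in target}
    for row in source:
        merged[row[key]] = dict(row)  # overwrite on match, insert otherwise
    return sorted(merged.values(), key=lambda r: r[key])


def mismatches(actual: list[Row], expected: list[Row], key: str) -> list[str]:
    """Describe every difference between engine output and the oracle."""
    problems = []
    if len(actual) != len(expected):
        problems.append(f"row count: got {len(actual)}, expected {len(expected)}")
    actual_by_key = {row[key]: row for row in actual}
    for exp in expected:
        got = actual_by_key.get(exp[key])
        if got != exp:
            problems.append(f"{key}={exp[key]!r}: got {got!r}, expected {exp!r}")
    return problems


target = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}]
source = [{"id": 2, "v": "B"}, {"id": 3, "v": "c"}]
expected = expected_upsert(target, source, "id")
# In a real check, `actual` would be the rows returned by the engine under test.
actual = [{"id": 1, "v": "a"}, {"id": 2, "v": "B"}, {"id": 3, "v": "c"}]
print(mismatches(actual, expected, "id"))  # []
```

Keeping the oracle free of any engine call is the point. A bug in the engine cannot also bend the expected answer.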
A lakehouse should not bill you for clusters sitting on standby or for a second warehouse holding a copy of data you already own. DeltaForge is a single native engine on your own hardware. Compute is metered as core-seconds while a query runs, so an idle node costs nothing, and the same job finishes in fewer of those seconds than it does on Spark.
Compute is metered as core-seconds while a query actually runs. An idle DeltaForge node bills nothing. No per-row scan fees, no per-API charges, no minimum cluster uptime.
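The metering model reduces to simple arithmetic. Billed compute is the sum of cores × seconds over the queries that actually ran, so an idle node contributes zero and a faster run on the same cores contributes less. The sketch below assumes per-query core counts and wall-clock durations are known. The `QueryRun` type and the uptime comparison are illustrative only, not DeltaForge's billing code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryRun:
    cores: int      # cores allocated while the query executed
    seconds: float  # wall-clock seconds the query was running


def metered_core_seconds(runs: list[QueryRun]) -> float:
    """Core-seconds under per-query metering: idle time is never counted."""
    return sum(run.cores * run.seconds for run in runs)


def uptime_core_seconds(cores: int, uptime_seconds: float) -> float:
    """For contrast: core-seconds if a node were billed for every second it is up."""
    return cores * uptime_seconds


# Hypothetical hour on a 16-core node that served two 3-second queries.
runs = [QueryRun(cores=16, seconds=3.0), QueryRun(cores=16, seconds=3.0)]
print(metered_core_seconds(runs))       # 96.0
print(uptime_core_seconds(16, 3600.0))  # 57600.0
print(metered_core_seconds([]))         # 0 (idle node)
```

The same formula explains why a faster engine is cheaper. Halving `seconds` at a fixed `cores` halves the billed core-seconds.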
The VM itself is still an ordinary cloud cost, but stateless, quick-starting workers fit the Kubernetes autoscaling behind your ingress: pods scale up when queries arrive and down when they stop. You pay for demand, not for a warm cluster.
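The scale-up/scale-down behavior above is what the Kubernetes Horizontal Pod Autoscaler computes from a per-pod metric. The formula is desired replicas = ceil(current replicas × current metric / target metric). Scaling is skipped when that ratio is within a tolerance (10% by default), and the result is clamped to the configured minimum and maximum. The Python sketch below mirrors that documented formula. Using "in-flight queries per pod" as the metric, and the specific bounds, are assumptions for illustration, not a shipped DeltaForge configuration. It also omits the HPA's stabilization windows and pod-readiness handling.

```python
import math


def desired_replicas(
    current_replicas: int,
    avg_inflight_per_pod: float,
    target_inflight_per_pod: float,
    min_replicas: int = 1,
    max_replicas: int = 20,
    tolerance: float = 0.1,
) -> int:
    """Mirror of the Kubernetes HPA replica formula (no stabilization windows)."""
    ratio = avg_inflight_per_pod / target_inflight_per_pod
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas  # close enough to target: no change
    else:
        desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


print(desired_replicas(2, avg_inflight_per_pod=6, target_inflight_per_pod=2))  # 6: burst of queries
print(desired_replicas(4, avg_inflight_per_pod=0, target_inflight_per_pod=2))  # 1: queries stopped
```

With the stock HPA, the floor is `minReplicas`, typically 1, so "down" means down to that floor.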
3x to 6x faster than Spark on standard reads and ~4x faster writing 10M rows to plain Delta. The same answer costs fewer core-seconds, so faster is also cheaper. See the numbers.
BI tools read the lake directly through ODBC and ADBC. No duplicate copy to move, no separate warehouse to license, govern, and keep in sync with the tables your pipelines already write.
Four standardized read suites, run against the same plain-Delta fixtures, plus one synthetic-source write workload, compared across DeltaForge, DuckDB, Spark default, and Spark tuned. The harness, the data, the SQL, and the engine versions are all in the public repo. Queries where DeltaForge ties or loses are reported by name with the slowdown factor.
| Benchmark | DeltaForge (ms) | DuckDB (ms) | Spark default (ms) | Spark tuned (ms) | Detail |
|---|---|---|---|---|---|
| TPC-H (22 queries, 8 tables) | 255 | 173 | 1,478 | 1,528 | tpch.md |
| TPC-DS (99 queries, 24 tables) | 271 | 171 | 1,568 | 1,464 | tpcds.md |
| SSB (13 queries, 5-table star) | 191 | 75 | 685 | 628 | ssb.md |
| JOB (113 queries, IMDB) | 976 | 632 | crashed* | failed to start* | job.md |
Warm-median milliseconds across each workload, SF=1. DuckDB wins reads by 1.5x-2.5x; DeltaForge beats both Spark profiles by 3.3x-6.0x on every suite Spark completed. *On JOB, Spark default's JVM crashed after q06d and Spark tuned failed to start. We don't publish a median for the partial Spark default run because its 21 completed queries are an unrepresentative early subset. DeltaForge and DuckDB completed all 113.
10,000,000-row CTAS into plain Delta from a deterministic synthetic source (`generate_series` on DeltaForge, `range` on Spark; same nine-column schema, same row content). DeltaForge: 6.48M rows/sec · Spark default: 1.51M rows/sec · Spark tuned: 1.61M rows/sec. DuckDB sits this one out because its delta extension is read-only.
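The ratios quoted in this section can be re-derived from the published numbers alone. This sketch recomputes the read ratios from the warm-median table and the write speedup from the rows/sec figures. Treating the write's elapsed time as 10,000,000 rows divided by throughput is an assumption about how throughput was measured.

```python
# Warm-median milliseconds per read suite at SF=1, copied from the table above.
READS_MS = {
    "TPC-H": {"DeltaForge": 255, "DuckDB": 173, "Spark default": 1478, "Spark tuned": 1528},
    "TPC-DS": {"DeltaForge": 271, "DuckDB": 171, "Spark default": 1568, "Spark tuned": 1464},
    "SSB": {"DeltaForge": 191, "DuckDB": 75, "Spark default": 685, "Spark tuned": 628},
    "JOB": {"DeltaForge": 976, "DuckDB": 632},  # no complete Spark run exists
}
# 10M-row CTAS throughput, rows per second.
WRITE_ROWS_PER_SEC = {"DeltaForge": 6.48e6, "Spark default": 1.51e6, "Spark tuned": 1.61e6}
ROWS = 10_000_000

# Spark time divided by DeltaForge time, for every suite Spark completed.
spark_speedups = [
    times[spark] / times["DeltaForge"]
    for times in READS_MS.values()
    for spark in ("Spark default", "Spark tuned")
    if spark in times
]
# DeltaForge time divided by DuckDB time.
duckdb_leads = [times["DeltaForge"] / times["DuckDB"] for times in READS_MS.values()]
write_speedups = [
    WRITE_ROWS_PER_SEC["DeltaForge"] / WRITE_ROWS_PER_SEC[s] for s in ("Spark default", "Spark tuned")
]
elapsed_s = {engine: ROWS / rate for engine, rate in WRITE_ROWS_PER_SEC.items()}

print(f"DeltaForge vs Spark, reads: {min(spark_speedups):.1f}x-{max(spark_speedups):.1f}x")  # 3.3x-6.0x
print(f"DuckDB vs DeltaForge, reads: {min(duckdb_leads):.1f}x-{max(duckdb_leads):.1f}x")  # 1.5x-2.5x
print(f"DeltaForge vs Spark, write: {min(write_speedups):.1f}x-{max(write_speedups):.1f}x")  # 4.0x-4.3x
print({engine: round(s, 2) for engine, s in elapsed_s.items()})
# {'DeltaForge': 1.54, 'Spark default': 6.62, 'Spark tuned': 6.21}
```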
Reproducible in one command on your hardware. Apache 2.0. The harness, the data generators, the per-engine adapters, every query, and every methodology choice are public.
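The reporting rules described for this benchmark can be sketched as follows: warm medians, and naming every query where DeltaForge ties or loses with its slowdown factor. The number of discarded warm-up runs, the 5% tie band, and the function names are assumptions for illustration, not the published harness's settings.

```python
import statistics


def warm_median_ms(timings_ms: list[float], warmup_runs: int = 1) -> float:
    """Median of the runs left after discarding cold warm-up runs."""
    warm = timings_ms[warmup_runs:]
    if not warm:
        raise ValueError("need at least one run after warm-up")
    return statistics.median(warm)


def ties_and_losses(
    ours: dict[str, float], theirs: dict[str, float], tie_band: float = 0.05
) -> list[tuple[str, float]]:
    """Queries where our warm median is within `tie_band` of, or slower than, theirs.
    Returns (query, slowdown factor) pairs, slowest first."""
    flagged = []
    for query, our_ms in ours.items():
        if query not in theirs:
            continue  # the other engine did not complete this query
        factor = our_ms / theirs[query]
        if factor >= 1.0 - tie_band:
            flagged.append((query, round(factor, 2)))
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# Hypothetical per-query timings (ms); the first run of each is a cold start.
ours = {"q1": warm_median_ms([40, 12, 11, 13]), "q2": warm_median_ms([90, 30, 31, 29])}
theirs = {"q1": warm_median_ms([35, 20, 21, 19]), "q2": warm_median_ms([60, 15, 16, 15])}
print(ties_and_losses(ours, theirs))  # [('q2', 2.0)]
```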
BI usually means copying the lake into a second SQL database. DeltaForge ships two drivers that point at Delta directly: a native ODBC driver for the entire ODBC ecosystem, and an ADBC driver for Delta Lake with Power BI Desktop integration. Same engine, same governance. Dashboards point at the tables your pipelines already write.
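From a script, pointing at the lake over ODBC looks like connecting to any other ODBC data source. This Python sketch builds a connection string, conservatively bracing values that contain separators or spaces per the ODBC convention, and runs a query through `pyodbc`. The driver name `DeltaForge ODBC`, the `HOST`/`PORT` keywords, and the table name are hypothetical placeholders. Use the keywords the driver actually documents.

```python
def odbc_connection_string(attrs: dict[str, str]) -> str:
    """Join keyword=value pairs with ';'. Values containing separators, braces,
    or spaces are wrapped in {...}, with any '}' inside doubled."""
    parts = []
    for key, value in attrs.items():
        if any(ch in value for ch in ";{}= "):
            value = "{" + value.replace("}", "}}") + "}"
        parts.append(f"{key}={value}")
    return ";".join(parts)


def fetch_rows(conn_str: str, sql: str) -> list[tuple]:
    """Run one query through pyodbc (imported lazily so the builder works without it)."""
    import pyodbc  # third-party: pip install pyodbc

    conn = pyodbc.connect(conn_str)
    try:
        cursor = conn.cursor()
        cursor.execute(sql)
        return [tuple(row) for row in cursor.fetchall()]
    finally:
        conn.close()  # pyodbc's context manager commits but does not close


# Hypothetical keywords and values; substitute the ones your driver documents.
conn_str = odbc_connection_string(
    {"DRIVER": "DeltaForge ODBC", "HOST": "lake.internal", "PORT": "443"}
)
print(conn_str)  # DRIVER={DeltaForge ODBC};HOST=lake.internal;PORT=443
# rows = fetch_rows(conn_str, "SELECT COUNT(*) FROM sales.orders")
```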
Fraud rings, supply chains, and customer-to-merchant patterns usually mean a separate graph database with its own sync pipeline. DeltaForge projects your existing Delta tables in place into a native property graph, with zero ETL. Model relationships like Customer → Transaction → Merchant, traverse them with Cypher, score them with PageRank, and join the result back to SQL in the same session.
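To show what the projection and scoring steps compute, here is a plain-Python sketch. It turns Customer → Transaction → Merchant rows into directed edges and then runs power-iteration PageRank (damping 0.85, dangling rank spread uniformly). It illustrates the algorithm only. DeltaForge's actual projection syntax, Cypher procedures, and PageRank settings are not shown, and the row shape and node-id prefixes are assumptions.

```python
def project_edges(rows: list[tuple[str, str, str]]) -> list[tuple[str, str]]:
    """Turn (customer, transaction, merchant) rows into Customer→Txn→Merchant edges."""
    edges = []
    for customer, txn, merchant in rows:
        edges.append((f"customer:{customer}", f"txn:{txn}"))
        edges.append((f"txn:{txn}", f"merchant:{merchant}"))
    return edges


def pagerank(
    edges: list[tuple[str, str]], damping: float = 0.85, iterations: int = 50
) -> dict[str, float]:
    nodes = sorted({node for edge in edges for node in edge})
    out_links: dict[str, list[str]] = {node: [] for node in nodes}
    for src, dst in edges:
        out_links[src].append(dst)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes with no out-links is redistributed evenly.
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {v: base for v in nodes}
        for v in nodes:
            if out_links[v]:
                share = damping * rank[v] / len(out_links[v])
                for w in out_links[v]:
                    new_rank[w] += share
        rank = new_rank
    return rank


rows = [("c1", "t1", "m1"), ("c2", "t2", "m1"), ("c3", "t3", "m2")]
scores = pagerank(project_edges(rows))
merchants = {k: v for k, v in scores.items() if k.startswith("merchant:")}
print(max(merchants, key=merchants.get))  # merchant:m1
```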
Code assistants and automation need typed access to the real catalog, not guesses from a prompt. DeltaForge ships with a built-in MCP server, the open Model Context Protocol used by Claude, Cursor, and GitHub Copilot Chat. Plug it in once, and the caller gets typed actions for catalog, lineage, SQL, and pipelines under the same RBAC and audit logging as a human user.
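An MCP client invokes a server's typed actions with JSON-RPC 2.0 messages. It uses `tools/list` to discover tools and their JSON Schema inputs, and `tools/call` with a tool name and arguments. The sketch below builds such a request and checks the arguments against the tool's declared `required` fields before sending. The tool name `run_sql` and its schema are hypothetical; this page does not specify which tools a DeltaForge server exposes.

```python
import json
from typing import Any

# Hypothetical tool definition, in the shape an MCP server returns from tools/list.
RUN_SQL_TOOL: dict[str, Any] = {
    "name": "run_sql",
    "inputSchema": {
        "type": "object",
        "properties": {"sql": {"type": "string"}, "max_rows": {"type": "integer"}},
        "required": ["sql"],
    },
}


def tools_call_request(request_id: int, tool: dict[str, Any], arguments: dict[str, Any]) -> str:
    """Serialize a JSON-RPC 2.0 tools/call request after a minimal required-field check."""
    required = tool["inputSchema"].get("required", [])
    missing = [name for name in required if name not in arguments]
    if missing:
        raise ValueError(f"missing required arguments: {missing}")
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool["name"], "arguments": arguments},
    }
    # Over the stdio transport, each message is a single line of JSON.
    return json.dumps(message)


print(tools_call_request(1, RUN_SQL_TOOL, {"sql": "SELECT 1", "max_rows": 10}))
```

A full client would also run the `initialize` handshake first and validate argument types against the schema, not just presence.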
Run heavy Delta and Iceberg SQL with proof it works. Serve Power BI, Tableau, and Excel directly through a smart-cache ODBC driver. Traverse your tables as a native property graph with Cypher and PageRank. Wire the catalog into Claude, Cursor, and Copilot through MCP. All on your infrastructure, all in one native execution platform.