Most lakehouse jobs don't need a cluster. They need a native engine.

Read and write Delta and Iceberg in place. Serve BI directly from the lake, run graph workloads on the same data, and give agents typed catalog access.

All on your hardware. No second warehouse. No copy pipeline. No cluster tax.

Create a free account and claim a Community license. No credit card required. Download for Windows, macOS, and Linux.

Lakehouse engine

Read and write Delta and Iceberg with proofs you can run

Delta Lake and Apache Iceberg in one native engine, with first-class grammar for MERGE, time travel, deletion vectors, UniForm interop, and change data feed. Correctness is not claimed; it is checked by two independent verification layers, and every expected value is derived outside the engine under test.

Downloadable use cases you run on your own install, every failure self-diagnosing

Bi-directional conformance against Apache Spark, with results open for inspection

No self-validation: every result publishes its schema, sample rows, and verification arithmetic
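The idea of deriving expected values outside the engine under test can be sketched in a few lines. This is an illustration only, not DeltaForge's verification harness: the fixture rows, the function names, and the use of Python's built-in `sqlite3` as a stand-in for the engine are all assumptions made for the example.

```python
import sqlite3

# Hypothetical fixture rows: (order_id, region, amount_cents).
ROWS = [
    (1, "east", 1250),
    (2, "west", 900),
    (3, "east", 475),
    (4, "west", 2100),
]


def expected_totals(rows):
    """Derive the expected values with plain arithmetic, outside any SQL engine."""
    totals = {}
    for _order_id, region, amount in rows:
        totals[region] = totals.get(region, 0) + amount
    return totals


def engine_totals(rows):
    """Run the same aggregation through the engine under test.

    sqlite3 is only a stand-in here; a real check would query the actual engine.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (order_id INTEGER, region TEXT, amount_cents INTEGER)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    cursor = conn.execute(
        "SELECT region, SUM(amount_cents) FROM orders GROUP BY region"
    )
    result = dict(cursor)  # each row is a (region, total) pair
    conn.close()
    return result


def mismatches(rows):
    """Return {key: (expected, actual)} for every value that disagrees."""
    expected = expected_totals(rows)
    actual = engine_totals(rows)
    return {
        key: (expected.get(key), actual.get(key))
        for key in expected.keys() | actual.keys()
        if expected.get(key) != actual.get(key)
    }
```

An empty result from `mismatches(ROWS)` means the engine agreed with the independently computed totals. A non-empty result names the key and both values, which is the self-diagnosing behavior the list above describes.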

258 end-to-end use cases

10,500+ machine-checked assertions

7,137 scenarios verified in both directions

100% of expected values derived outside the engine

Cost model

Pay only for the work you actually run

DeltaForge is one native engine on hardware you already pay for. Compute is metered as core-seconds while a query runs: an idle node costs nothing, and a faster engine spends fewer of them.

$0 idle

Pay for queries, not standby

Compute is metered while a query actually runs. An idle node bills nothing: no per-row scan fees, no per-API charges, no minimum cluster uptime.
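As a rough illustration of core-second metering, the sketch below multiplies cores by running time for each query and sums the result. The `QueryRun` type, the function names, and any price passed in are hypothetical; the page does not state a rate or a billing API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryRun:
    """One metered query: how many cores it used and for how long."""
    cores: int
    seconds: float


def core_seconds(runs):
    """Total metered work. Time with no running query contributes nothing."""
    return sum(run.cores * run.seconds for run in runs)


def bill(runs, price_per_core_second):
    """Cost at a hypothetical flat rate per core-second."""
    return core_seconds(runs) * price_per_core_second


# Two queries in an otherwise idle hour: only the running time is counted.
busy_hour = [QueryRun(cores=8, seconds=2.5), QueryRun(cores=4, seconds=10.0)]

# The same query finishing five times faster on the same cores.
slow = [QueryRun(cores=8, seconds=10.0)]
fast = [QueryRun(cores=8, seconds=2.0)]
```

Under this model an hour with no queries is an empty list and bills zero, and `fast` consumes one fifth of the core-seconds of `slow`, which is the sense in which a faster engine is cheaper.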

Autoscale

Scale with load

Stateless, quick-starting workers fit ordinary Kubernetes autoscaling: pods scale up when queries arrive and back down when they stop.

5-8x

Faster is cheaper

5x to 8x faster than Spark on standard reads and about 4x faster on writes, so the same answer costs fewer core-seconds.

0 copies

No second warehouse

BI tools read the lake directly through ODBC and ADBC. No duplicate copy to move, license, govern, or keep in sync.

Benchmarks

Honest numbers, same tables, same hardware

Four standardized read suites and one write workload, run against the same plain-Delta fixtures by DeltaForge, DuckDB, and two Spark profiles. DuckDB wins reads by roughly 1.5x to 2.5x. DeltaForge beats both Spark profiles on every suite Spark completed: at the warm medians below, by about 5.4x to 6.0x on TPC-H and TPC-DS and about 3.3x to 3.6x on SSB. The whole harness is licensed under Apache 2.0 and reproducible in one command, and every tie or loss is reported by name.

Benchmark	DeltaForge	DuckDB	Spark default	Spark tuned	Detail
TPC-H 22 queries, 8 tables	255 ms	173 ms	1,478 ms	1,528 ms	tpch.md
TPC-DS 99 queries, 24 tables	271 ms	171 ms	1,568 ms	1,464 ms	tpcds.md
SSB 13 queries, 5-table star	191 ms	75 ms	685 ms	628 ms	ssb.md
JOB 113 queries, IMDB	976 ms	632 ms	crashed*	crashed*	job.md

Warm-median ms per suite at SF=1, lower is better. *On JOB, Spark default crashed after q06d and Spark tuned failed to start; no median is published for partial runs. DeltaForge and DuckDB completed all 113 queries.
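The speedup ratios can be recomputed directly from the warm medians in the table. The sketch below copies those figures as published; the dictionary layout and the `speedup` helper are illustrative names, not part of the benchmark harness.

```python
# Warm-median milliseconds per suite, copied from the table above.
# None marks a run with no published median (the engine did not complete).
MEDIANS_MS = {
    "TPC-H": {"DeltaForge": 255, "DuckDB": 173, "Spark default": 1478, "Spark tuned": 1528},
    "TPC-DS": {"DeltaForge": 271, "DuckDB": 171, "Spark default": 1568, "Spark tuned": 1464},
    "SSB": {"DeltaForge": 191, "DuckDB": 75, "Spark default": 685, "Spark tuned": 628},
    "JOB": {"DeltaForge": 976, "DuckDB": 632, "Spark default": None, "Spark tuned": None},
}


def speedup(suite, baseline, engine="DeltaForge"):
    """How many times faster `engine` is than `baseline` on `suite`.

    Returns None when either side has no published median.
    """
    row = MEDIANS_MS[suite]
    if row[baseline] is None or row[engine] is None:
        return None
    return row[baseline] / row[engine]


def report():
    """Print DeltaForge's ratio against every other engine, suite by suite."""
    for suite, row in MEDIANS_MS.items():
        for baseline in row:
            if baseline == "DeltaForge":
                continue
            ratio = speedup(suite, baseline)
            text = "no median" if ratio is None else f"{ratio:.2f}x"
            print(f"{suite:7s} vs {baseline:14s} {text}")
```

A ratio below 1 means the baseline was faster, which is how the DuckDB columns read on every suite.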

Writes: ~4x faster than Spark on single-node Delta

10,000,000-row CTAS into plain Delta from a deterministic synthetic source, with the same nine-column schema and row content for every engine. DeltaForge: 6.48M rows/sec · Spark default: 1.51M rows/sec · Spark tuned: 1.61M rows/sec. DuckDB sits this one out because its delta extension is read-only.

BI drivers

Power BI, Tableau, Excel directly on your lake

BI usually means copying the lake into a second SQL database first. DeltaForge ships two drivers that point at Delta directly: a native ODBC driver for the whole ODBC ecosystem, and an ADBC driver with a bundled Power Query connector for Power BI Desktop. Same engine, same governance, no copy.

Native ODBC for Power BI, Tableau, Excel, .NET, Python, and R, with full read and write

ADBC for Power BI Desktop reads Arrow batches by reference: 13.7x faster than .NET ODBC on a 1M-row scan

No second warehouse: dashboards point at the tables your pipelines already write

Property graph

Your Delta tables become native property graphs

Fraud rings, supply chains, and customer-to-merchant patterns usually mean a separate graph database and a sync pipeline to feed it. DeltaForge projects your existing Delta tables into a native property graph in place: traverse it with Cypher, score it with PageRank, and join the results back to SQL in the same session.

Zero-ETL projection over the tables you already have: no sidecar store, no copy job

Cypher plus 32 algorithms, 18 of them GPU-accelerated, including PageRank, Leiden, shortest paths, and embeddings

Graph and SQL in one session, so scores land next to the dimensions they explain
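To make the "score the graph, then join back to the dimensions" idea concrete, here is a plain-Python sketch that treats rows of an edge table as a directed graph, runs a textbook power-iteration PageRank, and attaches the scores to dimension rows. It does not use DeltaForge, Cypher, or a GPU; the table contents and all names are hypothetical.

```python
# Hypothetical edge table: (account_id, merchant_id) payment relationships.
EDGES = [("a", "m1"), ("b", "m1"), ("c", "m1"), ("c", "m2")]

# Hypothetical dimension table keyed by node id.
DIMENSIONS = {
    "a": "account", "b": "account", "c": "account",
    "m1": "merchant", "m2": "merchant",
}


def pagerank(edges, damping=0.85, iterations=50):
    """Textbook power-iteration PageRank over a list of (src, dst) edges."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes with no outgoing edges is spread evenly.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        base = (1.0 - damping) / n + damping * dangling / n
        next_rank = {node: base for node in nodes}
        for src in nodes:
            if out_links[src]:
                share = damping * rank[src] / len(out_links[src])
                for dst in out_links[src]:
                    next_rank[dst] += share
        rank = next_rank
    return rank


def scored_rows(edges, dimensions):
    """Join each node's score to its dimension row, highest score first."""
    rank = pagerank(edges)
    rows = [(node, dimensions[node], rank[node]) for node in rank]
    return sorted(rows, key=lambda row: row[2], reverse=True)
```

The join in `scored_rows` is the part the bullets above emphasize: each score ends up in the same row as the attribute that explains it, rather than in a separate store that has to be synced.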

AI agents

Typed catalog access for code assistants and agents

Assistants and automation need typed access to the real catalog, not guesses from a prompt. DeltaForge ships a built-in MCP server: plug it in once and Claude, Cursor, or Copilot get typed actions for catalog, lineage, SQL, and pipelines, under the same RBAC and audit logging as a human user.

Real parser feedback: the assistant fixes its own SQL before you ever see it

Typed actions across catalog, lineage, SQL, pipelines, and docs

Same RBAC as humans: scoped tokens, audited tool calls, runs in your network
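For readers unfamiliar with MCP, a tool invocation is a JSON-RPC 2.0 request with the method `tools/call`, carrying a tool name and a JSON object of arguments. The sketch below only builds such a request. The tool name `run_sql` and its `query` argument are hypothetical placeholders, not names taken from the DeltaForge MCP server.

```python
import itertools
import json

# Monotonic request ids, as JSON-RPC expects a unique id per request.
_request_ids = itertools.count(1)


def tool_call_request(tool_name, arguments):
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    if not isinstance(arguments, dict):
        raise TypeError("arguments must be a dict (a JSON object)")
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Hypothetical tool name and argument, for illustration only.
request = tool_call_request("run_sql", {"query": "SELECT 1"})
payload = json.dumps(request)
```

Because the arguments are a structured object checked against the tool's declared schema, the server can reject a malformed call before anything runs, and each call is a discrete event that RBAC and audit logging can attach to.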

Four real-world workloads, one native engine

Heavy Delta and Iceberg SQL with proof it works. BI served straight from the lake. Cypher and graph algorithms over the same tables. Agents wired into the catalog through MCP. One native engine, on your infrastructure.
