
Most lakehouse jobs don't need a
cluster. They need a native engine.

Read and write Delta and Iceberg in place. Serve BI directly from the lake, run graph workloads on the same data, and give agents typed catalog access.

All on your hardware. No second warehouse. No copy pipeline. No cluster tax.

Create a free account and claim a Community license. No credit card required.
Lakehouse SQL

Read and write Delta and Iceberg
with proofs you can run

Delta Lake and Apache Iceberg in one native engine, with first-class grammar for MERGE, time travel, deletion vectors, UniForm interop, and change data feed. Correctness is backed by two independent verification layers: 258 downloadable use cases with 10,500+ assertions you run on your own install, and 7,137 scenarios verified bi-directionally against Apache Spark. Expected values are derived outside the engine under test. Every result publishes schema, sample rows, and verification arithmetic.

258 downloadable use cases, runnable on your install, every failure self-diagnosing
7,137 bi-directional conformance scenarios verified against Apache Spark, results open for inspection
Expected values derived outside the engine, so no test passes by self-validation
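To illustrate what "expected values derived outside the engine" can mean for a MERGE, here is a minimal Python sketch. It is not DeltaForge's actual test harness. It models an upsert-with-delete MERGE as plain dictionary operations, which gives the expected rows without running any SQL engine. It then diffs an engine's output against those rows, so that a failure names the offending key. The `op` column, the `id` key, and both helper names are hypothetical.

```python
# Hypothetical oracle: derives expected MERGE output in plain Python,
# without running any SQL engine. It models:
#   MERGE INTO target USING source ON target.id = source.id
#   WHEN MATCHED AND source.op = 'delete' THEN DELETE
#   WHEN MATCHED THEN UPDATE SET *
#   WHEN NOT MATCHED AND source.op <> 'delete' THEN INSERT *
def expected_merge(target, source, key="id"):
    rows = {r[key]: dict(r) for r in target}
    for s in source:
        payload = {c: v for c, v in s.items() if c != "op"}
        if s["op"] == "delete":
            rows.pop(s[key], None)  # deleting an unmatched key is a no-op
        else:
            rows[s[key]] = payload  # update if matched, insert otherwise
    return sorted(rows.values(), key=lambda r: r[key])


def diff_rows(actual, expected, key="id"):
    """Return human-readable mismatches; an empty list means the engine agrees."""
    a = {r[key]: r for r in actual}
    e = {r[key]: r for r in expected}
    problems = [f"missing row {key}={k}" for k in sorted(e.keys() - a.keys())]
    problems += [f"unexpected row {key}={k}" for k in sorted(a.keys() - e.keys())]
    for k in sorted(a.keys() & e.keys()):
        if a[k] != e[k]:
            problems.append(f"{key}={k}: expected {e[k]}, got {a[k]}")
    return problems


target = [{"id": 1, "amount": 10.0}, {"id": 2, "amount": 20.0}, {"id": 3, "amount": 30.0}]
source = [
    {"id": 2, "amount": 25.0, "op": "upsert"},
    {"id": 3, "amount": 0.0, "op": "delete"},
    {"id": 4, "amount": 40.0, "op": "upsert"},
]
expected = expected_merge(target, source)
print(expected)
# [{'id': 1, 'amount': 10.0}, {'id': 2, 'amount': 25.0}, {'id': 4, 'amount': 40.0}]

# An engine that skipped the insert and the update would be reported by key.
print(diff_rows([{"id": 1, "amount": 10.0}, {"id": 2, "amount": 20.0}], expected))
```

In a real suite, the `actual` rows would come from querying the table after the engine under test ran the MERGE.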
Figure: independent verification, two engines with identical answers. DeltaForge and Apache Spark return matching results for uniform_v3_roundtrip (9840.50), merge_dv_concurrent (14,823 rows), time_travel_audit (v6 snapshot), cdf_incremental_etl (4 batches), and deletion_vectors (412 deleted); expected values are derived outside the engine.
258
End-to-end use cases
10,500+
Machine-checked assertions
7,137
Scenarios verified in both directions
100%
Expected values derived outside the engine
View the full conformance matrix
Operational efficiency

Pay only for the work
you actually run

A lakehouse should not bill you for clusters sitting on standby or for a second warehouse holding a copy of data you already own. DeltaForge is a single native engine on your own hardware. Compute is metered as core-seconds while a query runs, so an idle node costs nothing. Because jobs also finish faster, each one consumes fewer of those core-seconds.

$0 idle

Pay only for what you use

Compute is metered as core-seconds while a query actually runs. An idle DeltaForge node bills nothing. No per-row scan fees, no per-API charges, no minimum cluster uptime.
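The metering model above reduces to simple arithmetic. Billable compute is the sum of cores × seconds over the intervals when queries are actually running, and idle time adds nothing. The Python sketch below is illustrative only. The `QueryRun` type and all numbers are hypothetical; they are not DeltaForge's billing code or measured figures.

```python
from dataclasses import dataclass


@dataclass
class QueryRun:
    """One query execution: cores allocated and wall-clock seconds it ran."""
    cores: int
    seconds: float


def core_seconds(runs):
    # Only running queries accrue usage. An idle node contributes no runs,
    # so it adds nothing to the total.
    return sum(r.cores * r.seconds for r in runs)


# A day with two queries on an otherwise idle 8-core node (hypothetical numbers).
day = [QueryRun(cores=8, seconds=1.5), QueryRun(cores=4, seconds=0.5)]
print(core_seconds(day))  # 14.0

# The same query on the same 8 cores, finishing 5x sooner, uses 5x fewer core-seconds.
slow, fast = QueryRun(8, 7.5), QueryRun(8, 1.5)
print(core_seconds([slow]) / core_seconds([fast]))  # 5.0
```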

Autoscale

Scale with load, not standby

The underlying VMs are an ordinary cloud cost. DeltaForge workers, however, are stateless and start quickly, so they fit the Kubernetes autoscaling behind your ingress: pods scale up when queries arrive and down when they stop. You pay for demand, not for a warm cluster.
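Kubernetes' Horizontal Pod Autoscaler works in three steps:

- It computes the replica count as ceil(currentReplicas × currentMetricValue / desiredMetricValue).
- It skips any change while that ratio stays within a tolerance (0.1 by default).
- It clamps the result to the configured minimum and maximum.

The Python sketch below reproduces that rule to show how stateless workers would track query load. Using average in-flight queries per pod as the metric is an assumption. A stock HPA also keeps at least one replica; scaling to zero needs an add-on such as KEDA or an alpha feature gate.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Sketch of the HPA scaling rule for a single per-pod average metric."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to target: no scaling action
    raw = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, raw))


# Hypothetical metric: average in-flight queries per pod, targeting 4.
print(desired_replicas(2, current_metric=12, target_metric=4))   # burst: 6
print(desired_replicas(6, current_metric=0.5, target_metric=4))  # quiet: 1
```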

5-8x

Every query finishes sooner

5x to 8x faster than Spark on standard reads and ~4x faster writing 10M rows to plain Delta. The same answer costs fewer core-seconds, so faster is also cheaper. See the numbers.

0 copies

No second warehouse

BI tools read the lake directly through ODBC and ADBC. No duplicate copy to move, no separate warehouse to license, govern, and keep in sync with the tables your pipelines already write.

Benchmarks

Honest numbers,
same Delta tables, same hardware

Four standardized read suites and one synthetic-source write workload, run against the same plain-Delta fixtures by DeltaForge, DuckDB, Spark default, and Spark tuned. The harness, the data, the SQL, and the engine versions are all in the public repo. Queries where DeltaForge ties or loses are reported by name with the slowdown factor.

| Benchmark | DeltaForge | DuckDB | Spark default | Spark tuned | Detail |
|---|---|---|---|---|---|
| TPC-H (22 queries, 8 tables) | 255 ms | 173 ms | 1,478 ms | 1,528 ms | tpch.md |
| TPC-DS (99 queries, 24 tables) | 271 ms | 171 ms | 1,568 ms | 1,464 ms | tpcds.md |
| SSB (13 queries, 5-table star) | 191 ms | 75 ms | 685 ms | 628 ms | ssb.md |
| JOB (113 queries, IMDB) | 976 ms | 632 ms | crashed* | crashed | job.md |

Values are warm-median milliseconds across each workload at SF=1. By the ratio of these medians, DuckDB wins reads by 1.5x-2.5x, and DeltaForge beats both Spark profiles on every read suite, by 3.3x (SSB) to 6.0x (TPC-H). *On JOB, Spark default's JVM crashed after q06d and Spark tuned failed to start. We don't publish a median for the partial Spark runs because the 21 completed queries are an unrepresentative early subset. DeltaForge and DuckDB completed all 113.
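The speed-up figures can be checked against the table with a few lines of arithmetic. The Python sketch below divides the warm medians from the table, suite by suite. A ratio of medians is only one summary; a per-query statistic can give different numbers. JOB has no Spark median, so only the DuckDB comparison is computed there.

```python
# Warm-median milliseconds per suite, copied from the table (SF=1).
# None marks runs with no published median (Spark on JOB).
MEDIANS_MS = {
    "TPC-H": {"DeltaForge": 255, "DuckDB": 173, "Spark default": 1478, "Spark tuned": 1528},
    "TPC-DS": {"DeltaForge": 271, "DuckDB": 171, "Spark default": 1568, "Spark tuned": 1464},
    "SSB": {"DeltaForge": 191, "DuckDB": 75, "Spark default": 685, "Spark tuned": 628},
    "JOB": {"DeltaForge": 976, "DuckDB": 632, "Spark default": None, "Spark tuned": None},
}


def speedups(medians):
    """Per suite: how much faster DuckDB is than DeltaForge, and how much
    faster DeltaForge is than each Spark profile (slower time / faster time)."""
    out = {}
    for suite, m in medians.items():
        df = m["DeltaForge"]
        row = {"DuckDB over DeltaForge": df / m["DuckDB"]}
        for spark in ("Spark default", "Spark tuned"):
            if m[spark] is not None:
                row[f"DeltaForge over {spark}"] = m[spark] / df
        out[suite] = row
    return out


for suite, row in speedups(MEDIANS_MS).items():
    print(suite, {k: round(v, 2) for k, v in row.items()})
```

On these medians, DuckDB leads DeltaForge by about 1.5x to 2.5x. DeltaForge leads Spark by about 3.3x (SSB, tuned) to 6.0x (TPC-H, tuned).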

Writes: DeltaForge is ~4x faster than Spark on single-node Delta

10,000,000-row CTAS into plain Delta from a deterministic synthetic source (generate_series on DeltaForge, range on Spark; same nine-column schema, same row content). DeltaForge: 6.48M rows/sec · Spark default: 1.51M rows/sec · Spark tuned: 1.61M rows/sec. DuckDB sits this one out because its delta extension is read-only.

Full write-bench page on GitHub →
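The "~4x" write figure follows directly from the published throughputs. This Python sketch converts rows per second into wall-clock time for the 10M-row CTAS and divides the rates. It uses only the numbers quoted above.

```python
ROWS = 10_000_000
ROWS_PER_SEC = {"DeltaForge": 6.48e6, "Spark default": 1.51e6, "Spark tuned": 1.61e6}

# Wall-clock seconds implied by each throughput for the 10M-row CTAS.
seconds = {engine: ROWS / rate for engine, rate in ROWS_PER_SEC.items()}
# DeltaForge's speed-up over each Spark profile (ratio of throughputs).
speedup = {engine: ROWS_PER_SEC["DeltaForge"] / rate
           for engine, rate in ROWS_PER_SEC.items() if engine != "DeltaForge"}

for engine, s in seconds.items():
    print(f"{engine}: {s:.2f} s")  # DeltaForge 1.54 s, Spark default 6.62 s, Spark tuned 6.21 s
for engine, x in speedup.items():
    print(f"DeltaForge vs {engine}: {x:.2f}x")  # 4.29x and 4.02x
```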

Reproducible in one command on your hardware. The benchmark is Apache 2.0 licensed, and the harness, the data generators, the per-engine adapters, every query, and every methodology choice are public.

BI on the lake

Power BI, Tableau, Excel
directly on your lake

BI usually means copying the lake into a second SQL database. DeltaForge ships two drivers that point at Delta directly: a native ODBC driver for the entire ODBC ecosystem, and an ADBC driver for Delta Lake with Power BI Desktop integration. Same engine, same governance. Dashboards point at the tables your pipelines already write.

ODBC driver for Power BI, Tableau, Excel, .NET, Python, and R: full read and write through the standard ODBC Driver Manager rather than a bespoke connector, with the metadata cache resident in the driver
ADBC driver and Power Query connector: Power BI Desktop 2.145.1105.0+ reads Apache Arrow batches by reference, measured at 13.7x faster than .NET ODBC on a 1M-row scan
No second warehouse, no duplicate copy. Dashboards point at the lake your pipelines already write to
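Because the ODBC path goes through the standard Driver Manager, client code does not need a bespoke connector. In Python, pyodbc implements the DB-API 2.0 interface (PEP 249), so a function written against that interface works with any compliant driver. To stay self-contained, the sketch below runs against the standard library's sqlite3 as a stand-in. Against DeltaForge you would pass a connection such as `pyodbc.connect("DSN=DeltaForge")`. The DSN name and the `transactions` table are hypothetical.

```python
import sqlite3


def merchant_totals(conn, min_amount):
    """Aggregate spend per merchant through any DB-API 2.0 connection that uses
    qmark ('?') parameters: sqlite3 here, pyodbc for an ODBC data source."""
    cur = conn.cursor()
    cur.execute(
        "SELECT merchant, SUM(amount) AS total FROM transactions "
        "WHERE amount >= ? GROUP BY merchant ORDER BY total DESC",
        (min_amount,),
    )
    columns = [d[0] for d in cur.description]
    rows = [dict(zip(columns, row)) for row in cur.fetchall()]
    cur.close()
    return rows


# Stand-in data source so the sketch runs anywhere (hypothetical table).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (merchant TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?)",
    [("Northwind", 430.0), ("Northwind", 120.0), ("Contoso", 300.0), ("Contoso", 50.0)],
)
totals = merchant_totals(conn, 100)
print(totals)
# [{'merchant': 'Northwind', 'total': 550.0}, {'merchant': 'Contoso', 'total': 300.0}]
conn.close()
```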
Graph traversal

Your Delta tables
become native property graphs

Analyzing fraud rings, supply chains, and customer-to-merchant patterns usually means a separate graph database with its own sync pipeline. DeltaForge projects your existing Delta tables in place into a native property graph, with zero ETL. Model relationships like Customer → Transaction → Merchant, traverse them with Cypher, score them with PageRank, and join the results back to SQL in the same session.

Zero-ETL graph projection on your existing Delta tables, with no sidecar graph store and no copy job
Native property graphs with Cypher plus 32 algorithms (PageRank, Leiden, Bellman-Ford, FastRP embeddings, K-core...) with 18 of them GPU-accelerated
Join graph results back to SQL in the same session, so scores and traversals stay next to the dimensions they explain
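The projection-then-score flow can be sketched without any graph store. The Python below is a conceptual illustration, not DeltaForge's Cypher or algorithm API. It does three things:

- Turns hypothetical transaction rows into Customer → Transaction → Merchant edges.
- Runs textbook PageRank by power iteration, with damping 0.85 and dangling mass spread uniformly.
- Joins the merchant scores back to a dimension table, the same shape as joining graph output back to SQL.

```python
def project_edges(transactions):
    """Each transaction row yields Customer -> Transaction -> Merchant edges."""
    edges = []
    for t in transactions:
        txn = ("Transaction", t["txn_id"])
        edges.append((("Customer", t["customer_id"]), txn))
        edges.append((txn, ("Merchant", t["merchant_id"])))
    return edges


def pagerank(edges, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration; scores sum to 1."""
    nodes = sorted({node for edge in edges for node in edge})
    out = {v: [] for v in nodes}
    for src, dst in edges:
        out[src].append(dst)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Nodes without out-edges (here, merchants) redistribute their rank uniformly.
        dangling = sum(rank[v] for v in nodes if not out[v])
        new = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            for w in out[v]:
                new[w] += damping * rank[v] / len(out[v])
        rank = new
    return rank


# Hypothetical rows standing in for Delta tables.
transactions = [
    {"txn_id": "txn_1042", "customer_id": "alice", "merchant_id": "northwind"},
    {"txn_id": "txn_1043", "customer_id": "bob", "merchant_id": "northwind"},
    {"txn_id": "txn_1044", "customer_id": "carol", "merchant_id": "contoso"},
]
merchants = {"northwind": {"name": "Northwind"}, "contoso": {"name": "Contoso"}}

scores = pagerank(project_edges(transactions))
# "Join back": attach each merchant's score to its dimension row.
report = sorted(
    ({"merchant": m["name"], "pagerank": scores[("Merchant", mid)]} for mid, m in merchants.items()),
    key=lambda r: r["pagerank"],
    reverse=True,
)
for row in report:
    print(f'{row["merchant"]}: {row["pagerank"]:.3f}')
```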
Figure: example path Alice (:Customer) → txn_1042 (:Transaction, $430) → Northwind (:Merchant, PageRank 0.83, degree 19).
Catalog and automation access

Typed catalog access
for code assistants and agents

Code assistants and automation need typed access to the real catalog, not guesses from a prompt. DeltaForge ships with a built-in MCP server, the open Model Context Protocol used by Claude, Cursor, and GitHub Copilot Chat. Plug it in once, and the caller gets typed actions for catalog, lineage, SQL, and pipelines under the same RBAC and audit logging as a human user.

Real parser feedback raises the quality of AI-generated code: the assistant fixes its own SQL before you see it
Typed actions across catalog, lineage, SQL, pipelines, and docs
Same RBAC as humans, scoped tokens, tool-call audit logging, runs in your network
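Under the hood, MCP messages are JSON-RPC 2.0. A client discovers tools with `tools/list` and invokes one with `tools/call`, passing the tool `name` and its `arguments`. A result flagged `isError` carries the error text back, which is how parser feedback can reach the assistant. The Python sketch below only builds and reads those messages; the transport is omitted. The `run_sql` tool, its `sql` argument, and the sample error message are hypothetical, not DeltaForge's actual tool surface.

```python
import json
from itertools import count

_next_id = count(1)


def mcp_request(method, params=None):
    """Serialize a JSON-RPC 2.0 request as MCP clients send it."""
    msg = {"jsonrpc": "2.0", "id": next(_next_id), "method": method}
    if params is not None:
        msg["params"] = params
    return json.dumps(msg)


def call_tool(name, arguments):
    return mcp_request("tools/call", {"name": name, "arguments": arguments})


def tool_error_text(response):
    """Return the text of a failed tools/call result, or None on success."""
    result = json.loads(response)["result"]
    if result.get("isError"):
        return " ".join(c["text"] for c in result["content"] if c["type"] == "text")
    return None


listing = mcp_request("tools/list")
call = call_tool("run_sql", {"sql": "SELECT region, SUM(amount) FORM sales GROUP BY region"})

# Hypothetical server reply: the parser rejects the typo, and the assistant can
# feed this text back into its next attempt.
reply = json.dumps({
    "jsonrpc": "2.0",
    "id": 2,
    "result": {"content": [{"type": "text", "text": "syntax error near 'FORM'"}], "isError": True},
})
print(tool_error_text(reply))  # syntax error near 'FORM'
```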
Built for Production Workloads

Four real-world workloads
one native engine

Run heavy Delta and Iceberg SQL with proof it works. Serve Power BI, Tableau, and Excel directly through a smart-cache ODBC driver. Traverse your tables as a native property graph with Cypher and PageRank. Wire the catalog into Claude, Cursor, and Copilot through MCP. All on your infrastructure, all in one native execution platform.