DeltaForge is a single binary that runs native SQL on Delta Lake and Apache Iceberg. No JVM, no Spark, no cluster. Connect with standard ODBC, JDBC, Python, R, .NET, or wire it into Claude and Cursor through the built-in MCP server.
Three building blocks that ship in one binary
Native SQL on Delta Lake and Apache Iceberg. A PostgreSQL-flavored grammar plus lakehouse commands and features: MERGE, time travel, deletion vectors, change data feed, UniForm interop, PIPELINE, VACUUM, and OPTIMIZE.
Project your Delta tables into a property graph in place. Cypher plus 32 algorithms (PageRank, Leiden, Bellman-Ford, FastRP embeddings, k-core, Yen's k-shortest paths, and more), 18 of them GPU-accelerated. Join graph results back to SQL in the same session.
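For readers new to the algorithm list, a minimal power-iteration PageRank in plain Python shows what a score like PageRank actually computes. It is a textbook sketch, not DeltaForge's implementation; the toy edge list, the damping factor of 0.85, and the iteration limits are illustrative assumptions.

```python
def pagerank(edges, damping=0.85, iterations=100, tol=1e-10):
    """Power-iteration PageRank over a directed edge list.

    Rank held by dangling nodes (no outgoing edges) is spread evenly
    across all nodes, so the scores always sum to 1.
    """
    nodes = sorted({n for edge in edges for n in edge})
    out = {n: [] for n in nodes}
    for src, dst in edges:
        out[src].append(dst)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        dangling = sum(rank[v] for v in nodes if not out[v])
        # Every node starts with the "teleport" share plus redistributed dangling rank.
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for src in nodes:
            if out[src]:
                share = damping * rank[src] / len(out[src])
                for dst in out[src]:
                    new[dst] += share
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:
            break
    return rank


# Toy "who pays whom" graph (made-up data): most edges point at "c".
edges = [("a", "c"), ("b", "c"), ("d", "c"), ("c", "a")]
scores = pagerank(edges)
for node, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{node}: {score:.3f}")
```

Nodes with no inbound edges (`b` and `d`) settle at the teleport floor of (1 - 0.85) / 4, while `c`, which most edges point at, ranks highest.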
Claude, Cursor, and Copilot get live access to your catalog, schemas, and SQL surface through the built-in Model Context Protocol server. No bespoke retrieval layer to maintain.
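Under the hood, MCP clients such as Claude or Cursor talk to the server with JSON-RPC 2.0 messages: `tools/list` to discover what the server offers and `tools/call` to invoke a tool. The sketch below only builds those two messages to show their shape. The tool name `run_sql` and its `query` argument are hypothetical placeholders, since DeltaForge's actual tool names are not listed here; in practice the client assembles these messages for you.

```python
import json


def jsonrpc_request(request_id, method, params=None):
    """Build a JSON-RPC 2.0 request, the message format MCP is built on."""
    message = {"jsonrpc": "2.0", "id": request_id, "method": method}
    if params is not None:
        message["params"] = params
    return json.dumps(message)


# Step 1: ask the server which tools it exposes.
list_tools = jsonrpc_request(1, "tools/list")

# Step 2: call one of them. "run_sql" and "query" are hypothetical names;
# a real client uses the names returned by tools/list.
call_tool = jsonrpc_request(
    2,
    "tools/call",
    {"name": "run_sql", "arguments": {"query": "SELECT count(*) FROM sales.orders"}},
)

print(list_tools)
print(call_tool)
```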
No registration to download. No JVM to install. No cluster to provision. The Community license activates from a free account the first time you launch.
# Headless engine + CLI for scripts and CI
curl -fsSL https://deltaforge.org/install.sh | sh -s -- --pkg deltaforge-cli
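-- Create a Delta table from raw Parquet files in S3
CREATE TABLE sales USING DELTA AS
SELECT id, amount, ts
FROM read_parquet('s3://bucket/raw/*.parquet');

-- Upsert: update matching rows, insert new ones
MERGE INTO sales t
USING updates u
ON t.id = u.id
WHEN MATCHED THEN UPDATE SET amount = u.amount
WHEN NOT MATCHED THEN INSERT *;

-- Time travel: query the table as of version 3
SELECT SUM(amount) FROM sales VERSION AS OF 3;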
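The MERGE above is an upsert keyed on `id`: matched rows get the new `amount`, unmatched rows are inserted whole. The in-memory Python sketch below mirrors those two clauses to show why such a MERGE is idempotent, meaning that replaying the same `updates` batch leaves the table unchanged. It models rows as dicts keyed by `id` (so it assumes one source row per key) and ignores everything a real engine handles, such as files, transactions, and concurrency.

```python
def merge_upsert(target, updates):
    """Mirror MERGE ... WHEN MATCHED THEN UPDATE / WHEN NOT MATCHED THEN INSERT *.

    `target` and `updates` map the join key (id) to a row dict.
    Returns a new dict; the inputs are left unchanged.
    """
    merged = {key: dict(row) for key, row in target.items()}
    for key, row in updates.items():
        if key in merged:
            # WHEN MATCHED THEN UPDATE SET amount = u.amount
            merged[key]["amount"] = row["amount"]
        else:
            # WHEN NOT MATCHED THEN INSERT *
            merged[key] = dict(row)
    return merged


sales = {1: {"id": 1, "amount": 10.0, "ts": "2024-01-01"},
         2: {"id": 2, "amount": 20.0, "ts": "2024-01-01"}}
updates = {2: {"id": 2, "amount": 25.0, "ts": "2024-01-02"},
           3: {"id": 3, "amount": 30.0, "ts": "2024-01-02"}}

once = merge_upsert(sales, updates)
twice = merge_upsert(once, updates)
print(once == twice)  # True: re-running the same MERGE changes nothing
```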
PIPELINE is a first-class SQL command parsed by the engine. The schedule, reliability settings (timeout, retries), and metadata such as the owner live in the same file as the SQL logic.
-- Run daily at 06:00 New York time; time out after 30 minutes; retry up to 3 times
PIPELINE sales_daily_refresh
SCHEDULE '0 6 * * *'
TIMEZONE 'America/New_York'
OWNER 'data-team'
TIMEOUT '30m'
RETRIES 3;
INSERT INTO gold.revenue
SELECT product_id, SUM(amount) AS revenue
FROM curated.sales
WHERE sale_date >= CURRENT_DATE - INTERVAL '1 day'
GROUP BY product_id;
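The SCHEDULE value is a five-field cron expression (minute, hour, day of month, month, day of week), so `'0 6 * * *'` means 06:00 every day, read here in the `America/New_York` zone. The sketch below computes the next matching minute for simple cron expressions, to show how such a schedule is interpreted. It is illustrative only, not DeltaForge's scheduler: it assumes the input is naive wall-clock time already in the pipeline's TIMEZONE and ignores daylight-saving transitions.

```python
from datetime import datetime, timedelta


def parse_field(field, low, high):
    """Expand one cron field into the set of integers it allows.

    Supports '*', single values, 'a-b' ranges, comma lists, and '/n' steps.
    """
    values = set()
    for part in field.split(","):
        base, _, step_text = part.partition("/")
        step = int(step_text) if step_text else 1
        if base == "*":
            start, end = low, high
        elif "-" in base:
            start, end = (int(x) for x in base.split("-"))
        else:
            start = int(base)
            # A bare 'n/step' runs from n up to the top of the range.
            end = high if step_text else start
        values.update(range(start, end + 1, step))
    return values


def next_run(expr, after):
    """Return the first whole minute strictly after `after` that matches `expr`."""
    minute, hour, dom, month, dow = expr.split()
    minutes = parse_field(minute, 0, 59)
    hours = parse_field(hour, 0, 23)
    days = parse_field(dom, 1, 31)
    months = parse_field(month, 1, 12)
    weekdays = {d % 7 for d in parse_field(dow, 0, 7)}  # 0 and 7 both mean Sunday
    # Classic cron rule: if both day fields are restricted, either one may match.
    either_day = not dom.startswith("*") and not dow.startswith("*")

    t = after.replace(second=0, microsecond=0) + timedelta(minutes=1)
    for _ in range(366 * 24 * 60):  # search roughly one year ahead
        cron_weekday = (t.weekday() + 1) % 7  # Python: Monday=0; cron: Sunday=0
        day_ok, weekday_ok = t.day in days, cron_weekday in weekdays
        day_match = (day_ok or weekday_ok) if either_day else (day_ok and weekday_ok)
        if t.minute in minutes and t.hour in hours and t.month in months and day_match:
            return t
        t += timedelta(minutes=1)
    raise ValueError(f"no match for {expr!r} within a year")


print(next_run("0 6 * * *", datetime(2024, 1, 15, 7, 30)))  # 2024-01-16 06:00:00
```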
DeltaForge parses each pipeline's SQL to derive which tables it reads from and writes to, so you never declare lineage by hand. Execution order across pipelines is computed automatically from those dependencies.
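Once each pipeline's read and write sets are known, ordering the pipelines is a topological sort of the dependency graph. The Python sketch below assumes those sets have already been extracted (DeltaForge derives them by parsing the SQL; here they are written by hand) and uses the standard library's `graphlib`. The pipelines other than `sales_daily_refresh`, and the tables `raw.sales` and `ops.alerts`, are hypothetical.

```python
from graphlib import TopologicalSorter


def execution_order(pipelines):
    """Order pipelines so every table is written before it is read.

    `pipelines` maps a pipeline name to (tables_read, tables_written).
    """
    writers = {}
    for name, (_, writes) in pipelines.items():
        for table in writes:
            writers.setdefault(table, set()).add(name)

    graph = {}
    for name, (reads, _) in pipelines.items():
        # A pipeline depends on every other pipeline that writes a table it reads.
        graph[name] = {w for table in reads for w in writers.get(table, ()) if w != name}
    # static_order() raises graphlib.CycleError if the dependencies form a loop.
    return list(TopologicalSorter(graph).static_order())


pipelines = {
    "sales_daily_refresh": ({"curated.sales"}, {"gold.revenue"}),
    "curate_sales": ({"raw.sales"}, {"curated.sales"}),
    "revenue_alerts": ({"gold.revenue"}, {"ops.alerts"}),
}
print(execution_order(pipelines))  # ['curate_sales', 'sales_daily_refresh', 'revenue_alerts']
```

`graphlib` raises `CycleError` if two pipelines feed each other, which is the case a real scheduler would need to report rather than silently order.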
Pull requests show SQL diffs. Reviewers see exactly what changed: a schedule, a filter condition, a column added to a SELECT. No opaque JSON config blobs to decode.
Write, execute, and monitor pipelines without leaving your editor.
Autocomplete for PIPELINE directives and schema-aware IntelliSense for table and column references drawn from the live catalog. No context-switching to look up column names or cron syntax.
Execute queries against a running compute node and view results in a paginated grid with full type fidelity. Healthy nodes are discovered automatically, with failover if one becomes unreachable.
Connection health, data catalog hierarchy, pipeline status grouped by state, SQL and pipeline snippet templates, and query history with execution time and row counts.
Configuration and management hub. Set up workspaces, connect git repositories, and explore the catalog.
Connect a remote git repository as a DeltaForge workspace. Credentials are stored in the OS keychain. One-click "Open in VS Code" writes the connection config and launches your editor.
Browse schemas, tables, and columns in a hierarchical tree. View column types, nullability, and partition info, inspect table version history, and time-travel to earlier versions. Quick actions: preview data, copy name, show DDL.
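Version history and time travel follow from how Delta Lake stores a table: each commit appends a log entry of `add` and `remove` file actions, and reading version N means replaying commits 0 through N. The simplified Python sketch below follows that model. Real Delta actions carry a `path` plus statistics, checkpoints spare readers from replaying every commit, and Iceberg reaches the same result through snapshots instead; the file names and commit history here are made up.

```python
def files_as_of(commits, version):
    """Return the data files visible at `version` by replaying commit actions.

    `commits[v]` lists the actions recorded in commit v, each a dict like
    {"add": "part-0.parquet"} or {"remove": "part-0.parquet"}.
    """
    if not 0 <= version < len(commits):
        raise ValueError(f"version {version} does not exist")
    live = set()
    for actions in commits[: version + 1]:
        for action in actions:
            if "add" in action:
                live.add(action["add"])
            elif "remove" in action:
                live.discard(action["remove"])
    return live


commits = [
    [{"add": "part-0.parquet"}],                                # v0: CREATE TABLE
    [{"add": "part-1.parquet"}],                                # v1: INSERT
    [{"remove": "part-0.parquet"}, {"add": "part-2.parquet"}],  # v2: MERGE rewrites part-0
    [{"remove": "part-1.parquet"}, {"remove": "part-2.parquet"},
     {"add": "part-3.parquet"}],                                # v3: OPTIMIZE compacts
]
print(sorted(files_as_of(commits, 1)))  # ['part-0.parquet', 'part-1.parquet']
print(sorted(files_as_of(commits, 3)))  # ['part-3.parquet']
```

Old versions still point at files that later commits removed. In Delta Lake, VACUUM deletes such unreferenced files once they pass the retention window, which is why it limits how far back time travel can reach.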
Register and monitor compute nodes. Configure data source connections whose credentials are stored in the OS keychain or Azure Key Vault and distributed to compute nodes at execution time.
Ad-hoc queries, script execution, and CI/CD automation from the terminal.
# Interactive REPL
delta-forge-cli --profile production
# Execute a script with variable substitution
delta-forge-cli run migrate.sql -D env=prod -D cutoff=2024-01-15
# One-shot query, JSON output
delta-forge-cli --format json query "SELECT count(*) FROM sales.orders"
# CI/CD: pipe from stdin
cat setup.sql | delta-forge-cli --force
Four shapes of application that fit the stack
Declarative PIPELINE definitions, idempotent MERGE upserts, change data feed for incremental ETL, and time-travel reads for backfills. Run them locally, schedule them in prod, audit them with SQL.
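Change data feed is what makes incremental ETL cheap: instead of rescanning a table, a downstream job reads only the row-level changes committed since its last run. The sketch below applies such changes to an in-memory copy keyed by `id`. The `_change_type` and `_commit_version` columns and the four change types follow Delta Lake's change data feed; the rows, the `id` key, and the watermark bookkeeping are illustrative assumptions, not DeltaForge's API.

```python
def apply_changes(target, changes, since_version):
    """Apply change-feed rows committed after `since_version` to `target`.

    `target` maps `id` to a row dict and is updated in place. Each change
    row holds the table columns plus `_change_type` and `_commit_version`.
    Returns the target and the highest commit version applied (the new
    watermark to store for the next run).
    """
    watermark = since_version
    for row in sorted(changes, key=lambda r: r["_commit_version"]):
        version = row["_commit_version"]
        if version <= since_version:
            continue  # already applied by an earlier run
        data = {k: v for k, v in row.items() if not k.startswith("_")}
        kind = row["_change_type"]
        if kind in ("insert", "update_postimage"):
            target[data["id"]] = data
        elif kind == "delete":
            target.pop(data["id"], None)
        # "update_preimage" rows carry the old values and need no action here.
        watermark = max(watermark, version)
    return target, watermark


changes = [
    {"id": 1, "amount": 10.0, "_change_type": "insert", "_commit_version": 1},
    {"id": 2, "amount": 20.0, "_change_type": "insert", "_commit_version": 1},
    {"id": 2, "amount": 20.0, "_change_type": "update_preimage", "_commit_version": 2},
    {"id": 2, "amount": 25.0, "_change_type": "update_postimage", "_commit_version": 2},
    {"id": 1, "amount": 10.0, "_change_type": "delete", "_commit_version": 3},
]
mirror, watermark = apply_changes({}, changes, since_version=0)
print(mirror, watermark)  # {2: {'id': 2, 'amount': 25.0}} 3
```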
Point Power BI, Tableau, or Excel at the lake directly through ODBC. Embed query results into product surfaces via the same driver. No copy step into a second warehouse.
Expose tables, views, and the SQL surface to Claude or Cursor through the MCP server. Build retrieval, summarization, and write-back agents against the same engine your dashboards use.
Fraud rings, recommendations, supply-chain traversal, and lineage analysis on the tables you already have. Cypher and PageRank on Delta, joined back to SQL in the same session.
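A common first pass at fraud-ring detection is connected components: accounts that share a device, card, or address, directly or through a chain of other accounts, land in the same group. The union-find sketch below illustrates that idea in plain Python on made-up account pairs. It is one standard technique, not a claim about which of DeltaForge's algorithms or Cypher queries you would use.

```python
def find(parent, x):
    """Return the root of x's group, halving the path as it walks up."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def rings(links):
    """Group accounts connected through any chain of shared links."""
    parent = {}
    for a, b in links:
        parent.setdefault(a, a)
        parent.setdefault(b, b)
        root_a, root_b = find(parent, a), find(parent, b)
        if root_a != root_b:
            parent[root_a] = root_b  # merge the two groups
    groups = {}
    for node in parent:
        groups.setdefault(find(parent, node), set()).add(node)
    # Largest rings first, ties broken alphabetically.
    return sorted((sorted(g) for g in groups.values()), key=lambda g: (-len(g), g))


# Pairs of accounts that used the same device or card (made-up data).
links = [("acct1", "acct2"), ("acct2", "acct3"), ("acct4", "acct5"), ("acct3", "acct1")]
print(rings(links))  # [['acct1', 'acct2', 'acct3'], ['acct4', 'acct5']]
```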
Reference, examples, and the code itself
SQL reference, configuration, deployment, MCP setup, and connector guides at docs.deltaforge.org.
Apache 2.0-licensed source. Track releases, file issues, and read PRs at deltaforge-org/delta-forge.
258 end-to-end SQL demos with 10,500+ machine-checked assertions. Run them locally to learn the grammar by example.
Public benchmark harness for TPC-H, TPC-DS, SSB, and JOB, plus 7,137 bidirectional Spark scenarios. Reproduce, audit, or extend them on your own hardware.
Free Community license. Single binary. No JVM, no cluster, no card required.