Delta Lake Without Spark: SQL on Delta Tables

Watch: ACID CRUD and time travel, verified by Apache Spark

This demo runs full Delta Lake reads and writes with no cluster anywhere in the path, then has Apache Spark itself read the same tables back and verify them.

Why Spark is not required

A Delta Lake table is a directory of ordinary Parquet data files plus a _delta_log directory of JSON commits that record which files belong to each version. The protocol is an open specification. Spark is its reference implementation, but it is not a dependency: any engine that can read and write Parquet and append valid commits to the log can read, write, and evolve the table.
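To make the "directory plus log" idea concrete, here is a minimal Python sketch that replays the JSON commits to find which data files make up the latest version. It is an illustration under stated assumptions, not how DeltaForge or any other engine is implemented: it ignores Parquet checkpoints, so it assumes every commit from version 0 is still present, and the toy commits carry only a path where real actions carry more fields. The helper names live_files and write_commit are made up for this example.

```python
import json
import tempfile
from pathlib import Path


def live_files(table_dir):
    """Return (latest version, set of data file paths in that version).

    Simplified: reads only JSON commits and ignores checkpoints.
    """
    log_dir = Path(table_dir) / "_delta_log"
    # Commit files are named by zero-padded version number, so sorting
    # by name puts them in version order.
    commits = sorted(p for p in log_dir.glob("*.json") if p.stem.isdigit())
    files = set()
    version = -1
    for commit in commits:
        version = int(commit.stem)
        # Each line of a commit file is one JSON action.
        for line in commit.read_text().splitlines():
            if not line.strip():
                continue
            action = json.loads(line)
            if "add" in action:
                files.add(action["add"]["path"])
            elif "remove" in action:
                files.discard(action["remove"]["path"])
    return version, files


def write_commit(log_dir, version, actions):
    """Write a toy commit file (hypothetical helper for this demo only)."""
    path = log_dir / f"{version:020d}.json"
    path.write_text("\n".join(json.dumps(a) for a in actions))


with tempfile.TemporaryDirectory() as tmp:
    log_dir = Path(tmp) / "_delta_log"
    log_dir.mkdir()
    # Version 0 adds two files; version 1 replaces one of them.
    write_commit(log_dir, 0, [
        {"add": {"path": "part-0.parquet"}},
        {"add": {"path": "part-1.parquet"}},
    ])
    write_commit(log_dir, 1, [
        {"remove": {"path": "part-0.parquet"}},
        {"add": {"path": "part-2.parquet"}},
    ])
    version, files = live_files(tmp)

print(version, sorted(files))  # 1 ['part-1.parquet', 'part-2.parquet']
```

Stopping the replay at an earlier commit would yield the file set of that earlier version, which is the basis of time travel.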

DeltaForge is a commercial engine you install on your own cloud VMs, on-premises servers, or air-gapped environments. It implements the Delta write protocol directly and exposes it as PostgreSQL-flavored SQL, so the operations below are statements you already know, with no JVM cluster, no notebook, and no managed service reading your tables. Because the output is standard Delta Lake, the tables stay readable by Spark, Databricks, DuckDB, Trino, and delta-rs afterward.

The operations, as plain SQL

Each operation links to a runnable tutorial whose results are checked with assertions.

Writing and ingesting

MERGE, UPDATE and DELETE

Copy-on-write internals and a three-way MERGE upsert with WHEN NOT MATCHED BY SOURCE.

Read the tutorial
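As a rough picture of what copy-on-write means, the Python sketch below deletes rows by rewriting only the files that contain them and recording a remove action for the old file and an add action for its replacement. It is a simplified model, not DeltaForge's implementation: lists of dicts stand in for Parquet files, the actions carry fewer fields than real commits do, and copy_on_write_delete is a hypothetical name.

```python
import time
import uuid


def copy_on_write_delete(files, predicate):
    """Delete matching rows by rewriting only the affected files.

    `files` maps a data file path to its rows (lists of dicts standing in
    for Parquet files). Returns the new file map and the commit actions.
    """
    now_ms = int(time.time() * 1000)
    new_files = {}
    actions = []
    for path, rows in files.items():
        kept = [row for row in rows if not predicate(row)]
        if len(kept) == len(rows):
            # Untouched file: carried into the new version with no action.
            new_files[path] = rows
            continue
        # The old file is never edited in place; it is logically removed.
        actions.append({"remove": {
            "path": path,
            "deletionTimestamp": now_ms,
            "dataChange": True,
        }})
        if kept:
            # Surviving rows are written to a brand-new file.
            new_path = f"part-{uuid.uuid4()}.parquet"
            new_files[new_path] = kept
            actions.append({"add": {"path": new_path, "dataChange": True}})
    return new_files, actions


files = {
    "part-a.parquet": [{"id": 1}, {"id": 2}],
    "part-b.parquet": [{"id": 3}],
}
new_files, actions = copy_on_write_delete(files, lambda row: row["id"] == 2)

remaining_ids = sorted(row["id"] for rows in new_files.values() for row in rows)
print(remaining_ids)                     # [1, 3]
print([next(iter(a)) for a in actions])  # ['remove', 'add']
```

In this model the removed file is only dropped from the new version's file list. The bytes stay on disk until a maintenance step deletes them, which is why earlier versions remain queryable in the meantime.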

CSV to Delta Lake

Turn a raw CSV into a managed Delta table in two SQL statements, no ingestion framework.

Read the tutorial

Change tracking and history

Change Data Feed

Read row-level inserts, updates, and deletes between versions with table_changes().

Read the tutorial

SCD Type 2

Maintain slowly changing dimensions with effective dates using a single MERGE.

Read the tutorial

Time travel to a version

Query an earlier table version, and fix the common "cannot time travel to version" error.

Read the tutorial

Compliance and maintenance

GDPR right to be forgotten

Hard-delete a subject's rows and reclaim the underlying files so the data is truly gone.

Read the tutorial

OPTIMIZE, VACUUM and Z-ORDER

Compact small files, reclaim space, and cluster data for faster reads, all in SQL.

Read the tutorial

Beyond Delta

Iceberg without Spark

The same SQL DML on Apache Iceberg tables, with no Spark and no metastore.

Read the tutorial

Delta Lake vs Iceberg

A neutral comparison of the two open table formats and when each one fits.

Read the comparison

Frequently asked questions

Can you use Delta Lake without Spark?

Yes. Delta Lake is an open table format: a directory of Parquet data files plus a transaction log of JSON commits. Any engine that can write Parquet and append valid commits to the log can operate on the table. Spark is the reference implementation, not a requirement. DeltaForge, a commercial engine you install on your own infrastructure, implements the operations as plain SQL.

What Delta Lake operations can DeltaForge run without Spark?

Inserts, updates, deletes, and full MERGE upserts; Change Data Feed; time travel; SCD Type 2 history; GDPR deletes; and table maintenance with OPTIMIZE, VACUUM, and Z-ORDER. Each is a plain SQL statement, with no JVM cluster and no notebook.

Are the tables still standard Delta Lake afterward?

Yes. DeltaForge writes the standard Delta transaction protocol, so the resulting tables stay readable by Spark, Databricks, DuckDB, Trino, and delta-rs. There is no proprietary on-disk format and no lock-in to the engine.

Does this work with Delta tables created in Databricks?

Yes. Because the format is the open Delta Lake protocol, DeltaForge reads and writes the same tables a Databricks or Spark job produces, and the reverse holds too. You can operate on one table with more than one engine.
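Several engines can share one table because a commit is the creation of the next numbered file in the log, and only one writer can create a given version. The Python sketch below illustrates that idea on a local filesystem using exclusive-create mode. It is an assumption-laden illustration, not any engine's actual commit path: object stores need a conditional put or an external coordinator instead, real writers also check the winning commit for conflicts before retrying, and try_commit and next_version are hypothetical names.

```python
import json
import tempfile
from pathlib import Path


def next_version(log_dir):
    """Return the version number the next commit should use."""
    versions = [int(p.stem) for p in Path(log_dir).glob("*.json")
                if p.stem.isdigit()]
    return max(versions, default=-1) + 1


def try_commit(log_dir, version, actions):
    """Try to write commit `version`; return False if another writer won."""
    path = Path(log_dir) / f"{version:020d}.json"
    try:
        # Mode "x" fails if the file already exists, so only one writer
        # can create a given version.
        with open(path, "x") as f:
            f.write("\n".join(json.dumps(a) for a in actions))
    except FileExistsError:
        return False
    return True


with tempfile.TemporaryDirectory() as tmp:
    log_dir = Path(tmp)
    # Both engines read the log at the same time and see version 0 as next.
    v = next_version(log_dir)
    a_ok = try_commit(log_dir, v, [{"add": {"path": "from-engine-a.parquet"}}])
    b_ok = try_commit(log_dir, v, [{"add": {"path": "from-engine-b.parquet"}}])
    # Engine B lost the race, so it re-reads the log and retries.
    b_retry_version = next_version(log_dir)
    b_retry_ok = try_commit(
        log_dir, b_retry_version,
        [{"add": {"path": "from-engine-b.parquet"}}],
    )

print(a_ok, b_ok, b_retry_version, b_retry_ok)  # True False 1 True
```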

Run it on your own data

Install DeltaForge and reproduce every statement above against your own Delta tables.

Get Community License

Install DeltaForge