Guide

Delta Lake without Spark

Delta Lake is an open table format, not a Spark feature. Any engine that writes Parquet and commits to the transaction log can perform the full set of table operations. This guide collects the operations DeltaForge runs as plain SQL, each with a tutorial you can reproduce.

Why Spark is not required

A Delta Lake table is a directory of ordinary Parquet data files plus a _delta_log directory of JSON commits that record which files belong to each version. The protocol is an open specification. Spark is its reference implementation, but it is not a dependency: any engine that can write Parquet and append correct commits to the log can read, write, and evolve the table.
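To make the layout concrete, here is a minimal Python sketch of how a reader could work out which data files belong to a given version by replaying the log. It is an illustration under simplifying assumptions, not DeltaForge's implementation: the toy commits carry only a path field (real add and remove actions carry more metadata), no Parquet files are actually written, checkpoint files are ignored, and the helper names write_commit and active_files are hypothetical.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional


def write_commit(log_dir: Path, version: int, actions: list) -> None:
    """Write one commit: a zero-padded <version>.json file, one JSON action per line."""
    path = log_dir / f"{version:020d}.json"
    path.write_text("\n".join(json.dumps(a) for a in actions) + "\n")


def active_files(table_dir: Path, version: Optional[int] = None) -> set:
    """Replay commits in version order and return the data files live at `version`."""
    files = set()
    commits = sorted(
        p for p in (table_dir / "_delta_log").glob("*.json") if p.stem.isdigit()
    )
    for commit in commits:
        if version is not None and int(commit.stem) > version:
            break  # stop at the requested version
        for line in commit.read_text().splitlines():
            if not line.strip():
                continue
            action = json.loads(line)
            if "add" in action:
                files.add(action["add"]["path"])
            elif "remove" in action:
                files.discard(action["remove"]["path"])
    return files


# Build a toy log (assumption: simplified actions, file names are made up).
table = Path(tempfile.mkdtemp())
log = table / "_delta_log"
log.mkdir()
write_commit(log, 0, [{"add": {"path": "part-0000.parquet"}}])
write_commit(log, 1, [{"add": {"path": "part-0001.parquet"}}])
write_commit(log, 2, [
    {"remove": {"path": "part-0000.parquet"}},  # e.g. a rewrite replaced this file
    {"add": {"path": "part-0002.parquet"}},
])

latest = active_files(table)
as_of_v1 = active_files(table, version=1)
print(sorted(latest))
print(sorted(as_of_v1))
```

Passing an older version number stops the replay early, which is the core idea behind reading a table as of an earlier version.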

DeltaForge is a commercial engine you install on your own cloud VMs, on-premises servers, or air-gapped environments. It implements the Delta write protocol directly and exposes it as PostgreSQL-flavored SQL, so the operations below are statements you already know, with no JVM cluster, no notebook, and no managed service reading your tables. Because the output is standard Delta Lake, the tables stay readable by Spark, Databricks, DuckDB, Trino, and delta-rs afterward.

Frequently asked questions

Can you use Delta Lake without Spark?

Yes. Delta Lake is an open table format: a directory of Parquet data files plus a transaction log of JSON commits. Any engine that can write Parquet and append valid commits to the log can operate on the table. Spark is the reference implementation, not a requirement. DeltaForge, a commercial engine you install on your own infrastructure, implements the operations as plain SQL.

What Delta Lake operations can DeltaForge run without Spark?

Inserts, updates, deletes, and full MERGE upserts; change data feed; time travel; SCD Type 2 history; GDPR deletes; and table maintenance with OPTIMIZE, VACUUM, and Z-ORDER. Each runs as a plain SQL statement, with no JVM cluster and no notebook.

Are the tables still standard Delta Lake afterward?

Yes. DeltaForge writes the standard Delta transaction protocol, so the resulting tables stay readable by Spark, Databricks, DuckDB, Trino, and delta-rs. There is no proprietary on-disk format and no lock-in to the engine.

Does this work with Delta tables created in Databricks?

Yes. Because the format is the open Delta Lake protocol, DeltaForge reads and writes the same tables a Databricks or Spark job produces, and the reverse holds too. You can operate on one table with more than one engine.
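Several engines can share one table because each commit claims the next version number by creating a new log file that must not already exist. The Python sketch below illustrates that idea on a local filesystem using exclusive file creation. It is a simplified illustration, not DeltaForge's or Spark's writer: the helper names next_version and try_commit are hypothetical, the actions are reduced to a path, and a real writer would also check whether the winning commit conflicts with its own changes before retrying. Object stores need their own put-if-absent mechanism, which this sketch does not cover.

```python
import json
import tempfile
from pathlib import Path


def next_version(log_dir: Path) -> int:
    """Return the version number a new commit should try to claim."""
    versions = [int(p.stem) for p in log_dir.glob("*.json") if p.stem.isdigit()]
    return max(versions, default=-1) + 1


def try_commit(log_dir: Path, version: int, actions: list) -> bool:
    """Claim `version` by creating its commit file; return False if it is taken."""
    path = log_dir / f"{version:020d}.json"
    try:
        # Mode "x" is exclusive creation: it raises FileExistsError if the file exists.
        with open(path, "x") as f:
            f.write("\n".join(json.dumps(a) for a in actions) + "\n")
    except FileExistsError:
        return False
    return True


log = Path(tempfile.mkdtemp()) / "_delta_log"
log.mkdir()

# Two hypothetical engines both read the empty log and target version 0.
v_a = next_version(log)
v_b = next_version(log)
first = try_commit(log, v_a, [{"add": {"path": "engine-a.parquet"}}])
second = try_commit(log, v_b, [{"add": {"path": "engine-b.parquet"}}])

# The engine that lost re-reads the log and retries at the next version.
retry = try_commit(log, next_version(log), [{"add": {"path": "engine-b.parquet"}}])
print(first, second, retry)
```

Only one writer can create a given version file, so the log stays a single ordered history no matter which engine wrote each commit.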

Run it on your own data

Install DeltaForge and reproduce the operations described above against your own Delta tables.
