MERGE, UPDATE and DELETE on Delta Lake Without Spark

Q: Can you run MERGE on a Delta table without Spark?

Yes. The Delta Lake transaction protocol is an open specification: any engine that can write Parquet data files and commit add and remove actions to the _delta_log directory can perform MERGE. DeltaForge, a commercial engine you install on your own infrastructure, implements the full statement in SQL, including WHEN MATCHED, WHEN NOT MATCHED, and WHEN NOT MATCHED BY SOURCE clauses, so a single atomic statement can update, insert, and delete in one commit. No Spark cluster, JVM, or Python scripting is involved.

Q: How do I update or delete rows in Delta Lake without Databricks?

Use a SQL engine that implements the Delta write protocol. In DeltaForge you write standard UPDATE and DELETE statements against a Delta table. The engine rewrites only the Parquet data files that contain matching rows, then atomically commits remove actions for the old files and add actions for the new ones. Each statement produces a new table version, so you can compare before and after states with VERSION AS OF time travel, and a predicate that matches zero rows is a harmless no-op.

Q: Does DuckDB support UPDATE and MERGE on Delta Lake?

Not at the time of writing. DuckDB's delta extension is built for reading: its documentation lists writing, updating, and merging Delta tables among the features that are not yet supported. If you query Delta from DuckDB today you get scans, not mutations. To change rows you need an engine that writes the Delta transaction log, such as Spark, Trino with a metastore, the delta-rs Python API, or DeltaForge, which exposes UPDATE, DELETE, and MERGE as plain SQL.

Q: What SQL engines can write to Delta Lake?

Apache Spark with the delta-spark package is the reference implementation, and Databricks ships it as a managed platform. Trino's Delta Lake connector supports INSERT, UPDATE, DELETE, and MERGE, but requires a running Trino cluster plus a Hive metastore or AWS Glue catalog. The delta-rs library offers merge, update, and delete through a Python or Rust API rather than SQL. DeltaForge implements all four DML statements natively in PostgreSQL-flavored SQL as customer-installed software, with no JVM or metastore.

Can you run MERGE on a Delta table without Spark?

Yes, and the reason is structural rather than a vendor trick. Delta Lake is an open table format: a directory of ordinary Parquet data files plus a _delta_log directory of JSON commits that record which files belong to each table version. Any engine that can write Parquet and append correct commit entries to that log can mutate the table. Spark happens to be the reference implementation, not a requirement of the protocol.

The practical problem is that the lightweight tools most people reach for first stop short of writes. DuckDB's delta extension documents reading Delta tables but lists writing, updating, and merging among the features not yet supported. The delta-rs project implements merge, update, and delete, but exposes them through a Python (or Rust) API, so the answer to a SQL question becomes a script. Trino's Delta Lake connector does support full DML, but it asks you to stand up a JVM cluster and a Hive metastore or Glue catalog before your first UPDATE runs.

DeltaForge takes a different position: it is a commercial analytics engine you install on your own infrastructure, and it implements UPDATE, DELETE, INSERT, and the complete MERGE INTO statement natively in PostgreSQL-flavored SQL. The rest of this article walks through what those statements actually do at the storage layer, then runs them, using SQL copied from demos that ship in the DeltaForge demo library and assert their own results.

What happens at the storage layer when you mutate a Delta table

Parquet files are immutable, so Delta never edits a row in place. Every DML statement is a copy-on-write transaction against the log:

The engine scans the table's current snapshot and identifies which data files contain rows matching the predicate. Files with no matching rows are never touched.
It writes new Parquet files: the affected files, re-emitted with updated values (for UPDATE) or with the matching rows omitted (for DELETE).
It commits a single JSON entry to _delta_log/ containing remove actions for the old files and add actions for the new ones. The commit either lands in full or not at all, which is what makes a multi-row mutation atomic.

Two useful consequences fall out of this design. First, every statement creates a new table version, and the old versions remain queryable with VERSION AS OF until you vacuum them, so you can diff a table against its own past to verify a change. Second, a predicate that matches nothing is a true no-op: no files are rewritten and no commit is created, so retrying idempotent maintenance SQL is safe.

How do I update or delete rows in Delta Lake without Databricks?

You write the statement you already know. This is the 10 percent price increase from the delta-basics-crud demo, applied to a Delta products table:

UPDATE external.delta_demos.crud_products
SET price = ROUND(price * 1.10, 2)
WHERE category = 'Electronics';

Copy-on-write UPDATE: only files containing Electronics rows are rewritten; everything else carries forward by reference.

Because the previous version survives in the log, the same demo verifies the update by joining the table against itself one version back:

WITH old AS (
    SELECT id, price AS old_price
    FROM external.delta_demos.crud_products VERSION AS OF 1
)
SELECT p.id, p.name, old.old_price, p.price AS new_price
FROM external.delta_demos.crud_products p
JOIN old ON p.id = old.id
WHERE p.category = 'Electronics'
ORDER BY p.id;

Time travel as a built-in audit: current prices on the right, pre-update prices on the left, no second copy of the data needed.

The output shape is one row per Electronics product with both prices side by side: a laptop that was 999.99 now reads 1099.99, a 29.99 mouse reads 32.99, and every non-Electronics row is untouched. The demo asserts exactly this, including that zero non-Electronics prices drifted.

Updates are not limited to flat assignments. The delta-dml-patterns demo applies a tiered discount in one statement with a CASE expression:

UPDATE external.delta_demos.order_history
SET price = CASE
    WHEN price > 500  THEN ROUND(price * 0.85, 2)
    WHEN price >= 200 THEN ROUND(price * 0.90, 2)
    ELSE                   ROUND(price * 0.95, 2)
END
WHERE product IN ('Laptop', 'Monitor', 'Tablet', 'Headphones', 'Smartwatch');

One atomic commit covers all three discount tiers; the demo asserts 849.99, 314.99, and 189.99 for one product in each tier.

Deletes follow the same shape, and the same demo pairs one with the archive-first pattern you would use in production: copy the doomed rows to an archive table, then purge them, each as its own atomic commit:

INSERT INTO external.delta_demos.order_archive
SELECT * FROM external.delta_demos.order_history
WHERE status = 'cancelled' AND order_date < '2024-06-01';

DELETE FROM external.delta_demos.order_history
WHERE status = 'cancelled' AND order_date < '2024-06-01';

Archive then purge. The demo seeds 60 orders, archives 8 cancelled ones, deletes the same 8, and asserts the table lands at 52 rows.

A complete upsert: MERGE with WHEN NOT MATCHED BY SOURCE

Single-statement upserts are where most non-Spark options run out. A change data capture feed has three cases to reconcile against a target table: rows that changed, rows that are new, and target rows the source no longer mentions. The delta-merge-upsert demo handles all three in one atomic MERGE, and it ships in the DeltaForge demo library so you can run the full flow yourself. Here is that flow end to end.

Create a zone anchored at your object store root, a schema, and the target and source Delta tables. Table LOCATION paths resolve under the zone's storage root, so these tables land beneath s3://acme-lake/demo:

CREATE ZONE IF NOT EXISTS external TYPE EXTERNAL
    STORAGE_ROOT = 's3://acme-lake/demo'
    COMMENT 'Delta demo datasets';

CREATE SCHEMA IF NOT EXISTS external.delta_demos
    COMMENT 'Delta DML tutorial demos';

CREATE DELTA TABLE IF NOT EXISTS external.delta_demos.upsert_products (
    id          INT,
    sku         VARCHAR,
    name        VARCHAR,
    category    VARCHAR,
    price       DOUBLE,
    in_stock    INT
) LOCATION 'delta-merge-upsert/upsert_products';

CREATE DELTA TABLE IF NOT EXISTS external.delta_demos.product_feed (
    id          INT,
    sku         VARCHAR,
    name        VARCHAR,
    category    VARCHAR,
    price       DOUBLE,
    in_stock    INT
) LOCATION 'delta-merge-upsert/product_feed';

Seed the data. The demo's setup script inserts 15 products into the target catalog and 12 rows into the daily supplier feed: 8 of the feed rows are price and stock updates for existing products, 4 are brand new items, and 7 catalog products (ids 9 to 15) do not appear in the feed at all. A plain multi-row insert does the seeding; these three target rows matter most for what follows:
```
INSERT INTO external.delta_demos.upsert_products VALUES
    (1,  'SKU-001', 'Wireless Mouse',   'electronics', 29.99, 150),
    (13, 'SKU-013', 'VGA Adapter',      'electronics', 12.99, 5),
    (15, 'SKU-015', 'Floppy Drive USB', 'electronics', 24.99, 1);
```

Run the three-way merge. Matched rows take the feed's price and stock, unmatched feed rows are inserted, and target rows missing from the feed are deleted only if their stock has dwindled to 5 units or fewer:

MERGE INTO external.delta_demos.upsert_products AS target
USING external.delta_demos.product_feed AS source
ON target.id = source.id
WHEN MATCHED THEN
    UPDATE SET
        price    = source.price,
        in_stock = source.in_stock
WHEN NOT MATCHED THEN
    INSERT (id, sku, name, category, price, in_stock)
    VALUES (source.id, source.sku, source.name, source.category,
            source.price, source.in_stock)
WHEN NOT MATCHED BY SOURCE AND target.in_stock <= 5 THEN
    DELETE;

Verify the arithmetic. Starting from 15 products, the merge updates 8, inserts 4, and deletes the 3 discontinued low-stock items (the VGA adapter, a parallel port cable, and a USB floppy drive), so the catalog must land on exactly 16 rows. The demo encodes that expectation directly in SQL with DeltaForge's ASSERT extension, which fails the script if the count is wrong:
```
ASSERT ROW_COUNT = 16
SELECT id, sku, name, category, price, in_stock
FROM external.delta_demos.upsert_products
ORDER BY id;
```

The interesting clause is WHEN NOT MATCHED BY SOURCE. Without it, removing stale target rows means a second, separate DELETE statement and a window between the two commits where the table is half-reconciled. With it, the deletion participates in the same atomic commit as the updates and inserts. The conditional predicate (AND target.in_stock <= 5) is what keeps the clause surgical: products 9 through 12 are also absent from the feed, but they hold 25 to 300 units of stock, so the merge leaves them alone. The demo asserts all four survive.

One commit, three operations, and the resulting table is still a perfectly ordinary Delta table: the next engine to read it, whatever it is, sees version N+1 with 16 rows.

Does DuckDB support UPDATE and MERGE on Delta Lake?

At the time of writing, no. DuckDB's delta extension is explicitly a reader: its documentation lists writing, updating, and merging Delta tables among the features that are not yet supported. That is a reasonable scope for the extension, and DuckDB remains an excellent way to scan Delta data, but it means a DuckDB-only stack cannot apply the CDC feed above, correct a bad load, or service a deletion request against a Delta table.

The adjacent answers have their own trade-offs. delta-rs (the deltalake Python package) does implement merge, update, and delete, with a well-documented API, but it is a library: the merge above becomes a Python program with builder calls rather than a SQL statement, which matters if your team, your tooling, or your scheduler speaks SQL. Trino supports the full DML surface through its Delta Lake connector, at the cost of operating a coordinator-and-workers JVM cluster plus a Hive metastore or AWS Glue catalog. All of these are legitimate choices; the gap they leave is precisely the one this article fills: full DML, expressed as SQL, from a single engine you can install without cluster plumbing.

What SQL engines can write to Delta Lake?

As of mid-2026, the realistic options for mutating Delta tables look like this, based on each project's own documentation:

Apache Spark + delta-spark: The reference implementation. Full SQL DML including MERGE, at the cost of running Spark, which is the thing you searched for a way around.
Databricks: Managed Spark with the same Delta DML surface, as a cloud platform subscription.
Trino: INSERT, UPDATE, DELETE, and MERGE via the Delta Lake connector. Requires a JVM cluster and a Hive metastore or Glue catalog for table metadata.
delta-rs (deltalake): Merge, update, and delete through a Python or Rust API. A library, not a SQL engine; you orchestrate the calls in code.
DuckDB: Reads Delta through the delta extension; the extension's documentation lists writes, updates, and merges as not yet supported.
DeltaForge: Full SQL DML, including MERGE with WHEN NOT MATCHED BY SOURCE, in PostgreSQL-flavored SQL. Commercial software installed on your own infrastructure, with no JVM, no metastore, and no external catalog service.

Scope and honest limits

A few boundaries worth knowing before you commit a workload. DeltaForge is commercial, customer-installed software, not an open source library; if your constraint is a zero-license-cost stack, delta-rs plus Python is the closest functional match for the write path. The ASSERT statements shown above are a DeltaForge extension for self-verifying scripts, so they will not parse elsewhere; the MERGE, UPDATE, and DELETE statements themselves follow the same shape Spark SQL uses, which keeps migration in either direction mechanical. And because every mutation is copy-on-write, heavy DML churn accumulates superseded files: plan for routine OPTIMIZE and VACUUM maintenance, which is covered in its own guide linked below.

What you do not give up is interoperability. The tables these statements produce are standard Delta Lake: the transaction log on disk is the source of truth, and Spark, DuckDB, Trino, and delta-rs can all read the post-merge versions of the demo tables without knowing what wrote them.

Run it yourself

All three source demos ship in the DeltaForge demo library: delta-basics-crud for the fundamentals and time travel verification, delta-dml-patterns for multi-predicate deletes, CASE-based updates, and the archive-then-purge pattern, and delta-merge-upsert for the full three-way CDC merge. Each one seeds known data, runs the statements shown here, and asserts every intermediate and final row count, so you can confirm the 15 minus 3 plus 4 equals 16 arithmetic on your own object store rather than taking an article's word for it. The engine behind them is the DeltaForge SQL engine.