Iceberg INSERT, UPDATE, DELETE and MERGE in SQL, No Spark

Q: How do you run UPDATE and DELETE on Iceberg without Spark?

Use an engine that implements row-level DML natively instead of delegating to Spark. DeltaForge is a customer-installed SQL engine with no JVM dependency: you issue plain UPDATE and DELETE statements against a table whose Iceberg metadata is regenerated on every commit, so each mutation appears as a new Iceberg snapshot. Updated rows are tracked with deletion vectors plus rewritten data files, and any external Iceberg reader sees the post-DML state immediately.

Q: Can you MERGE into an Iceberg table outside AWS Athena?

Yes. Athena's MERGE support only exists inside AWS, but MERGE is an engine feature, not a cloud feature. DeltaForge runs MERGE INTO with WHEN MATCHED THEN UPDATE and WHEN NOT MATCHED THEN INSERT arms on your own hardware or in any cloud, against S3, Azure, GCS or local disk. The merge commits as one atomic snapshot in the table's Iceberg metadata, which Athena, Trino, Spark or PyIceberg can then read.

Q: Which engines support full DML on Apache Iceberg?

Spark with the Iceberg extensions is the reference implementation and documents UPDATE, DELETE and MERGE INTO. Trino's Iceberg connector and AWS Athena engine version 3 also document row-level DML. Flink centers on streaming writes, PyIceberg is a programmatic Python API rather than a SQL prompt, and single-node engines have historically documented Iceberg reading. DeltaForge adds a no-JVM option: full SQL DML on a single installed engine, with Iceberg snapshots produced on every commit.

Q: How do you upsert into Iceberg with SQL on-prem?

Install DeltaForge on your own infrastructure, point a zone at your storage root (local disk, S3-compatible object store, Azure or GCS), create a table with Iceberg metadata generation enabled, and run MERGE INTO with your change feed as the source. The matched arm updates existing keys, the not-matched arm inserts new ones, and the whole upsert lands as one atomic commit visible to both the engine and external Iceberg readers. No Spark cluster, no AWS account, no JVM.

If you search for how to update or delete rows in an Apache Iceberg table, the official documentation answers with Spark: the Iceberg Spark SQL extensions are where UPDATE, DELETE and MERGE INTO are specified and demonstrated. The most visible alternative is AWS Athena, which documents row-level Iceberg DML on engine version 3, but only as a managed service inside AWS. If your data lives on-prem, in a private cloud, or simply outside that ecosystem, page one of the search results quietly assumes you will stand up a JVM cluster to change a row.

That assumption is about engines, not about the format. Iceberg's spec defines how snapshots, manifests and delete files describe a table; nothing in it requires Spark to write them. What you need is an engine that implements the write path. DeltaForge is a commercial, customer-installed analytics engine written in Rust that does exactly that: you install it on your own hardware or cloud account, point it at your object store, and run DML in PostgreSQL-flavored SQL. No JVM, no cluster to size, no per-query cloud bill.

How DeltaForge writes Iceberg tables

It is worth being precise about the mechanism, because it explains both the capabilities and the limits. A DeltaForge table with Iceberg enabled maintains two metadata chains over one set of Parquet data files. Every committed transaction writes a new version to the table's transaction log, and a post-commit hook immediately generates the corresponding Iceberg snapshot under the table's metadata/ directory. This is the UniForm pattern: one physical table, readable natively by the engine and readable as a standard Iceberg table by anything that speaks Iceberg, from Spark and Trino to PyIceberg and Athena.

The practical consequence is that the full DML surface of the engine, including multi-arm MERGE, becomes available to your Iceberg consumers. An UPDATE on a version 3 table produces deletion vectors and rewritten data files; a MERGE produces deletion vectors for the matched rows and new data files for the inserted ones; every one of these commits appears as a distinct Iceberg snapshot with correct manifest lists. Time travel works through either metadata chain because both describe the same files.

A complete MERGE INTO example in SQL

The walkthrough below is the iceberg-v3-merge-upsert demo from the DeltaForge demo library, lightly trimmed. It models a supply chain inventory table synced from a supplier feed: the feed contains a mix of restocks for existing SKUs and entirely new products, which is the canonical upsert shape. The storage root used throughout is s3://acme-lake/demo; substitute your own bucket or local path.

Create a zone over your storage root, a schema, and the table. The three table properties are what turn on Iceberg metadata generation: enable the format, pick the Iceberg spec version (3 here, which brings deletion vectors), and use ID-based column mapping so renames stay safe across both formats.

CREATE ZONE IF NOT EXISTS external TYPE EXTERNAL
    STORAGE_ROOT = 's3://acme-lake/demo'
    COMMENT 'External and Delta tables for the Iceberg demos';

CREATE SCHEMA IF NOT EXISTS external.iceberg_demos
    COMMENT 'Iceberg UniForm demo tables';

CREATE DELTA TABLE IF NOT EXISTS external.iceberg_demos.supply_chain_inventory (
    item_id        INT,
    warehouse      VARCHAR,
    sku            VARCHAR,
    category       VARCHAR,
    product_name   VARCHAR,
    quantity       INT,
    unit_price     DOUBLE,
    last_received  VARCHAR
) LOCATION 'iceberg-v3-merge-upsert/supply_chain_inventory'
TBLPROPERTIES (
    'delta.universalFormat.enabledFormats' = 'iceberg',
    'delta.universalFormat.icebergVersion' = '3',
    'delta.columnMapping.mode' = 'id'
);

The table's LOCATION is relative to the zone's storage root, so the data lands under s3://acme-lake/demo/iceberg-v3-merge-upsert/supply_chain_inventory.

Seed the warehouse with ten SKUs. This is commit number one, and it already produces the first Iceberg snapshot.

INSERT INTO external.iceberg_demos.supply_chain_inventory VALUES
    (1,  'WH-EAST', 'SKU-1001', 'Electronics', 'Wireless Mouse',      150, 24.99,  '2024-01-05'),
    (2,  'WH-EAST', 'SKU-1002', 'Electronics', 'USB-C Hub',           80,  39.99,  '2024-01-05'),
    (3,  'WH-EAST', 'SKU-1003', 'Office',      'Ergonomic Chair',     25,  299.99, '2024-01-10'),
    (4,  'WH-EAST', 'SKU-1004', 'Office',      'Standing Desk',       15,  549.99, '2024-01-10'),
    (5,  'WH-EAST', 'SKU-1005', 'Accessories', 'Monitor Stand',       60,  79.99,  '2024-01-12'),
    (6,  'WH-EAST', 'SKU-1006', 'Accessories', 'Laptop Sleeve 15in',  200, 29.99,  '2024-01-12'),
    (7,  'WH-EAST', 'SKU-1007', 'Electronics', 'Mechanical Keyboard', 90,  89.99,  '2024-01-15'),
    (8,  'WH-EAST', 'SKU-1008', 'Office',      'Whiteboard 4x6',      12,  149.99, '2024-01-15'),
    (9,  'WH-EAST', 'SKU-1009', 'Accessories', 'Webcam HD',           110, 59.99,  '2024-01-18'),
    (10, 'WH-EAST', 'SKU-1010', 'Electronics', 'Docking Station',     45,  179.99, '2024-01-18');

Ten rows in warehouse WH-EAST: the baseline state before the supplier feed arrives.

Now the upsert. The supplier feed carries five items: three match existing keys (restocks for items 1, 3 and 7) and two are new products. One MERGE INTO handles both, atomically.

MERGE INTO external.iceberg_demos.supply_chain_inventory AS t
USING (
    SELECT * FROM (VALUES
        (1,  'WH-EAST', 'SKU-1001', 'Electronics', 'Wireless Mouse',     175, 24.99,  '2024-02-01'),
        (3,  'WH-EAST', 'SKU-1003', 'Office',      'Ergonomic Chair',    30,  299.99, '2024-02-01'),
        (7,  'WH-EAST', 'SKU-1007', 'Electronics', 'Mechanical Keyboard',110, 89.99,  '2024-02-01'),
        (31, 'WH-EAST', 'SKU-2001', 'Electronics', 'Wireless Earbuds',   200, 49.99,  '2024-02-01'),
        (32, 'WH-EAST', 'SKU-2002', 'Office',      'Desk Lamp LED',      85,  44.99,  '2024-02-01')
    ) AS s(item_id, warehouse, sku, category, product_name, quantity, unit_price, last_received)
) AS s
ON t.item_id = s.item_id
WHEN MATCHED THEN UPDATE SET
    quantity = s.quantity,
    last_received = s.last_received
WHEN NOT MATCHED THEN INSERT VALUES (
    s.item_id, s.warehouse, s.sku, s.category, s.product_name,
    s.quantity, s.unit_price, s.last_received
);

The source here is an inline VALUES list for reproducibility; in production it is typically a staging table or view over the incoming feed.

Verify all three effects: the row count grew from 10 to 12, the matched rows carry the new quantities, and the two new products exist.

SELECT COUNT(*) AS total_items
FROM external.iceberg_demos.supply_chain_inventory;

SELECT item_id, sku, product_name, quantity, last_received
FROM external.iceberg_demos.supply_chain_inventory
WHERE item_id IN (1, 3, 7)
ORDER BY item_id;

SELECT *
FROM external.iceberg_demos.supply_chain_inventory
WHERE item_id IN (31, 32)
ORDER BY item_id;

Expected: total_items = 12; quantities 175, 30 and 110 for items 1, 3 and 7; and two rows for the Wireless Earbuds and Desk Lamp LED inserts.

Verify with time travel. Version 1 is the seed commit, so reading it next to the current state proves the merge changed exactly what it claimed and nothing else.
```
SELECT
    (SELECT COUNT(*) FROM external.iceberg_demos.supply_chain_inventory VERSION AS OF 1) AS rows_before_merge,
    (SELECT COUNT(*) FROM external.iceberg_demos.supply_chain_inventory) AS rows_after_merge;
```
Expected: rows_before_merge = 10, rows_after_merge = 12. The pre-merge snapshot remains fully queryable.
Finally, the independent proof: register the same physical location as an external Iceberg table and read it back through the Iceberg metadata chain alone. This is the same path an outside engine would take.
```
CREATE EXTERNAL TABLE IF NOT EXISTS external.iceberg_demos.supply_chain_inventory_iceberg
USING ICEBERG
LOCATION 'iceberg-v3-merge-upsert/supply_chain_inventory';

SELECT COUNT(*) AS total_items
FROM external.iceberg_demos.supply_chain_inventory_iceberg;

SELECT *
FROM external.iceberg_demos.supply_chain_inventory_iceberg
WHERE item_id = 1;
```
Expected: 12 rows through the Iceberg reader, and item 1 shows quantity 175, the post-merge value. The deletion vectors written by the merge resolve correctly through Iceberg V3 metadata.

The full demo goes further than this trimmed version: it seeds 30 SKUs across three warehouses, runs a second merge round (price adjustments plus another insert), and asserts exact per-warehouse aggregates through both the native and the Iceberg read paths. The iceberg-v3-merge-upsert demo ships in the DeltaForge demo library, so you can run every statement, asserts included, against your own installation.

How do you run UPDATE and DELETE on Iceberg without Spark?

The same way, with plain statements. The companion demo, iceberg-uniform-crud-lifecycle, walks an employee table through the full mutation lifecycle and checks that the Iceberg snapshot chain tracks every step. Its setup creates an employees table with the same three table properties shown above (that demo uses Iceberg version 2, which is also supported), seeds 20 rows, and then mutates:

UPDATE external.iceberg_demos.employees
SET salary = ROUND(salary * 1.15, 2)
WHERE department = 'Engineering';

UPDATE external.iceberg_demos.employees
SET is_active = false
WHERE id IN (8, 13);

DELETE FROM external.iceberg_demos.employees
WHERE is_active = false;

INSERT INTO external.iceberg_demos.employees VALUES
    (21, 'Uma Foster',    'Sales',       'Sales Rep',        76000.00,  true),
    (22, 'Victor Reyes',  'Marketing',   'SEO Specialist',   82000.00,  true),
    (23, 'Wendy Chang',   'Engineering', 'ML Engineer',      145000.00, true);

Four commits, four new Iceberg snapshots: a predicate UPDATE, a flagging UPDATE, a DELETE, and an INSERT.

Because every commit is preserved, you can audit any of them after the fact. A self-join against an earlier version shows exactly what the raise changed, and DESCRIBE HISTORY lists each operation in the log:

SELECT
    c.name,
    ROUND(old.salary, 2) AS old_salary,
    ROUND(c.salary, 2) AS new_salary,
    ROUND(c.salary - old.salary, 2) AS salary_increase
FROM external.iceberg_demos.employees c
JOIN external.iceberg_demos.employees VERSION AS OF 1 old
    ON c.id = old.id
WHERE c.department = 'Engineering'
ORDER BY c.id;

DESCRIBE HISTORY external.iceberg_demos.employees;

The join reads the current state and version 1 in one query; the demo asserts, for example, that Alice Chen moved from 135000.00 to 155250.00.

Which engines support full DML on Apache Iceberg?

A fair map of the documented landscape, stated neutrally:

Apache Spark: The reference engine. Iceberg's own documentation specifies UPDATE, DELETE and MERGE INTO through the Spark SQL extensions. Requires a JVM and, for production workloads, a cluster to provision and tune.
Trino: The Iceberg connector documents row-level UPDATE, DELETE and MERGE. Trino is a distributed JVM system, typically deployed as a coordinator plus workers.
AWS Athena: Engine version 3 documents INSERT, UPDATE, DELETE and MERGE on Iceberg tables. Fully managed, and only available as an AWS service against AWS-resident data.
Apache Flink: The Iceberg Flink connector centers on streaming writes, including upsert-mode streams; it is a JVM stream processor rather than an interactive SQL prompt.
PyIceberg: A Python library with a programmatic write API. Powerful for pipelines, but it is code you write and operate, not a SQL statement you hand to an engine.
DeltaForge: Customer-installed, no JVM, single binary. Full INSERT, UPDATE, DELETE and multi-arm MERGE in PostgreSQL-flavored SQL, with Iceberg V2 or V3 snapshots generated on every commit for external readers.

None of this is a criticism of the JVM engines; they are mature and widely deployed. The point is narrower: if your requirement is "change rows in an Iceberg table from SQL, on infrastructure I control, without operating a cluster," the documented options thin out fast, and that is the slot DeltaForge fills.

Can you MERGE into an Iceberg table outside AWS Athena?

Yes. MERGE is an engine capability, not a cloud service capability, and nothing about the Iceberg format ties it to AWS. The worked example above runs against any storage DeltaForge can reach: S3 and S3-compatible object stores (including on-prem MinIO-style deployments), Azure storage, GCS, or a plain local filesystem. The merge commits as one atomic snapshot; an Athena, Trino or Spark reader pointed at the same table location sees the upserted state through standard Iceberg metadata, with no involvement from the engine that wrote it.

How do you upsert into Iceberg with SQL on-prem?

Concretely: install DeltaForge on your own hardware (it is commercial software you deploy yourself, so there is no data egress and no managed-service dependency), create a zone whose STORAGE_ROOT points at your on-prem object store or filesystem, and run the exact sequence from the walkthrough. The only thing that changes between laptop, data center and cloud is the storage root string. For recurring feeds, the same MERGE INTO belongs in a scheduled pipeline with a staging table as the source instead of the inline VALUES block; the statement itself is unchanged, and idempotent by construction since re-merging the same feed matches the same keys.

What happens at the storage layer

For readers who want to trust but verify, here is what each operation physically writes under the table location:

INSERT: New Parquet data files, one new version in the transaction log, and one new Iceberg snapshot whose manifests add those files.
UPDATE and DELETE: On a version 3 table, deletion vectors mark the superseded rows and rewritten files carry the surviving or modified rows. The Iceberg snapshot references both, so readers reconstruct the exact post-DML state.
MERGE: One atomic commit combining both effects: deletion vectors plus rewrites for the matched arm, fresh data files for the not-matched arm. There is no intermediate state in which an external reader can observe the update without the insert.
Time travel: Earlier snapshots stay addressable through either metadata chain until you vacuum them away, which is what makes the verification queries above possible.

Scope and limits

Honesty about the mechanism matters more than the pitch. DeltaForge's Iceberg DML is delivered through dual-format tables: the engine commits through its own transaction log and generates the Iceberg metadata on each commit. That means the tables you mutate this way are tables DeltaForge owns the write path for. Iceberg tables created and written by other engines are registered with CREATE EXTERNAL TABLE ... USING ICEBERG and queried read-only; you bring data under DeltaForge's write path by loading it into a managed table rather than by adopting a foreign engine's table in place. One practical note from the demo library: some external Iceberg tools resolve Windows-style backslash paths poorly, so on Windows installations prefer forward-slash or UNC paths for the data root.

When you are done experimenting, drop the demo objects together with their files:

DROP EXTERNAL TABLE IF EXISTS external.iceberg_demos.supply_chain_inventory_iceberg WITH FILES;

DROP DELTA TABLE IF EXISTS external.iceberg_demos.supply_chain_inventory WITH FILES;

WITH FILES removes the data files, the transaction log and the Iceberg metadata/ directory, not just the catalog entry.

If your tables are Delta rather than Iceberg, the same DML story applies there too; the companion article on Delta Lake MERGE, UPDATE and DELETE without Spark covers it with the same demo-backed approach.