If you search for how to update or delete rows in an Apache Iceberg table, the official documentation answers with Spark: the Iceberg Spark SQL extensions are where UPDATE, DELETE and MERGE INTO are specified and demonstrated. The most visible alternative is AWS Athena, which documents row-level Iceberg DML on engine version 3, but only as a managed service inside AWS. If your data lives on-prem, in a private cloud, or simply outside that ecosystem, page one of the search results quietly assumes you will stand up a JVM cluster to change a row.
That assumption is about engines, not about the format. Iceberg's spec defines how snapshots, manifests and delete files describe a table; nothing in it requires Spark to write them. What you need is an engine that implements the write path. DeltaForge is a commercial, customer-installed analytics engine written in Rust that does exactly that: you install it on your own hardware or cloud account, point it at your object store, and run DML in PostgreSQL-flavored SQL. No JVM, no cluster to size, no per-query cloud bill.
How DeltaForge writes Iceberg tables
It is worth being precise about the mechanism, because it explains both the capabilities and the limits. A DeltaForge table with Iceberg enabled maintains two metadata chains over one set of Parquet data files. Every committed transaction writes a new version to the table's transaction log, and a post-commit hook immediately generates the corresponding Iceberg snapshot under the table's metadata/ directory. This is the UniForm pattern: one physical table, readable natively by the engine and readable as a standard Iceberg table by anything that speaks Iceberg, from Spark and Trino to PyIceberg and Athena.
The practical consequence is that the full DML surface of the engine, including multi-arm MERGE, becomes available to your Iceberg consumers. An UPDATE on a version 3 table produces deletion vectors and rewritten data files; a MERGE produces deletion vectors for the matched rows and new data files for the inserted ones; every one of these commits appears as a distinct Iceberg snapshot with correct manifest lists. Time travel works through either metadata chain because both describe the same files.
A complete MERGE INTO example in SQL
The walkthrough below is the iceberg-v3-merge-upsert demo from the DeltaForge demo library, lightly trimmed. It models a supply chain inventory table synced from a supplier feed: the feed contains a mix of restocks for existing SKUs and entirely new products, which is the canonical upsert shape. The storage root used throughout is s3://acme-lake/demo; substitute your own bucket or local path.
-
Create a zone over your storage root, a schema, and the table. The three table properties are what turn on Iceberg metadata generation: enable the format, pick the Iceberg spec version (3 here, which brings deletion vectors), and use ID-based column mapping so renames stay safe across both formats.
CREATE ZONE IF NOT EXISTS external TYPE EXTERNAL STORAGE_ROOT = 's3://acme-lake/demo' COMMENT 'External and Delta tables for the Iceberg demos'; CREATE SCHEMA IF NOT EXISTS external.iceberg_demos COMMENT 'Iceberg UniForm demo tables'; CREATE DELTA TABLE IF NOT EXISTS external.iceberg_demos.supply_chain_inventory ( item_id INT, warehouse VARCHAR, sku VARCHAR, category VARCHAR, product_name VARCHAR, quantity INT, unit_price DOUBLE, last_received VARCHAR ) LOCATION 'iceberg-v3-merge-upsert/supply_chain_inventory' TBLPROPERTIES ( 'delta.universalFormat.enabledFormats' = 'iceberg', 'delta.universalFormat.icebergVersion' = '3', 'delta.columnMapping.mode' = 'id' );The table's LOCATIONis relative to the zone's storage root, so the data lands unders3://acme-lake/demo/iceberg-v3-merge-upsert/supply_chain_inventory. -
Seed the warehouse with ten SKUs. This is commit number one, and it already produces the first Iceberg snapshot.
INSERT INTO external.iceberg_demos.supply_chain_inventory VALUES (1, 'WH-EAST', 'SKU-1001', 'Electronics', 'Wireless Mouse', 150, 24.99, '2024-01-05'), (2, 'WH-EAST', 'SKU-1002', 'Electronics', 'USB-C Hub', 80, 39.99, '2024-01-05'), (3, 'WH-EAST', 'SKU-1003', 'Office', 'Ergonomic Chair', 25, 299.99, '2024-01-10'), (4, 'WH-EAST', 'SKU-1004', 'Office', 'Standing Desk', 15, 549.99, '2024-01-10'), (5, 'WH-EAST', 'SKU-1005', 'Accessories', 'Monitor Stand', 60, 79.99, '2024-01-12'), (6, 'WH-EAST', 'SKU-1006', 'Accessories', 'Laptop Sleeve 15in', 200, 29.99, '2024-01-12'), (7, 'WH-EAST', 'SKU-1007', 'Electronics', 'Mechanical Keyboard', 90, 89.99, '2024-01-15'), (8, 'WH-EAST', 'SKU-1008', 'Office', 'Whiteboard 4x6', 12, 149.99, '2024-01-15'), (9, 'WH-EAST', 'SKU-1009', 'Accessories', 'Webcam HD', 110, 59.99, '2024-01-18'), (10, 'WH-EAST', 'SKU-1010', 'Electronics', 'Docking Station', 45, 179.99, '2024-01-18');Ten rows in warehouse WH-EAST: the baseline state before the supplier feed arrives. -
Now the upsert. The supplier feed carries five items: three match existing keys (restocks for items 1, 3 and 7) and two are new products. One
MERGE INTOhandles both, atomically.MERGE INTO external.iceberg_demos.supply_chain_inventory AS t USING ( SELECT * FROM (VALUES (1, 'WH-EAST', 'SKU-1001', 'Electronics', 'Wireless Mouse', 175, 24.99, '2024-02-01'), (3, 'WH-EAST', 'SKU-1003', 'Office', 'Ergonomic Chair', 30, 299.99, '2024-02-01'), (7, 'WH-EAST', 'SKU-1007', 'Electronics', 'Mechanical Keyboard',110, 89.99, '2024-02-01'), (31, 'WH-EAST', 'SKU-2001', 'Electronics', 'Wireless Earbuds', 200, 49.99, '2024-02-01'), (32, 'WH-EAST', 'SKU-2002', 'Office', 'Desk Lamp LED', 85, 44.99, '2024-02-01') ) AS s(item_id, warehouse, sku, category, product_name, quantity, unit_price, last_received) ) AS s ON t.item_id = s.item_id WHEN MATCHED THEN UPDATE SET quantity = s.quantity, last_received = s.last_received WHEN NOT MATCHED THEN INSERT VALUES ( s.item_id, s.warehouse, s.sku, s.category, s.product_name, s.quantity, s.unit_price, s.last_received );The source here is an inline VALUESlist for reproducibility; in production it is typically a staging table or view over the incoming feed. -
Verify all three effects: the row count grew from 10 to 12, the matched rows carry the new quantities, and the two new products exist.
SELECT COUNT(*) AS total_items FROM external.iceberg_demos.supply_chain_inventory; SELECT item_id, sku, product_name, quantity, last_received FROM external.iceberg_demos.supply_chain_inventory WHERE item_id IN (1, 3, 7) ORDER BY item_id; SELECT * FROM external.iceberg_demos.supply_chain_inventory WHERE item_id IN (31, 32) ORDER BY item_id;Expected: total_items = 12; quantities 175, 30 and 110 for items 1, 3 and 7; and two rows for the Wireless Earbuds and Desk Lamp LED inserts. -
Verify with time travel. Version 1 is the seed commit, so reading it next to the current state proves the merge changed exactly what it claimed and nothing else.
SELECT (SELECT COUNT(*) FROM external.iceberg_demos.supply_chain_inventory VERSION AS OF 1) AS rows_before_merge, (SELECT COUNT(*) FROM external.iceberg_demos.supply_chain_inventory) AS rows_after_merge;Expected: rows_before_merge = 10,rows_after_merge = 12. The pre-merge snapshot remains fully queryable. -
Finally, the independent proof: register the same physical location as an external Iceberg table and read it back through the Iceberg metadata chain alone. This is the same path an outside engine would take.
CREATE EXTERNAL TABLE IF NOT EXISTS external.iceberg_demos.supply_chain_inventory_iceberg USING ICEBERG LOCATION 'iceberg-v3-merge-upsert/supply_chain_inventory'; SELECT COUNT(*) AS total_items FROM external.iceberg_demos.supply_chain_inventory_iceberg; SELECT * FROM external.iceberg_demos.supply_chain_inventory_iceberg WHERE item_id = 1;Expected: 12 rows through the Iceberg reader, and item 1 shows quantity 175, the post-merge value. The deletion vectors written by the merge resolve correctly through Iceberg V3 metadata.
The full demo goes further than this trimmed version: it seeds 30 SKUs across three warehouses, runs a second merge round (price adjustments plus another insert), and asserts exact per-warehouse aggregates through both the native and the Iceberg read paths. The iceberg-v3-merge-upsert demo ships in the DeltaForge demo library, so you can run every statement, asserts included, against your own installation.
How do you run UPDATE and DELETE on Iceberg without Spark?
The same way, with plain statements. The companion demo, iceberg-uniform-crud-lifecycle, walks an employee table through the full mutation lifecycle and checks that the Iceberg snapshot chain tracks every step. Its setup creates an employees table with the same three table properties shown above (that demo uses Iceberg version 2, which is also supported), seeds 20 rows, and then mutates:
UPDATE external.iceberg_demos.employees
SET salary = ROUND(salary * 1.15, 2)
WHERE department = 'Engineering';
UPDATE external.iceberg_demos.employees
SET is_active = false
WHERE id IN (8, 13);
DELETE FROM external.iceberg_demos.employees
WHERE is_active = false;
INSERT INTO external.iceberg_demos.employees VALUES
(21, 'Uma Foster', 'Sales', 'Sales Rep', 76000.00, true),
(22, 'Victor Reyes', 'Marketing', 'SEO Specialist', 82000.00, true),
(23, 'Wendy Chang', 'Engineering', 'ML Engineer', 145000.00, true);
Because every commit is preserved, you can audit any of them after the fact. A self-join against an earlier version shows exactly what the raise changed, and DESCRIBE HISTORY lists each operation in the log:
SELECT
c.name,
ROUND(old.salary, 2) AS old_salary,
ROUND(c.salary, 2) AS new_salary,
ROUND(c.salary - old.salary, 2) AS salary_increase
FROM external.iceberg_demos.employees c
JOIN external.iceberg_demos.employees VERSION AS OF 1 old
ON c.id = old.id
WHERE c.department = 'Engineering'
ORDER BY c.id;
DESCRIBE HISTORY external.iceberg_demos.employees;
Which engines support full DML on Apache Iceberg?
A fair map of the documented landscape, stated neutrally:
- Apache Spark
- The reference engine. Iceberg's own documentation specifies
UPDATE,DELETEandMERGE INTOthrough the Spark SQL extensions. Requires a JVM and, for production workloads, a cluster to provision and tune. - Trino
- The Iceberg connector documents row-level
UPDATE,DELETEandMERGE. Trino is a distributed JVM system, typically deployed as a coordinator plus workers. - AWS Athena
- Engine version 3 documents
INSERT,UPDATE,DELETEandMERGEon Iceberg tables. Fully managed, and only available as an AWS service against AWS-resident data. - Apache Flink
- The Iceberg Flink connector centers on streaming writes, including upsert-mode streams; it is a JVM stream processor rather than an interactive SQL prompt.
- PyIceberg
- A Python library with a programmatic write API. Powerful for pipelines, but it is code you write and operate, not a SQL statement you hand to an engine.
- DeltaForge
- Customer-installed, no JVM, single binary. Full
INSERT,UPDATE,DELETEand multi-armMERGEin PostgreSQL-flavored SQL, with Iceberg V2 or V3 snapshots generated on every commit for external readers.
None of this is a criticism of the JVM engines; they are mature and widely deployed. The point is narrower: if your requirement is "change rows in an Iceberg table from SQL, on infrastructure I control, without operating a cluster," the documented options thin out fast, and that is the slot DeltaForge fills.
Can you MERGE into an Iceberg table outside AWS Athena?
Yes. MERGE is an engine capability, not a cloud service capability, and nothing about the Iceberg format ties it to AWS. The worked example above runs against any storage DeltaForge can reach: S3 and S3-compatible object stores (including on-prem MinIO-style deployments), Azure storage, GCS, or a plain local filesystem. The merge commits as one atomic snapshot; an Athena, Trino or Spark reader pointed at the same table location sees the upserted state through standard Iceberg metadata, with no involvement from the engine that wrote it.
How do you upsert into Iceberg with SQL on-prem?
Concretely: install DeltaForge on your own hardware (it is commercial software you deploy yourself, so there is no data egress and no managed-service dependency), create a zone whose STORAGE_ROOT points at your on-prem object store or filesystem, and run the exact sequence from the walkthrough. The only thing that changes between laptop, data center and cloud is the storage root string. For recurring feeds, the same MERGE INTO belongs in a scheduled pipeline with a staging table as the source instead of the inline VALUES block; the statement itself is unchanged, and idempotent by construction since re-merging the same feed matches the same keys.
What happens at the storage layer
For readers who want to trust but verify, here is what each operation physically writes under the table location:
- INSERT
- New Parquet data files, one new version in the transaction log, and one new Iceberg snapshot whose manifests add those files.
- UPDATE and DELETE
- On a version 3 table, deletion vectors mark the superseded rows and rewritten files carry the surviving or modified rows. The Iceberg snapshot references both, so readers reconstruct the exact post-DML state.
- MERGE
- One atomic commit combining both effects: deletion vectors plus rewrites for the matched arm, fresh data files for the not-matched arm. There is no intermediate state in which an external reader can observe the update without the insert.
- Time travel
- Earlier snapshots stay addressable through either metadata chain until you vacuum them away, which is what makes the verification queries above possible.
Scope and limits
Honesty about the mechanism matters more than the pitch. DeltaForge's Iceberg DML is delivered through dual-format tables: the engine commits through its own transaction log and generates the Iceberg metadata on each commit. That means the tables you mutate this way are tables DeltaForge owns the write path for. Iceberg tables created and written by other engines are registered with CREATE EXTERNAL TABLE ... USING ICEBERG and queried read-only; you bring data under DeltaForge's write path by loading it into a managed table rather than by adopting a foreign engine's table in place. One practical note from the demo library: some external Iceberg tools resolve Windows-style backslash paths poorly, so on Windows installations prefer forward-slash or UNC paths for the data root.
When you are done experimenting, drop the demo objects together with their files:
DROP EXTERNAL TABLE IF EXISTS external.iceberg_demos.supply_chain_inventory_iceberg WITH FILES;
DROP DELTA TABLE IF EXISTS external.iceberg_demos.supply_chain_inventory WITH FILES;
WITH FILES removes the data files, the transaction log and the Iceberg metadata/ directory, not just the catalog entry.If your tables are Delta rather than Iceberg, the same DML story applies there too; the companion article on Delta Lake MERGE, UPDATE and DELETE without Spark covers it with the same demo-backed approach.