Ask "Delta Lake or Iceberg?" in a planning meeting and you will usually get an answer shaped by whoever is in the room. A Databricks shop leans Delta; a team standardising on a vendor-neutral catalog leans Iceberg. Both reflexes are reasonable, and both skip the part that actually informs the decision: what each format is, mechanically, and where they genuinely diverge. This article walks the format internals first, then the shared capabilities, then the real differences, and only at the end touches on the option of not committing to one at all.
One framing to keep in mind throughout: a table format is not a file format. The rows on disk are Parquet in both Delta and Iceberg. What the table format adds is a metadata layer that turns a directory of Parquet files into a single, transactional, versioned table. So most of the comparison is about metadata design, not about how fast a column scans.
What is Delta Lake?
Delta Lake originated in the Apache Spark ecosystem and is defined by its transaction log. Every Delta table has a _delta_log directory next to its data files. Each committed transaction appends a new numbered JSON file (00000000000000000000.json, 00000000000000000001.json, and so on; the version number is zero-padded to 20 digits) that records the actions in that commit: which Parquet files were added, which were removed, schema or metadata changes, and protocol updates. To read the table, an engine conceptually replays the log from the start, applying adds and removes in order to arrive at the current file list. To keep that replay cheap as the log grows, Delta periodically writes a Parquet checkpoint that snapshots the state up to a given version, so in practice a reader starts from the latest checkpoint and replays only the JSON commits after it.
That ordered, append-only log is the heart of the format. Atomicity comes from the commit being a single log file that either exists or does not; concurrency control is optimistic, detecting conflicts at commit time; and time travel falls out naturally, because version N is just the table state after replaying up to commit N. The data files themselves are Parquet, and column-level statistics carried in the log let the engine skip files that cannot match a predicate.
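The replay described above can be sketched in a few lines. This is a toy model rather than the Delta protocol itself: the action dictionaries, the file names and the `replay` function are assumptions made for illustration, and real commits carry far more detail (statistics, metadata, protocol actions).

```python
# Toy model of Delta-style log replay. Each commit is a list of actions;
# the action shapes are simplified stand-ins, not the real Delta schema.
from typing import Dict, List, Optional, Set

Commit = List[dict]


def replay(commits: Dict[int, Commit],
           checkpoint_version: Optional[int] = None,
           checkpoint_files: Optional[Set[str]] = None,
           as_of: Optional[int] = None) -> Set[str]:
    """Return the live data files at version `as_of` (latest if None)."""
    target = max(commits) if as_of is None else as_of
    # Use the checkpoint only if it does not overshoot the target version.
    if checkpoint_version is not None and checkpoint_version <= target:
        live = set(checkpoint_files or ())
        start = checkpoint_version + 1
    else:
        live = set()
        start = 0
    for version in range(start, target + 1):
        for action in commits[version]:
            if "add" in action:
                live.add(action["add"])
            elif "remove" in action:
                live.discard(action["remove"])
    return live


commits = {
    0: [{"add": "part-000.parquet"}],
    1: [{"add": "part-001.parquet"}],
    2: [{"remove": "part-000.parquet"}, {"add": "part-002.parquet"}],
    3: [{"add": "part-003.parquet"}],
}

# Full replay from version 0.
latest = replay(commits)
# Same state, reached from a checkpoint taken at version 2.
from_checkpoint = replay(
    commits,
    checkpoint_version=2,
    checkpoint_files={"part-001.parquet", "part-002.parquet"},
)
# Time travel: the table as it stood after commit 1.
version_1 = replay(commits, as_of=1)
```

Starting from the checkpoint gives the same file list as a full replay, which is why checkpoints are purely an optimisation, and stopping early at a given version is all that time travel needs.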
What is Apache Iceberg?
Apache Iceberg originated at Netflix and is defined by a metadata tree rather than a single log. At the top sits a metadata.json file that holds the table schema, the partition spec, and the list of snapshots. The current snapshot points to a manifest list (an Avro file), which points to one or more manifest files, and those manifests finally enumerate the Parquet data files along with per-file statistics and partition values. Reading a table means resolving the current metadata.json, walking the manifest list, then the manifests, then the data files.
The practical effect of that tree is that Iceberg tracks state by snapshot, and a commit produces a new metadata.json (and new manifest structures) that supersede the previous one. Iceberg leans on a catalog to record which metadata.json is current for a table, which is what makes atomic commits work across engines: the commit is the catalog atomically swapping the pointer to the new metadata file. Like Delta, Iceberg stores Parquet data files and carries column bounds for file skipping, and snapshots give it time travel.
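To make the read path concrete, here is a minimal in-memory sketch of that walk. Real Iceberg keeps the manifest list and manifests as Avro files and records much more per entry; the dictionaries, file names and the `data_files` helper below are simplified assumptions, not the Iceberg library API.

```python
# Toy in-memory model of the Iceberg metadata chain:
# metadata -> snapshot -> manifest list -> manifests -> data files.
from typing import Dict, List, Optional

# Manifests enumerate data files (with partition values and stats in reality).
manifests: Dict[str, List[dict]] = {
    "manifest-a.avro": [
        {"path": "data/f1.parquet", "partition": "2024-01"},
        {"path": "data/f2.parquet", "partition": "2024-01"},
    ],
    "manifest-b.avro": [
        {"path": "data/f3.parquet", "partition": "2024-02"},
    ],
}

# Each snapshot has one manifest list; unchanged manifests are reused.
manifest_lists: Dict[str, List[str]] = {
    "snap-1.avro": ["manifest-a.avro"],
    "snap-2.avro": ["manifest-a.avro", "manifest-b.avro"],
}

# Stand-in for the contents of the current metadata.json.
metadata = {
    "current-snapshot-id": 2,
    "snapshots": [
        {"snapshot-id": 1, "manifest-list": "snap-1.avro"},
        {"snapshot-id": 2, "manifest-list": "snap-2.avro"},
    ],
}


def data_files(meta: dict, snapshot_id: Optional[int] = None) -> List[str]:
    """Resolve a snapshot (current by default) down to its data file paths."""
    wanted = meta["current-snapshot-id"] if snapshot_id is None else snapshot_id
    snapshot = next(s for s in meta["snapshots"] if s["snapshot-id"] == wanted)
    files: List[str] = []
    for manifest_name in manifest_lists[snapshot["manifest-list"]]:
        files.extend(entry["path"] for entry in manifests[manifest_name])
    return files


current = data_files(metadata)
previous = data_files(metadata, snapshot_id=1)
```

Snapshot 2 reuses manifest-a.avro unchanged and only adds manifest-b.avro, which illustrates why a commit does not need to rewrite the whole tree, and reading snapshot 1 instead of the current one is the time-travel path.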
What Delta Lake and Iceberg have in common
Before the differences, it is worth being clear about how much overlaps, because it is most of the surface area:
- ACID transactions: both give atomic, consistent commits with optimistic concurrency. A reader never sees a half-written commit in either format.
- Time travel: both let you query a prior state, by Delta version or timestamp, or by Iceberg snapshot or timestamp. Both keep older versions addressable until a cleanup operation removes them.
- Schema evolution: both support adding, renaming, reordering and dropping columns without rewriting existing data, using stable column identifiers (Delta column mapping by ID, Iceberg field IDs) so old files map correctly to the new schema.
- Parquet data files: both store rows as Parquet and rely on per-file column statistics (min, max, null counts) for data skipping. The scan performance floor is largely set by Parquet, not by the table format.
- Maintenance operations: both accumulate small files and stale metadata over time, and both have maintenance operations to compact files and expire old versions to reclaim storage.
If your workload is "land Parquet, query it transactionally, occasionally evolve the schema, occasionally read history," both formats do that well and you would have a hard time telling them apart from the query side.
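The data skipping that both formats share comes down to a range-overlap test on per-file statistics. The sketch below assumes a single numeric column and a hypothetical `FileStats` record; real engines evaluate many columns and predicate types, and the file names are invented.

```python
# Minimal sketch of min/max data skipping on one column.
from dataclasses import dataclass
from typing import List


@dataclass
class FileStats:
    path: str
    min_amount: float
    max_amount: float


def files_to_scan(files: List[FileStats], lo: float, hi: float) -> List[str]:
    """Keep only files whose [min, max] range overlaps the filter [lo, hi]."""
    return [f.path for f in files if f.max_amount >= lo and f.min_amount <= hi]


files = [
    FileStats("f1.parquet", 0.0, 99.9),
    FileStats("f2.parquet", 100.0, 499.0),
    FileStats("f3.parquet", 450.0, 2000.0),
]

# Equivalent to: WHERE amount BETWEEN 120 AND 460
selected = files_to_scan(files, 120.0, 460.0)
```

f1.parquet is skipped without being opened because its maximum is below the filter range. Whether the statistics live in a Delta log entry or an Iceberg manifest entry, the test applied is the same.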
Where Delta Lake and Iceberg actually differ
The genuine differences cluster in a few areas. None of them is a knockout; they are trade-offs.
Metadata model: ordered log vs metadata tree
This is the foundational difference. Delta's single ordered log is simple to reason about and naturally serialises commits, but a very long history means more JSON commits to replay between checkpoints, so log depth and checkpoint cadence become things you manage on busy tables. Iceberg's tree spreads metadata across the manifest list and manifests, which can prune large tables efficiently at the manifest level, but it adds more metadata objects and leans on a catalog to track the current pointer. Neither is strictly faster; they optimise for different shapes of growth.
Partitioning: Iceberg hidden partitioning and partition evolution
Iceberg's most distinctive feature is hidden partitioning. You declare a partition transform (for example, partition by month of a timestamp column) and Iceberg derives the partition value itself; queries filter on the natural column and the engine prunes partitions without the writer or reader hardcoding a separate partition column. Iceberg also supports partition evolution: you can change the partition spec of an existing table, and old data keeps its old layout while new data uses the new one, with no rewrite. Delta partitioning is column-based in the classic Hive style (and DeltaForge and others add clustering as an alternative), which is more explicit but less flexible if the partitioning scheme needs to change later.
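A small sketch of the hidden-partitioning idea: the writer derives the partition from the timestamp, and the reader derives the partitions to visit from a filter on that same timestamp, so neither side names a partition column. The string month labels, the `layout` dictionary and the helper names are assumptions for illustration; Iceberg's own month transform encodes the value differently.

```python
# Toy model of a month partition transform applied on write and on read.
from datetime import datetime
from typing import Any, Dict, List

Row = Dict[str, Any]


def month_of(ts: datetime) -> str:
    """Partition transform: derive a month label from a timestamp."""
    return f"{ts.year:04d}-{ts.month:02d}"


def write(rows: List[Row], layout: Dict[str, List[Row]]) -> None:
    """Writer supplies only event_ts; the partition value is derived."""
    for row in rows:
        layout.setdefault(month_of(row["event_ts"]), []).append(row)


def months_between(start: datetime, end: datetime) -> List[str]:
    """Partitions that can contain rows with start <= event_ts <= end."""
    months = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        months.append(f"{year:04d}-{month:02d}")
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return months


def scan(layout: Dict[str, List[Row]], start: datetime, end: datetime) -> List[Row]:
    """Filter on the natural column; pruning happens via the transform."""
    hits = []
    for partition in months_between(start, end):
        for row in layout.get(partition, []):
            if start <= row["event_ts"] <= end:
                hits.append(row)
    return hits


layout: Dict[str, List[Row]] = {}
write(
    [
        {"event_id": 1, "event_ts": datetime(2024, 1, 15)},
        {"event_id": 2, "event_ts": datetime(2024, 2, 3)},
        {"event_id": 3, "event_ts": datetime(2024, 2, 20)},
        {"event_id": 4, "event_ts": datetime(2024, 3, 9)},
    ],
    layout,
)

feb_start, feb_end = datetime(2024, 2, 1), datetime(2024, 2, 29, 23, 59, 59)
visited = months_between(feb_start, feb_end)
matched_ids = [row["event_id"] for row in scan(layout, feb_start, feb_end)]
```

The query only mentions event_ts, yet just one of the three partitions is visited. Under partition evolution, older data would keep one transform and newer data another, with the reader applying whichever spec each file was written under.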
Row-level deletes: deletion vectors and merge-on-read
How each format handles deletes and updates affects write amplification on high-mutation tables. The two strategies are copy-on-write (rewrite the data files that contain affected rows) and merge-on-read (record which rows are deleted and apply that at read time). Delta Lake implements merge-on-read through deletion vectors: when the feature is enabled on a table, a DELETE or MERGE records the positions of removed rows in a compact bitmap stored beside the data file, so the Parquet is not rewritten, and readers apply the bitmap during the scan. Without deletion vectors, Delta falls back to copy-on-write. Iceberg supports both copy-on-write and merge-on-read modes, the latter through delete files, and lets you configure the strategy per operation. The upshot is similar (avoid rewriting large files for small changes) but the mechanisms and the knobs differ.
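The two strategies can be contrasted in a toy model. Here a list of dictionaries stands in for a Parquet file and a plain Python set of row positions stands in for the compressed bitmap; the function names are hypothetical and no real file I/O takes place.

```python
# Copy-on-write vs merge-on-read for a delete, on an in-memory "file".
from typing import Callable, Dict, List, Set

Row = Dict[str, object]

data_file: List[Row] = [
    {"order_id": 1, "customer": "ann"},
    {"order_id": 2, "customer": "bob"},
    {"order_id": 3, "customer": "ann"},
    {"order_id": 4, "customer": "cy"},
]


def delete_copy_on_write(rows: List[Row], match: Callable[[Row], bool]) -> List[Row]:
    """Rewrite the file: every surviving row is copied into a new file."""
    return [row for row in rows if not match(row)]


def delete_merge_on_read(rows: List[Row], match: Callable[[Row], bool],
                         deleted: Set[int]) -> Set[int]:
    """Leave the file alone; record the positions of deleted rows instead."""
    return deleted | {pos for pos, row in enumerate(rows) if match(row)}


def scan(rows: List[Row], deleted: Set[int]) -> List[Row]:
    """Reader applies the deletion vector while scanning."""
    return [row for pos, row in enumerate(rows) if pos not in deleted]


def is_bob(row: Row) -> bool:
    return row["customer"] == "bob"


rewritten_file = delete_copy_on_write(data_file, is_bob)
deletion_vector = delete_merge_on_read(data_file, is_bob, set())
visible_rows = scan(data_file, deletion_vector)
```

Both paths expose the same three rows, but copy-on-write copied three rows to remove one, while merge-on-read wrote a single position and pushed a small cost onto every later read until the file is compacted.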
Clustering: Delta Z-ORDER
Delta Lake offers Z-ORDER clustering, which co-locates related values across multiple columns using a space-filling curve so that multi-column filter predicates skip more files. It is applied as part of an OPTIMIZE operation. Iceberg's data layout is driven more by its partitioning and sort-order configuration. Both are aiming at the same goal, fewer files read per query, by different routes.
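The space-filling curve behind Z-ordering is bit interleaving (a Morton code). The sketch below compares a plain sort with a Z-order sort on a 4x4 grid of points split into four "files" of four rows each. The grid, the file size and the `files_scanned` helper are assumptions chosen to keep the example small; real implementations first map arbitrary column types onto comparable integers.

```python
# Z-order (Morton) clustering on two integer columns, versus a linear sort.
from typing import List, Tuple

Point = Tuple[int, int]


def interleave_bits(x: int, y: int, bits: int = 16) -> int:
    """Interleave the bits of x and y: x fills even bits, y fills odd bits."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z


def chunk(points: List[Point], size: int) -> List[List[Point]]:
    """Split sorted rows into fixed-size 'files'."""
    return [points[i:i + size] for i in range(0, len(points), size)]


def files_scanned(files: List[List[Point]], x_max: int, y_max: int) -> int:
    """Count files that min stats cannot rule out for x <= x_max AND y <= y_max."""
    return sum(
        1
        for f in files
        if min(p[0] for p in f) <= x_max and min(p[1] for p in f) <= y_max
    )


points = [(x, y) for x in range(4) for y in range(4)]

# Linear layout: sorted by x, then y. Each file holds one x value, all y values.
by_x = chunk(sorted(points), 4)
# Z-order layout: each file ends up holding one 2x2 quadrant of the grid.
by_z = chunk(sorted(points, key=lambda p: interleave_bits(p[0], p[1])), 4)

linear_count = files_scanned(by_x, x_max=1, y_max=1)
zorder_count = files_scanned(by_z, x_max=1, y_max=1)
```

With the linear sort, statistics on y are useless because every file spans the full y range, so the two-column filter still reads two files. With the Z-order layout both columns have tight ranges per file and the same filter reads one.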
Catalogs
Iceberg's design assumes a catalog as the authority for "which metadata file is current," and a whole catalog ecosystem (REST catalog, Hive, JDBC, and cloud-managed options) has grown around it, which is part of Iceberg's vendor-neutral appeal across many engines. Delta historically anchors on the log in storage as the source of truth, with the catalog as a pointer and a place for table metadata; the catalog matters but the log is authoritative. This shapes multi-engine stories: Iceberg's catalog-centric model is often cited for cross-engine interoperability, while Delta's log-centric model keeps the table self-describing on storage.
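The catalog's role in an atomic commit is essentially a compare-and-swap on the "current metadata" pointer. The in-memory `Catalog` class below is hypothetical and only illustrates the idea: real catalogs implement the same check with a database transaction or a conditional write, and the table and file names are invented.

```python
# Compare-and-swap commit against a toy in-memory catalog.
import threading
from typing import Dict, Optional


class Catalog:
    """Maps a table name to the metadata file that is currently authoritative."""

    def __init__(self) -> None:
        self._pointers: Dict[str, str] = {}
        self._lock = threading.Lock()

    def current(self, table: str) -> Optional[str]:
        return self._pointers.get(table)

    def swap(self, table: str, expected: Optional[str], new: str) -> bool:
        """Point `table` at `new` only if it still points at `expected`."""
        with self._lock:
            if self._pointers.get(table) != expected:
                return False  # another writer committed first
            self._pointers[table] = new
            return True


catalog = Catalog()
table = "sales.public.shipments"
catalog.swap(table, None, "v1.metadata.json")

# Two writers both start from v1 and prepare new metadata files.
base = catalog.current(table)
writer_a_ok = catalog.swap(table, base, "v2-writer-a.metadata.json")
writer_b_ok = catalog.swap(table, base, "v2-writer-b.metadata.json")

# The losing writer re-reads the pointer, rebases its changes, and retries.
writer_b_retry_ok = catalog.swap(
    table, catalog.current(table), "v3-writer-b.metadata.json"
)
```

The second writer's first attempt fails because the pointer no longer matches what it read, which is the optimistic-concurrency conflict. Its metadata file is simply never referenced, so readers never observe it.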
Engines and ecosystem
Delta Lake grew up inside Spark and is deeply integrated with the Spark and Databricks world, with a broad SQL surface there and a growing set of non-Spark readers and writers. Iceberg has cultivated a wide, vendor-neutral engine ecosystem from early on, with first-class support across many query engines and managed services. In practice, the engines you already run, or plan to run, are one of the strongest inputs to the decision, often stronger than any single format feature.
UniForm interoperability
Finally, the formats are not walled off from each other. Delta UniForm lets a Delta table also expose Iceberg metadata: on each Delta commit, the writer generates the corresponding Iceberg metadata.json, manifest list and manifests pointing at the same Parquet files, so one physical table is readable as both. That blurs the "pick one" framing, because a Delta-written table can be consumed by Iceberg readers without a second copy.
Delta Lake vs Iceberg at a glance
A side-by-side of the points above. Treat the "shared" rows as genuinely equivalent and focus your decision on the rows where they diverge.
| Dimension | Delta Lake | Apache Iceberg |
|---|---|---|
| Origin | Apache Spark ecosystem | Netflix, now Apache |
| Metadata model | Ordered transaction log (_delta_log) of JSON commits plus Parquet checkpoints | Metadata tree: metadata.json to manifest list to manifest files |
| Data files | Parquet | Parquet (ORC and Avro also defined) |
| ACID transactions | Yes, optimistic concurrency | Yes, optimistic concurrency |
| Time travel | By version or timestamp | By snapshot or timestamp |
| Schema evolution | Column mapping by ID | Field IDs |
| Partitioning | Explicit, column-based (plus clustering) | Hidden partitioning with partition evolution |
| Row-level deletes | Deletion vectors (merge-on-read) | Copy-on-write or merge-on-read delete files |
| Multi-column clustering | Z-ORDER via OPTIMIZE | Sort order and partition spec |
| Catalog role | Log on storage is authoritative; catalog is a pointer | Catalog tracks the current metadata pointer |
| Interoperability | UniForm exposes Iceberg metadata from a Delta table | Read by many engines natively |
Which should you choose?
A short, honest decision guide. None of these is absolute; weight them against the engines you already operate.
- Lean Iceberg if you expect your partitioning scheme to change over time and value partition evolution; you want a vendor-neutral catalog spanning several query engines; or your chosen engines and managed services treat Iceberg as the first-class path.
- Lean Delta Lake if you are already invested in Spark or Databricks tooling; your tables see heavy row-level updates and deletes where deletion vectors shine; or multi-dimensional filtering makes Z-ORDER clustering worthwhile.
- It is mostly a tie if your workload is append-mostly Parquet with occasional schema changes and history reads. In that case pick on ecosystem fit and operational familiarity, not on format features.
- Consider deferring the choice if you are not sure yet, or you have consumers on both sides. Interoperability (UniForm, dual-format engines) lets you keep one physical copy readable as both while you decide.
Using one engine for both formats
Because both Delta Lake and Iceberg are open formats over Parquet, an engine can support both, and that turns the "which format" question from a one-way door into something you can revisit. This is where it is worth being concrete about one such engine without overstating it.
DeltaForge is a commercial, customer-installed SQL engine (the table formats are open standards; the engine is the commercial part). It reads native Apache Iceberg tables by resolving metadata.json, walking the manifest list and manifests, and reading the Parquet data files, and it reads and writes Delta Lake natively with deletion vectors, time travel, change data feed, and OPTIMIZE / VACUUM / Z-ORDER. There is no Spark and no JVM involved. Both formats are driven through the same PostgreSQL-flavored SQL surface and the same optimizer, so the dialect you use does not change when the format does.
Creating a Delta table and an Iceberg-readable table are both ordinary SQL. A plain Delta table:
CREATE DELTA TABLE sales.public.orders (
    order_id INT,
    customer VARCHAR,
    amount DOUBLE,
    order_date DATE
) LOCATION 'orders';
The LOCATION is a path in your storage.

The same Delta table can expose Iceberg metadata via UniForm by setting one family of table properties, so Iceberg readers see it without a second copy:
CREATE DELTA TABLE sales.public.events (
    event_id INT,
    event_type VARCHAR,
    user_id INT,
    event_ts TIMESTAMP
) LOCATION 'events'
TBLPROPERTIES (
    'delta.universalFormat.enabledFormats' = 'iceberg',
    'delta.universalFormat.icebergVersion' = '2'
);
Each commit then also writes the Iceberg metadata.json, manifest list and manifests alongside the Delta log, pointing at the same Parquet files.

You can flip UniForm on for an existing Delta table the same way, with a single property change:
ALTER TABLE sales.public.events
SET TBLPROPERTIES ('delta.universalFormat.enabledFormats' = 'iceberg');
A table written by another Iceberg engine is registered and read in place, read-only, through its own metadata chain:
CREATE EXTERNAL TABLE sales.public.shipments
USING ICEBERG
LOCATION 'shipments';
And because both formats go through one engine, a single query can join across them; the optimizer pushes predicates into each format's own metadata layer:
SELECT o.order_id, o.amount, s.status
FROM sales.public.orders AS o
JOIN sales.public.shipments AS s
ON o.order_id = s.order_id;
Time travel reads the same way on either format, by version or timestamp:
SELECT COUNT(*) AS row_count
FROM sales.public.orders VERSION AS OF 1;
The point is not that an engine choice settles the format debate. It is that the format debate is less of a hard fork than it looks: the data is Parquet, both metadata layers are open, and tools exist to read and write both. Choosing well still matters, but choosing is reversible.
The honest bottom line
Delta Lake and Iceberg are more alike than the format wars imply. They both wrap Parquet in a transactional, time-travelled, schema-evolving table, and for a large class of workloads they are interchangeable. The differences that should drive a decision are specific: Iceberg's hidden partitioning, partition evolution and catalog-centric interoperability on one side; Delta's deletion vectors, Z-ORDER clustering and Spark-world maturity on the other. Match those to your engines and your mutation patterns, lean toward the ecosystem you already run, and remember that interoperability features mean the decision does not have to be permanent.