What is the difference between Delta Lake and Apache Iceberg?

Both are open table formats that add a transactional metadata layer on top of Parquet data files, giving ACID writes, time travel, and schema evolution. The core difference is how each tracks table state. Delta Lake keeps an ordered transaction log directory, _delta_log, made of numbered JSON commits plus periodic Parquet checkpoints. Iceberg keeps a metadata tree: a metadata.json points to the current snapshot, the snapshot points to a manifest list, and the manifest list points to manifest files that finally list the data files. Beyond that, Iceberg is known for hidden partitioning and partition evolution, while Delta is known for deletion vectors and Z-ORDER clustering.

Is Iceberg better than Delta Lake?

Neither format is universally better; they are close in core capability and the right pick depends on your engines and operational needs. Iceberg's hidden partitioning and partition evolution suit teams that expect partitioning schemes to change over time and that want a vendor-neutral catalog story across many engines. Delta Lake's deletion vectors and Z-ORDER clustering suit high-mutation tables and multi-dimensional filtering, and its tooling is mature in the Spark and Databricks world. Because the data files are Parquet in both cases, the decision is mostly about the metadata layer and ecosystem fit, not raw scan speed.

Do you have to choose between Delta Lake and Iceberg?

Not necessarily. Both are open formats, and interoperability features exist to bridge them. Delta UniForm writes Iceberg metadata alongside the Delta transaction log on every commit, so one physical set of Parquet files is readable as both a Delta table and an Iceberg table with no second copy. Some engines also read and write both formats directly. DeltaForge, for example, reads native Iceberg tables and reads and writes Delta tables through one SQL surface, and emits Iceberg UniForm metadata on Delta commits, so you can defer or avoid a hard format commitment.

Can the same engine read both Delta Lake and Iceberg?

Yes. Several query engines support both formats. DeltaForge is one example: a commercial, customer-installed engine that reads native Apache Iceberg tables (metadata.json, manifest list, manifests, Parquet) and reads and writes Delta Lake natively, with no Spark and no JVM. The same PostgreSQL-flavored SQL, the same optimizer, and the same drivers cover both formats, including queries that join a Delta table and an Iceberg table together. The table formats themselves are open standards; the engine is the commercial part.

Delta Lake vs Iceberg: A Neutral Comparison

Ask "Delta Lake or Iceberg?" in a planning meeting and you will usually get an answer shaped by whoever is in the room. A Databricks shop leans Delta; a team standardising on a vendor-neutral catalog leans Iceberg. Both reflexes are reasonable, and both skip the part that actually informs the decision: what each format is, mechanically, and where they genuinely diverge. This article walks the format internals first, then the shared capabilities, then the real differences, and only at the end touches on the option of not committing to one at all.

One framing to keep in mind throughout: a table format is not a file format. The rows on disk are Parquet in both Delta and Iceberg. What the table format adds is a metadata layer that turns a directory of Parquet files into a single, transactional, versioned table. So most of the comparison is about metadata design, not about how fast a column scans.

What is Delta Lake?

Delta Lake originated in the Apache Spark ecosystem and is defined by its transaction log. Every Delta table has a _delta_log directory next to its data files. Each committed transaction appends a new numbered JSON file (00000000000000000000.json, 00000000000000000001.json, and so on, with the version number zero-padded to 20 digits) that records the actions in that commit: which Parquet files were added, which were removed, schema or metadata changes, and protocol updates. Conceptually, reading the table means replaying the log from the start, applying adds and removes in order to arrive at the current file list. To keep that replay cheap as the log grows, Delta periodically writes a Parquet checkpoint that snapshots the state up to a given version, so in practice a reader starts from the latest checkpoint and replays only the JSON commits after it.
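
The replay can be sketched in a few lines of Python. This is a simplified in-memory model, not the Delta protocol itself: the COMMITS and CHECKPOINT structures, the replay helper and the file names are all hypothetical, and real commits carry more action types than add and remove.

```python
from typing import Optional

# Hypothetical in-memory stand-in for the JSON commits in _delta_log.
# Each commit is a list of actions; only add and remove are modelled here.
COMMITS = {
    0: [{"add": "part-000.parquet"}, {"add": "part-001.parquet"}],
    1: [{"add": "part-002.parquet"}],
    2: [{"remove": "part-000.parquet"}, {"add": "part-003.parquet"}],
    3: [{"add": "part-004.parquet"}],
}

# Hypothetical checkpoint: the full set of live files as of version 1.
CHECKPOINT = {
    "version": 1,
    "files": {"part-000.parquet", "part-001.parquet", "part-002.parquet"},
}


def replay(commits: dict, up_to: Optional[int] = None,
           checkpoint: Optional[dict] = None) -> set:
    """Return the set of live data files at version up_to (latest if None)."""
    target = max(commits) if up_to is None else up_to
    # Start from the checkpoint only if it is not newer than the target version.
    if checkpoint is not None and checkpoint["version"] <= target:
        files = set(checkpoint["files"])
        start = checkpoint["version"] + 1
    else:
        files = set()
        start = 0
    for version in range(start, target + 1):
        for action in commits[version]:
            if "add" in action:
                files.add(action["add"])
            elif "remove" in action:
                files.discard(action["remove"])
    return files


latest = replay(COMMITS, checkpoint=CHECKPOINT)  # checkpoint + commits 2..3
from_scratch = replay(COMMITS)                   # full replay from commit 0
version_1 = replay(COMMITS, up_to=1)             # time travel to version 1
print(sorted(latest))
```

Starting from the checkpoint and replaying from commit 0 give the same file list; the checkpoint only shortens the work. Passing up_to is the same mechanism that makes time travel cheap to express.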

That ordered, append-only log is the heart of the format. Atomicity comes from the commit being a single log file that either exists or does not; concurrency control is optimistic, detecting conflicts at commit time; and time travel falls out naturally, because version N is just the table state after replaying up to commit N. The data files themselves are Parquet, and column-level statistics carried in the log let the engine skip files that cannot match a predicate.
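
The "either exists or does not" property can be illustrated with an exclusive file create. The sketch below assumes a local filesystem where exclusive creation is atomic; object stores need an equivalent put-if-absent primitive or an external coordinator. The try_commit helper is hypothetical, and it shows only the step of claiming a version number, not conflict checking.

```python
import json
import os
import tempfile


def try_commit(log_dir: str, version: int, actions: list) -> bool:
    """Try to publish a commit as the given version; False if it already exists."""
    # Delta names commit files by version, zero-padded to 20 digits.
    path = os.path.join(log_dir, f"{version:020d}.json")
    try:
        # O_EXCL makes creation fail if the file already exists, so only one
        # writer can claim a given version number.
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL)
    except FileExistsError:
        return False
    # Simplification: a production writer would also make sure readers never
    # observe a partially written file (for example via a conditional put).
    with os.fdopen(fd, "w") as handle:
        for action in actions:
            handle.write(json.dumps(action) + "\n")
    return True


with tempfile.TemporaryDirectory() as log_dir:
    first = try_commit(log_dir, 0, [{"add": {"path": "part-000.parquet"}}])
    # A second writer racing for version 0 loses the race...
    second = try_commit(log_dir, 0, [{"add": {"path": "part-001.parquet"}}])
    # ...and, after checking for conflicts with the winner, retries as version 1.
    retry = try_commit(log_dir, 1, [{"add": {"path": "part-001.parquet"}}])
    log_files = sorted(os.listdir(log_dir))

print(first, second, retry, log_files)
```

The losing writer is not blocked; it simply learns at commit time that its version number is taken, which is the optimistic part of optimistic concurrency.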

What is Apache Iceberg?

Apache Iceberg originated at Netflix and is defined by a metadata tree rather than a single log. At the top sits a metadata.json file that holds the table schema, the partition spec, and the list of snapshots. The current snapshot points to a manifest list (an Avro file), which points to one or more manifest files, and those manifests finally enumerate the Parquet data files along with per-file statistics and partition values. Reading a table means resolving the current metadata.json, walking the manifest list, then the manifests, then the data files.
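
That walk can be modelled with plain dictionaries. The sketch below is a simplification under stated assumptions: real manifest lists and manifests are Avro files, the file names are invented, and only the handful of fields needed for the walk are included.

```python
# Hypothetical in-memory stand-ins for the files in an Iceberg metadata tree.
MANIFESTS = {
    "manifest-a.avro": [
        {"path": "data/jan-0.parquet", "partition": {"month": "2024-01"}},
        {"path": "data/jan-1.parquet", "partition": {"month": "2024-01"}},
    ],
    "manifest-b.avro": [
        {"path": "data/feb-0.parquet", "partition": {"month": "2024-02"}},
    ],
}

MANIFEST_LISTS = {
    "snap-1.avro": ["manifest-a.avro"],
    "snap-2.avro": ["manifest-a.avro", "manifest-b.avro"],
}

# A trimmed-down metadata.json: the current snapshot plus the snapshot history.
METADATA = {
    "current-snapshot-id": 2,
    "snapshots": [
        {"snapshot-id": 1, "manifest-list": "snap-1.avro"},
        {"snapshot-id": 2, "manifest-list": "snap-2.avro"},
    ],
}


def data_files(metadata: dict, snapshot_id=None) -> list:
    """Walk metadata -> snapshot -> manifest list -> manifests -> data files."""
    wanted = metadata["current-snapshot-id"] if snapshot_id is None else snapshot_id
    snapshot = next(s for s in metadata["snapshots"] if s["snapshot-id"] == wanted)
    files = []
    for manifest in MANIFEST_LISTS[snapshot["manifest-list"]]:
        files.extend(entry["path"] for entry in MANIFESTS[manifest])
    return files


current_files = data_files(METADATA)
older_files = data_files(METADATA, snapshot_id=1)
print(current_files)
print(older_files)
```

Note that snapshot 2 reuses manifest-a.avro unchanged and only adds manifest-b.avro. Sharing unchanged manifests between snapshots keeps each commit's metadata write small.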

The practical effect of that tree is that Iceberg tracks state by snapshot, and a commit produces a new metadata.json (and new manifest structures) that supersedes the previous one. Iceberg leans on a catalog to record which metadata.json is current for a table, which is what makes atomic commits work across engines: the commit is the catalog atomically swapping the pointer to the new metadata file. Like Delta, Iceberg stores Parquet data files and carries column bounds for file skipping, and snapshots give it time travel.
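
The pointer swap is a compare-and-swap. As a minimal sketch, assuming a toy in-process catalog (the InMemoryCatalog class, its method names and the metadata file names are hypothetical, not any real catalog's API):

```python
import threading


class InMemoryCatalog:
    """Hypothetical catalog holding one current-metadata pointer per table."""

    def __init__(self):
        self._pointers = {}
        self._lock = threading.Lock()

    def current(self, table: str):
        return self._pointers.get(table)

    def swap(self, table: str, expected, new: str) -> bool:
        """Point the table at new only if it still points at expected."""
        with self._lock:
            if self._pointers.get(table) != expected:
                return False
            self._pointers[table] = new
            return True


catalog = InMemoryCatalog()
created = catalog.swap("db.events", None, "v1.metadata.json")

# Two writers both start from the same base metadata file.
base = catalog.current("db.events")
writer_a = catalog.swap("db.events", base, "v2-a.metadata.json")
# Writer B's expectation is now stale, so its commit is rejected and it must
# re-read the current metadata and retry.
writer_b = catalog.swap("db.events", base, "v2-b.metadata.json")

print(created, writer_a, writer_b, catalog.current("db.events"))
```

Writer B's metadata file may already exist in storage, but it is never visible to readers because the catalog pointer never referenced it.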

What Delta Lake and Iceberg have in common

Before the differences, it is worth being clear about how much overlaps, because it is most of the surface area:

ACID transactions: Both give atomic, consistent commits with optimistic concurrency. A reader never sees a half-written commit in either format.
Time travel: Both let you query a prior state: by Delta version or timestamp, or by Iceberg snapshot or timestamp. Both keep older versions addressable until a cleanup operation removes them.
Schema evolution: Both support adding, renaming, reordering and dropping columns without rewriting existing data, using stable column identifiers (Delta column mapping by ID, Iceberg field IDs) so old files map correctly to the new schema.
Parquet data files: Both store rows as Parquet and rely on per-file column statistics (min, max, null counts) for data skipping. The scan performance floor is largely set by Parquet, not by the table format.
Maintenance operations: Both accumulate small files and stale metadata over time, and both have maintenance operations to compact files and expire old versions to reclaim storage.
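
Data skipping from per-file statistics, which both formats share, reduces to a range-overlap test. The sketch below assumes a single numeric column and invented statistics; the FileStats type and files_to_scan helper are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FileStats:
    """Per-file min/max for one column, as a table format would record it."""
    path: str
    min_amount: float
    max_amount: float


def files_to_scan(files, lower: float, upper: float) -> list:
    """Keep files whose [min, max] range overlaps the filter range [lower, upper]."""
    return [f.path for f in files if f.max_amount >= lower and f.min_amount <= upper]


FILES = [
    FileStats("part-000.parquet", 0.0, 99.9),
    FileStats("part-001.parquet", 100.0, 499.0),
    FileStats("part-002.parquet", 450.0, 2000.0),
]

# WHERE amount BETWEEN 480 AND 600
selected = files_to_scan(FILES, 480.0, 600.0)
print(selected)
```

The first file is skipped without being opened because its maximum is below the filter's lower bound. The test is conservative: a selected file may still contain no matching rows, but a skipped file never does.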

If your workload is "land Parquet, query it transactionally, occasionally evolve the schema, occasionally read history," both formats do that well and you would have a hard time telling them apart from the query side.

Where Delta Lake and Iceberg actually differ

The genuine differences cluster in a few areas. None of them is a knockout; they are trade-offs.

Metadata model: ordered log vs metadata tree

This is the foundational difference. Delta's single ordered log is simple to reason about and naturally serialises commits, but a very long history means more JSON commits to replay between checkpoints, so log depth and checkpoint cadence become things you manage on busy tables. Iceberg's tree spreads metadata across the manifest list and manifests, which can prune large tables efficiently at the manifest level, but it adds more metadata objects and leans on a catalog to track the current pointer. Neither is strictly faster; they optimise for different shapes of growth.

Partitioning: Iceberg hidden partitioning and partition evolution

Iceberg's most distinctive feature is hidden partitioning. You declare a partition transform (for example, partition by month of a timestamp column) and Iceberg derives the partition value itself; queries filter on the natural column and the engine prunes partitions without the writer or reader hardcoding a separate partition column. Iceberg also supports partition evolution: you can change the partition spec of an existing table, and old data keeps its old layout while new data uses the new one, with no rewrite. Delta partitioning is column-based in the classic Hive style (and DeltaForge and others add clustering as an alternative), which is more explicit but less flexible if the partitioning scheme needs to change later.
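
A month transform and the pruning it enables can be sketched as follows. This is an illustration under stated assumptions: the file list, the event_month key and the prune helper are hypothetical, and the only detail taken from Iceberg is that the month transform yields months since 1970-01.

```python
from datetime import datetime


def month_transform(ts: datetime) -> int:
    """Months since 1970-01, the value a month partition transform derives."""
    return (ts.year - 1970) * 12 + (ts.month - 1)


# Hypothetical file list: each data file records the derived partition value,
# not a user-visible partition column.
FILES = [
    {"path": "data/a.parquet", "event_month": month_transform(datetime(2024, 1, 15))},
    {"path": "data/b.parquet", "event_month": month_transform(datetime(2024, 2, 3))},
    {"path": "data/c.parquet", "event_month": month_transform(datetime(2024, 3, 28))},
]


def prune(files, ts_from: datetime, ts_to: datetime) -> list:
    """Translate a filter on the timestamp column into a partition-value range."""
    low, high = month_transform(ts_from), month_transform(ts_to)
    return [f["path"] for f in files if low <= f["event_month"] <= high]


# WHERE event_ts BETWEEN '2024-02-10' AND '2024-03-05'
selected = prune(FILES, datetime(2024, 2, 10), datetime(2024, 3, 5))
print(selected)
```

The query only mentions the timestamp column; the mapping to month values happens inside the engine. That is what makes the partitioning "hidden".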

Row-level deletes: deletion vectors and merge-on-read

How each format handles deletes and updates affects write amplification on high-mutation tables. The two strategies are copy-on-write (rewrite the data files that contain affected rows) and merge-on-read (record which rows are deleted and apply that at read time). Delta Lake's original behaviour is copy-on-write; with deletion vectors enabled it implements merge-on-read: a DELETE or MERGE records the positions of removed rows in a compact bitmap stored beside the data file, so the Parquet is not rewritten, and readers apply the bitmap during the scan. Iceberg supports both copy-on-write and merge-on-read modes, the latter through delete files, and lets you configure the strategy per operation. The upshot is similar (avoid rewriting large files for small changes) but the mechanisms and the knobs differ.
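
The two strategies can be compared side by side in a toy model. In this sketch a Python list stands in for a Parquet file and a set of row positions stands in for the compressed bitmap a real deletion vector uses; all names are hypothetical.

```python
def delete_copy_on_write(rows: list, predicate) -> list:
    """Rewrite the file without the matching rows; returns the new file contents."""
    return [row for row in rows if not predicate(row)]


def delete_merge_on_read(rows: list, predicate, deleted=None) -> set:
    """Leave the file untouched and record the positions of deleted rows."""
    positions = set(deleted or ())
    positions.update(i for i, row in enumerate(rows) if predicate(row))
    return positions


def scan(rows: list, deleted: set) -> list:
    """Read the file, dropping any row whose position is marked as deleted."""
    return [row for i, row in enumerate(rows) if i not in deleted]


# Hypothetical contents of one data file; row position is the list index.
ROWS = [
    {"order_id": 1, "amount": 20.0},
    {"order_id": 2, "amount": 35.5},
    {"order_id": 3, "amount": 12.0},
    {"order_id": 4, "amount": 80.0},
]


def is_small(row: dict) -> bool:
    return row["amount"] < 25.0


rewritten = delete_copy_on_write(ROWS, is_small)       # new file with 2 rows
deletion_vector = delete_merge_on_read(ROWS, is_small)  # original file + {0, 2}
visible = scan(ROWS, deletion_vector)
print(deletion_vector, visible == rewritten)
```

Both paths expose the same rows to a query. The difference is where the cost lands: copy-on-write pays at write time by rewriting the file, merge-on-read pays a small filter cost on every read until the file is eventually compacted.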

Clustering: Delta Z-ORDER

Delta Lake offers Z-ORDER clustering, which co-locates related values across multiple columns using a space-filling curve so that multi-column filter predicates skip more files. It is applied as part of an OPTIMIZE operation. Iceberg's data layout is driven more by its partitioning and sort-order configuration. Both are aiming at the same goal, fewer files read per query, by different routes.
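
The space-filling curve behind Z-ordering is a bit interleave (a Morton code). The sketch below uses a toy 4x4 grid of integer points split into files of four rows. It illustrates the general technique, not Delta's exact implementation, and the helper names are hypothetical.

```python
def interleave_bits(x: int, y: int, bits: int = 16) -> int:
    """Morton code: bit i of x goes to position 2i, bit i of y to position 2i+1."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)
        code |= ((y >> i) & 1) << (2 * i + 1)
    return code


def chunk(points: list, size: int) -> list:
    """Split an ordered list of rows into fixed-size 'files'."""
    return [points[i:i + size] for i in range(0, len(points), size)]


def files_touched(files: list, x_max: int, y_max: int) -> int:
    """Count files whose min stats allow a match for x <= x_max AND y <= y_max."""
    return sum(
        1 for f in files
        if min(p[0] for p in f) <= x_max and min(p[1] for p in f) <= y_max
    )


points = [(x, y) for x in range(4) for y in range(4)]
linear_files = chunk(sorted(points), 4)  # ordered by x, then y
zorder_files = chunk(sorted(points, key=lambda p: interleave_bits(*p)), 4)

# Filter on y only (y <= 1): the linear layout cannot skip anything.
linear_y = files_touched(linear_files, x_max=3, y_max=1)
zorder_y = files_touched(zorder_files, x_max=3, y_max=1)

# Filter on x only (x <= 1): both layouts skip half the files.
linear_x = files_touched(linear_files, x_max=1, y_max=3)
zorder_x = files_touched(zorder_files, x_max=1, y_max=3)

print(linear_y, zorder_y, linear_x, zorder_x)
```

A plain sort helps only the leading column, while the interleaved order keeps each file's value range narrow in both columns, so filters on either column can skip files.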

Catalogs

Iceberg's design assumes a catalog as the authority for "which metadata file is current," and a whole catalog ecosystem (REST catalog, Hive, JDBC, and cloud-managed options) has grown around it, which is part of Iceberg's vendor-neutral appeal across many engines. Delta has historically treated the log in storage as the source of truth, with the catalog acting as a pointer and a place for table metadata; the catalog matters, but the log is authoritative. This shapes multi-engine stories: Iceberg's catalog-centric model is often cited for cross-engine interoperability, while Delta's log-centric model keeps the table self-describing on storage.

Engines and ecosystem

Delta Lake grew up inside Spark and is deeply integrated with the Spark and Databricks world, with a broad SQL surface there and a growing set of non-Spark readers and writers. Iceberg has cultivated a wide, vendor-neutral engine ecosystem from early on, with first-class support across many query engines and managed services. In practice, the engines you already run, or plan to run, are one of the strongest inputs to the decision, often stronger than any single format feature.

UniForm interoperability

Finally, the formats are not walled off from each other. Delta UniForm lets a Delta table also expose Iceberg metadata: on each Delta commit, the writer generates the corresponding Iceberg metadata.json, manifest list and manifests pointing at the same Parquet files, so one physical table is readable as both. That blurs the "pick one" framing, because a Delta-written table can be consumed by Iceberg readers without a second copy.

Delta Lake vs Iceberg at a glance

A side-by-side of the points above. Treat the "shared" rows as genuinely equivalent and focus your decision on the rows where they diverge.

Dimension	Delta Lake	Apache Iceberg
Origin	Apache Spark ecosystem	Netflix, now Apache
Metadata model	Ordered transaction log (`_delta_log`) of JSON commits plus Parquet checkpoints	Metadata tree: `metadata.json` to manifest list to manifest files
Data files	Parquet	Parquet (ORC and Avro also defined)
ACID transactions	Yes, optimistic concurrency	Yes, optimistic concurrency
Time travel	By version or timestamp	By snapshot or timestamp
Schema evolution	Column mapping by ID	Field IDs
Partitioning	Explicit, column-based (plus clustering)	Hidden partitioning with partition evolution
Row-level deletes	Copy-on-write, or deletion vectors (merge-on-read)	Copy-on-write, or merge-on-read delete files
Multi-column clustering	Z-ORDER via OPTIMIZE	Sort order and partition spec
Catalog role	Log on storage is authoritative; catalog is a pointer	Catalog tracks the current metadata pointer
Interoperability	UniForm exposes Iceberg metadata from a Delta table	Read by many engines natively

The rows that should drive a decision are partitioning, deletes and clustering, catalog model, and engine ecosystem. The rest is close to a tie.

Which should you choose?

A short, honest decision guide. None of these is absolute; weight them against the engines you already operate.

Lean Iceberg if: You expect your partitioning scheme to change over time and value partition evolution; you want a vendor-neutral catalog spanning several query engines; or your chosen engines and managed services treat Iceberg as the first-class path.
Lean Delta Lake if: You are already invested in Spark or Databricks tooling; your tables see heavy row-level updates and deletes where deletion vectors shine; or multi-dimensional filtering makes Z-ORDER clustering worthwhile.
It is mostly a tie if: Your workload is append-mostly Parquet with occasional schema changes and history reads. In that case pick on ecosystem fit and operational familiarity, not on format features.
Consider deferring the choice if: You are not sure yet, or you have consumers on both sides. Interoperability (UniForm, dual-format engines) lets you keep one physical copy readable as both while you decide.

Using one engine for both formats

Because both Delta Lake and Iceberg are open formats over Parquet, an engine can support both, and that turns the "which format" question from a one-way door into something you can revisit. This is where it is worth being concrete about one such engine without overstating it.

DeltaForge is a commercial, customer-installed SQL engine (the table formats are open standards; the engine is the commercial part). It reads native Apache Iceberg tables by resolving metadata.json, walking the manifest list and manifests, and reading the Parquet data files, and it reads and writes Delta Lake natively with deletion vectors, time travel, change data feed, and OPTIMIZE / VACUUM / Z-ORDER. There is no Spark and no JVM involved. Both formats are driven through the same PostgreSQL-flavored SQL surface and the same optimizer, so the dialect you use does not change when the format does.

Creating a Delta table and an Iceberg-readable table are both ordinary SQL. A plain Delta table:

CREATE DELTA TABLE sales.public.orders (
    order_id   INT,
    customer   VARCHAR,
    amount     DOUBLE,
    order_date DATE
) LOCATION 'orders';

A native Delta table. The transaction log lands under the table's LOCATION in your storage.

The same Delta table can expose Iceberg metadata via UniForm by setting one family of table properties, so Iceberg readers see it without a second copy:

CREATE DELTA TABLE sales.public.events (
    event_id   INT,
    event_type VARCHAR,
    user_id    INT,
    event_ts   TIMESTAMP
) LOCATION 'events'
TBLPROPERTIES (
    'delta.universalFormat.enabledFormats' = 'iceberg',
    'delta.universalFormat.icebergVersion' = '2'
);

UniForm: every commit writes the Iceberg metadata.json, manifest list and manifests alongside the Delta log, pointing at the same Parquet files.

You can flip UniForm on for an existing Delta table the same way, with a single property change:

ALTER TABLE sales.public.events
SET TBLPROPERTIES ('delta.universalFormat.enabledFormats' = 'iceberg');

All subsequent commits produce both Delta and Iceberg metadata.

A table written by another Iceberg engine is registered and read in place, read-only, through its own metadata chain:

CREATE EXTERNAL TABLE sales.public.shipments
USING ICEBERG
LOCATION 'shipments';

The native Iceberg reader resolves the metadata tree and reads the Parquet data files with predicate pushdown and column pruning.

And because both formats go through one engine, a single query can join across them; the optimizer pushes predicates into each format's own metadata layer:

SELECT o.order_id, o.amount, s.status
FROM sales.public.orders AS o
JOIN sales.public.shipments AS s
    ON o.order_id = s.order_id;

A cross-format join: file skipping runs independently on the Delta side and the Iceberg side, and the surviving rows are then joined.

Time travel reads the same way on either format, by version or timestamp:

SELECT COUNT(*) AS row_count
FROM sales.public.orders VERSION AS OF 1;

An earlier committed state stays addressable until a maintenance operation removes it.

The point is not that an engine choice settles the format debate. It is that the format debate is less of a hard fork than it looks: the data is Parquet, both metadata layers are open, and tools exist to read and write both. Choosing well still matters, but choosing is reversible.

The honest bottom line

Delta Lake and Iceberg are more alike than the format wars imply. They both wrap Parquet in a transactional, versioned, schema-evolving table, and for a large class of workloads they are interchangeable. The differences that should drive a decision are specific: Iceberg's hidden partitioning, partition evolution and catalog-centric interoperability on one side; Delta's deletion vectors, Z-ORDER clustering and Spark-world maturity on the other. Match those to your engines and your mutation patterns, lean toward the ecosystem you already run, and remember that interoperability features mean the decision does not have to be permanent.