Apache Iceberg

Native Apache Iceberg Support

First-class Iceberg table support built into the core engine. Not a bolt-on connector, not a third-party wrapper. Read, query, and manage Iceberg tables with the same SQL engine and the same performance you expect from Delta Lake.

Full V1/V2/V3 spec compliance
Same SQL engine for Delta and Iceberg
UniForm interoperability
[Diagram: the Iceberg metadata hierarchy. A catalog-level metadata.json (table schema, partitions, snapshots; V1/V2/V3) points to a snapshot's manifest list, which points to per-partition manifest files, which track the Parquet data files. A V3 deletion vector is stored as a bitmap in an LZ4-compressed Puffin file.]

Format Freedom

No vendor lock-in. Choose the table format that fits your workload.

Most data platforms force a choice: pick one table format and live with it. When they add Iceberg support, it arrives as a limited connector with gaps in spec coverage, missing delete support, or read-only access. Delta Forge takes a fundamentally different approach.

Iceberg Is a First-Class Citizen

The same SQL engine that powers Delta Lake queries also powers Iceberg queries. There is no secondary code path, no reduced feature set, no "lite" mode. You get the full capabilities of both formats in a single platform.

  • Read Iceberg tables natively alongside Delta Lake tables
  • Same SQL dialect for both formats, no mode switching
  • Full expression engine with predicate pushdown and column pruning
  • Zero friction when mixing formats in the same pipeline
[Diagram: Delta Lake tables and Iceberg tables both served by a single SQL engine. One engine, both formats.]
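As a sketch, the points above look like this in a single session; the catalog and table names are illustrative:

```sql
-- Both formats in one session, same dialect, no mode switching.
SELECT count(*) FROM delta.warehouse.orders;      -- Delta Lake table
SELECT count(*) FROM iceberg.warehouse.shipments; -- Iceberg table
```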

Full Specification Coverage

Complete V1, V2, and V3 implementation with no gaps or partial support

Iceberg V1

Foundation

  • Schema evolution with field IDs
  • Partition transforms (identity, bucket, truncate)
  • Time-based partitioning (year, month, day, hour)
  • Snapshot management and time travel
  • Sort orders
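A sketch of the V1 features in DDL, using Spark-style Iceberg syntax; table and column names are illustrative:

```sql
-- Partition transforms (bucket) plus time-based partitioning (days)
CREATE TABLE iceberg.warehouse.events (
  id       BIGINT,
  category STRING,
  ts       TIMESTAMP
)
USING iceberg
PARTITIONED BY (bucket(16, id), days(ts));

-- Sort order, following the Iceberg Spark extension syntax
ALTER TABLE iceberg.warehouse.events WRITE ORDERED BY (category, ts);
```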
Iceberg V2

Row-Level Operations

  • Position deletes by file path and row
  • Equality deletes by column predicate
  • Sequence numbers for ordering
  • All V1 features inherited
Iceberg V3

Latest Capabilities

  • Deletion vectors
  • Nanosecond timestamps
  • Geospatial types (geometry, geography)
  • Variant semi-structured type
  • Puffin file format support
All Versions

Cross-Cutting

  • Full type system (all primitives + complex)
  • Nested struct, list, map evolution
  • Avro manifest I/O (Snappy, Zstd, BZip2)
  • Gzip metadata detection
  • Zero-copy Arrow conversion

UniForm: Write Once, Read Anywhere

The key differentiator. Write as Delta, automatically expose as Iceberg. No ETL, no data duplication.

UniForm eliminates the format debate entirely. A single physical table is simultaneously readable as both Delta Lake and Apache Iceberg. UniForm metadata is generated as part of the Delta transaction, not as a separate batch process, so Iceberg readers always see a consistent view.

How UniForm Works

  1. Delta Write - Data is written as standard Delta Lake Parquet files
  2. Transaction Commit - Delta log entry is committed atomically
  3. Iceberg Metadata - metadata.json, manifest list, and manifests generated inline
  4. Dual Access - Both Delta and Iceberg readers see the same data instantly
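Enabling UniForm on an existing table might look like the sketch below. The property names follow the open-source Delta Lake UniForm convention and are an assumption about Delta Forge's exact syntax:

```sql
-- Sketch: property names follow open-source Delta Lake UniForm;
-- exact spellings may differ in Delta Forge.
ALTER TABLE delta.warehouse.orders SET TBLPROPERTIES (
  'delta.enableIcebergCompatV2' = 'true',
  'delta.universalFormat.enabledFormats' = 'iceberg'
);
-- From the next commit on, Iceberg metadata is generated inline
-- alongside the Delta log.
```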

Why It Matters

  • No ETL between formats - teams using Spark, Trino, or Flink read the same tables without conversion
  • Gradual migration - adopt Iceberg incrementally without rewriting Delta pipelines
  • Vendor independence - any tool that speaks Iceberg can access your data
  • Zero data duplication - one copy of data, two metadata views
[Diagram: a single SQL write produces Parquet data files plus both a Delta log (_delta_log/) and Iceberg metadata (metadata/), so Delta readers and Iceberg readers see the same table. UniForm.]

Cross-Format Joins

Query Delta Lake and Iceberg tables together in a single SQL statement

Delta Forge is not limited to one format at a time. Because both Delta Lake and Apache Iceberg are first-class citizens in the same SQL engine, you can join across formats in a single query with no ETL, no data movement, and no connectors to configure.

Why This Matters

  • No ETL between formats - combine Delta and Iceberg data on the fly without staging tables
  • Incremental adoption - teams migrating from one format to another can query both during the transition
  • Multi-team lakehouse - data engineering writes Delta, data science reads Iceberg, analytics joins both
  • Full predicate pushdown - filters are pushed into each format's native metadata layer independently
-- Join a Delta Lake table with an Iceberg table
SELECT d.customer_id, d.order_total, i.shipping_status
FROM delta.warehouse.orders d
JOIN iceberg.warehouse.shipments i
  ON d.order_id = i.order_id
WHERE d.order_date >= '2024-01-01';

Deletion Vectors and V3 Features

Production-grade row-level deletes with the latest Iceberg capabilities

Row-level deletes are where many Iceberg implementations fall short. Delta Forge supports the full spectrum of delete mechanisms defined in the specification, from position deletes in V2 through bitmap-based deletion vectors in V3.

Delete Mechanisms

  1. Position Deletes (V2+) - Delete specific rows by file path and row position
  2. Equality Deletes (V2+) - Delete rows matching column predicates without rewriting files
  3. Deletion Vectors (V3) - Compact bitmap storage: millions of positions in kilobytes
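A single DELETE statement can exercise any of these mechanisms; which one the writer chooses depends on the table's format version and configuration. A minimal sketch, with illustrative names:

```sql
-- On a V2 table this may be recorded as position or equality delete
-- files; on a V3 table, as a bitmap deletion vector. Either way, the
-- underlying Parquet data files are not rewritten.
DELETE FROM iceberg.warehouse.shipments
WHERE shipping_status = 'cancelled';
```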

V3 Type System

  • Nanosecond timestamps for high-precision temporal data
  • Geospatial types with geometry, geography, and CRS support
  • Variant type for semi-structured data without schema
  • Puffin file format for deletion vectors with LZ4 compression
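The V3 types above might appear in DDL as follows. Type spellings vary by engine, so this is an illustrative sketch using the spec's type names:

```sql
-- Sketch: V3 type system in table DDL (names and spellings illustrative)
CREATE TABLE iceberg.warehouse.sensor_readings (
  reading_id  BIGINT,
  recorded_at TIMESTAMP_NS,   -- nanosecond-precision timestamp (V3)
  location    GEOMETRY,       -- geospatial type (V3)
  payload     VARIANT         -- semi-structured data, no fixed schema (V3)
)
USING iceberg
TBLPROPERTIES ('format-version' = '3');
```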
[Diagram: a deletion vector over a Parquet file. Rows 1, 3, and 6 of eight rows are marked deleted; the bitmap {1, 3, 6} occupies roughly 12 bytes.]

Schema Evolution

Evolve your schema without rewriting data files

Add Columns

Add new columns at any position. Existing files return NULL for new columns automatically.

ALTER TABLE t ADD COLUMN new_col STRING

Drop Columns

Remove columns from the schema. Underlying data remains until compaction.

ALTER TABLE t DROP COLUMN deprecated_col

Rename Columns

Rename columns using Iceberg's field ID tracking. Zero data movement required.

ALTER TABLE t RENAME COLUMN old_name TO new_name

Reorder Columns

Change column order for better organization without touching data files.

ALTER TABLE t ALTER COLUMN col FIRST

Type Widening

Widen column types safely (int to long, float to double, decimal precision).

ALTER TABLE t ALTER COLUMN amount TYPE DECIMAL(20,4)

Nested Evolution

Evolve struct, list, and map types at any nesting depth. Add fields to nested structures.

ALTER TABLE t ADD COLUMN address STRUCT<zip: STRING, city: STRING>

Time Travel and Snapshots

Query any historical state of your Iceberg tables

Version-Based Access

  • Query specific snapshot IDs
  • Compare data between snapshots
  • Restore to previous versions
  • Full snapshot chain traversal
SELECT * FROM iceberg.warehouse.events VERSION AS OF 5

Timestamp-Based Access

  • Query data as of a specific timestamp
  • Point-in-time recovery
  • Audit trail reconstruction
  • Compliance reporting
SELECT * FROM iceberg.warehouse.events TIMESTAMP AS OF '2024-06-15'

Branch and Tag Support

  • Named snapshot references
  • Branch-based isolation
  • Tag immutable snapshots
  • Safe experimentation
SELECT * FROM events.branch_staging

Snapshot Management

  • Snapshot expiration
  • Orphan file cleanup
  • Metadata log entries
  • Rollback operations
CALL rollback_to_snapshot('events', 5)

Data Skipping and Performance

Manifest-level and file-level statistics for efficient query planning

Iceberg's metadata hierarchy enables aggressive data skipping at multiple levels. Delta Forge leverages partition pruning, manifest filtering, and file-level column statistics to minimize I/O and maximize throughput.

Multi-Level Data Skipping

  1. Partition Pruning - Eliminate entire partitions from scan based on predicates
  2. Manifest Filtering - Skip manifests whose partition summaries do not match
  3. File-Level Statistics - Use min/max column bounds to skip individual data files
  4. Column Pruning - Read only the columns referenced in the query
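A sketch of a query that engages all four levels; names are illustrative, and EXPLAIN output varies by engine:

```sql
EXPLAIN
SELECT customer_id, order_total            -- column pruning: 2 columns read
FROM iceberg.warehouse.orders
WHERE order_date = DATE '2024-06-15';      -- partition pruning, manifest
                                           -- filtering, and min/max file
                                           -- skipping all apply here
```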

Engine Optimizations

  • Zero-copy deserialization for metadata parsing
  • Multiple Avro codecs (Snappy, Zstd, BZip2) for manifest I/O
  • Automatic gzip detection for compressed metadata files
  • Shared type system eliminates redundant conversion between reader and writer
[Diagram: data skipping for WHERE date = '2024-06'. Partitions 2024-07 and 2024-08 are skipped entirely; within 2024-06, files are further pruned by statistics. 3 of 13 files read, 77% of I/O eliminated.]

Open table formats. No vendor lock-in. Full interoperability.

Native Iceberg and Delta Lake support in a single engine. Your data, your format, your choice.