What does the write conformance plan cover?

The plan covers 7,744 scripts across Delta Lake and Iceberg UniForm writes. Coverage spans core DML, schema evolution, layout features, and advanced features such as Change Data Feed, deletion vectors, identity columns, in-commit timestamps, and row tracking.

What does a passing write script verify?

Three independent verifications must all agree: visible row count, order-independent content hash, and schema hash. On failure, the record includes the expected and actual row counts, the hash mismatch with a sample of diverging rows, a field-level schema diff, and the full reader exception.

Why use a content hash instead of comparing row sets?

Order-independent content hashing catches single-cell drift across millions of rows without materialising the full result set on both sides.

Does Iceberg UniForm produce real Iceberg tables Spark can read?

Yes. Iceberg UniForm tables written by DeltaForge expose a complete Iceberg metadata tree alongside the Delta log, and Spark reads return the same row count, content hash, and schema hash.

Can DeltaForge write Delta tables that use newer protocol features?

Yes. The write plan exercises protocol-gated features such as deletion vectors, Change Data Feed, row tracking, in-commit timestamps, and type widening, and each table must read back through Spark with matching row count, content hash, and schema hash.

Delta and Iceberg Writes Spark Can Read

Planned coverage

7,744 scripts across Delta Lake and Iceberg UniForm writes. Pass/fail counts appear after the first run.

Core DML

INSERT (167 scripts), UPDATE (177), DELETE (167), MERGE (317). Each operation runs against every type and every null pattern, with and without deletion vectors and Change Data Feed (CDF).
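The "every type, every null pattern, with and without" wording describes a cross product. A minimal sketch of how such a matrix could be enumerated follows. The type list, null-pattern names, and the `script_variants` helper are hypothetical and are not taken from the plan, so the variant count here is unrelated to the script counts above. The two table property names are standard Delta table properties.

```python
from itertools import product

# Hypothetical subsets, for illustration only.
COLUMN_TYPES = ["INT", "BIGINT", "STRING", "DECIMAL(10,2)", "TIMESTAMP"]
NULL_PATTERNS = ["no_nulls", "some_nulls", "all_nulls"]
# Every on/off combination of (deletion vectors, Change Data Feed).
FEATURE_FLAGS = list(product([False, True], repeat=2))


def script_variants(operation):
    """Yield one variant per (type, null pattern, feature flags) combination."""
    for col_type, nulls, (dv, cdf) in product(COLUMN_TYPES, NULL_PATTERNS, FEATURE_FLAGS):
        yield {
            "operation": operation,
            "type": col_type,
            "nulls": nulls,
            "properties": {
                "delta.enableDeletionVectors": str(dv).lower(),
                "delta.enableChangeDataFeed": str(cdf).lower(),
            },
        }


variants = list(script_variants("UPDATE"))
print(len(variants))  # 5 types * 3 null patterns * 4 flag combinations
```

Generating variants from a product keeps the "with and without" pairs symmetric: every type and null pattern is exercised under all four feature-flag combinations.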

Schema evolution

ADD/DROP/RENAME/REORDER column (64 scripts), type widening (61), column mapping (57), generated columns (60), default values (82), CHECK constraints (48), nested struct evolution (43).

Performance features

Partitioning (117 scripts), Z-Order (110), statistics correctness (104), predicate pushdown (56). Layout choices that the reader must honour.
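Statistics correctness matters because a reader may skip a file based on its recorded min/max values. The sketch below recomputes per-file statistics from the rows and shows how a stale maximum would cause a matching row to be skipped. The stat field names (`numRecords`, `minValues`, `maxValues`, `nullCount`) follow the Delta per-file statistics layout. The helper functions and sample rows are hypothetical, and the skip rule covers only a simple `column >= bound` filter.

```python
def compute_stats(rows, columns):
    """Recompute per-file statistics from the actual rows."""
    stats = {"numRecords": len(rows), "minValues": {}, "maxValues": {}, "nullCount": {}}
    for i, col in enumerate(columns):
        values = [row[i] for row in rows if row[i] is not None]
        stats["nullCount"][col] = len(rows) - len(values)
        if values:  # min/max are omitted for all-null columns
            stats["minValues"][col] = min(values)
            stats["maxValues"][col] = max(values)
    return stats


def can_skip_file(stats, column, lower_bound):
    """True when a `column >= lower_bound` filter cannot match any row in the file."""
    max_value = stats["maxValues"].get(column)
    return max_value is not None and max_value < lower_bound


rows = [(1, 5.0), (2, None), (3, 12.5)]
correct = compute_stats(rows, ["id", "amount"])

# A writer bug that records a maximum lower than the data actually contains.
stale = {**correct, "maxValues": {**correct["maxValues"], "amount": 10.0}}

print(can_skip_file(correct, "amount", 11.0))  # row (3, 12.5) must be read
print(can_skip_file(stale, "amount", 11.0))    # stale stats wrongly skip the file
```

With the stale statistics the filter `amount >= 11.0` silently drops a row, which is the kind of drift the row count and content hash checks are positioned to surface on read.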

Advanced features

Change Data Feed (100 scripts), deletion vectors (76), identity columns (93), in-commit timestamps (77), row tracking (56). Time travel and RESTORE (173). VACUUM (71).

What "pass" means

A test passes only when all three independent verifications agree.

Logical row count. Visible rows after deletion-vector and snapshot resolution must match.
Order-independent content hash. Each row is hashed by value, the row hashes are XOR-combined, and the combined values must match. Catches single-cell drift across millions of rows.
Schema hash. Column names, types, nullability, and field IDs must match. Catches column-mapping and type-widening drift.
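The content hash and schema hash described above can be sketched in plain Python. This is an illustration under stated assumptions, not DeltaForge's implementation: SHA-256, the `repr`-based row encoding, and the `(name, type, nullable, field_id)` tuple layout are all choices made here for the example. A real harness would need a canonical, engine-independent encoding for each type.

```python
import hashlib
import json


def row_digest(row):
    """Hash one row by value and return the digest as a 256-bit integer."""
    # repr() keeps None, 1, and "1" distinct (assumed encoding, see lead-in).
    encoded = repr(row).encode("utf-8")
    return int.from_bytes(hashlib.sha256(encoded).digest(), "big")


def content_hash(rows):
    """XOR-combine per-row digests; XOR is commutative, so row order is irrelevant."""
    acc = 0
    for row in rows:  # streams rows, keeps only one accumulator in memory
        acc ^= row_digest(row)
    return acc


def schema_hash(fields):
    """Hash (name, type, nullable, field_id) tuples in declared column order."""
    canonical = json.dumps([list(f) for f in fields], separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


written = [(1, "A-100", None), (2, "A-101", 9.5), (3, "A-102", 0.0)]
read_back = [written[2], written[0], written[1]]                      # same rows, new order
drifted = [(1, "A-100", None), (2, "A-101", 9.5), (3, "A-102", 0.1)]  # one cell changed

same_content = content_hash(written) == content_hash(read_back)
detects_drift = content_hash(written) != content_hash(drifted)

schema_a = [("id", "long", False, 1), ("order_number", "string", True, 2)]
schema_b = [("id", "long", False, 1), ("order_number", "string", True, 3)]  # field ID drift
detects_field_id = schema_hash(schema_a) != schema_hash(schema_b)

print(same_content, detects_drift, detects_field_id)
```

One property of XOR worth keeping in mind: two identical rows cancel each other out, so an XOR-combined hash cannot by itself see a row that was duplicated an even number of times. The separate row count check covers the simplest form of that case.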

On failure, the record includes the expected and actual row counts, the hash mismatch with a sample of diverging rows, a field-level schema diff, and the full reader exception.
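A failure record with those four parts might be assembled roughly as follows. The `FailureRecord` class, its field names, and both helper functions are hypothetical. The row sampling here compares small in-memory lists for illustration, which a real harness would likely replace with something that avoids materialising both sides.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FailureRecord:
    script: str
    expected_row_count: int
    actual_row_count: int
    only_in_expected: list = field(default_factory=list)
    only_in_actual: list = field(default_factory=list)
    schema_diff: list = field(default_factory=list)
    reader_exception: Optional[str] = None


def diff_schema(expected, actual):
    """Return one entry per field whose definition differs between the two schemas."""
    diffs = []
    for name in sorted(expected.keys() | actual.keys()):
        if name not in actual:
            diffs.append(f"{name}: missing in actual")
        elif name not in expected:
            diffs.append(f"{name}: unexpected in actual")
        elif expected[name] != actual[name]:
            diffs.append(f"{name}: expected {expected[name]}, got {actual[name]}")
    return diffs


def sample_diverging_rows(expected_rows, actual_rows, limit=5):
    """Sample rows present on one side only, using a multiset difference."""
    exp, act = Counter(expected_rows), Counter(actual_rows)
    return list((exp - act).elements())[:limit], list((act - exp).elements())[:limit]


expected_rows = [(1, 10), (2, 20), (3, 30)]
actual_rows = [(1, 10), (2, 21)]
missing, extra = sample_diverging_rows(expected_rows, actual_rows)

record = FailureRecord(
    script="verify_example.py",  # hypothetical script name
    expected_row_count=len(expected_rows),
    actual_row_count=len(actual_rows),
    only_in_expected=missing,
    only_in_actual=extra,
    schema_diff=diff_schema(
        {"id": ("long", False), "amount": ("int", True)},
        {"id": ("long", False), "amount": ("long", True)},
    ),
)
print(record)
```

Reporting the diff per field, rather than only that the schema hash differs, lets a reader see at a glance that the `amount` column in this example came back as `long` instead of `int`.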

-- df-sql/01_basic_data_files.sql
CREATE DELTA TABLE basic_data_files (
    id BIGINT, order_number STRING, ...
) LOCATION '${TABLE_PATH}'
TBLPROPERTIES ('delta.enableDeletionVectors' = 'true');

INSERT INTO basic_data_files
WITH row_data AS (...)
SELECT ... FROM row_data;

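# spark-reads-df/verify_01_basic_data_files.py
# spark, table_path, content_hash, schema_hash, and the expected_* values
# are assumed to be supplied by the test harness.
df = spark.read.format("delta").load(table_path)
assert df.count() == 372  # logical row count
assert content_hash(df) == expected_hash  # order-independent content hash
assert schema_hash(df.schema) == expected_schema_hash  # schema hash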

Write Conformance Test Plan