Direction 1

Write Conformance Test Plan

DeltaForge writes a Delta Lake or Iceberg UniForm table. Apache Spark 4.0 reads it back and verifies row count, content hash, and schema hash. Both engines must agree.

DeltaForge SQL writes Delta and Iceberg UniForm
Spark verifies row count, content hash, schema hash
7,700+ scripts across both formats
  1. DeltaForge SQL writes the table (CREATE DELTA TABLE ... INSERT ... MERGE ... VACUUM). Scripts: df-sql/ and df-sql-iceberg/.
  2. Bytes land on storage. Delta Lake: _delta_log/, part-*.parquet, deletion_vector_*.bin, checkpoints. Iceberg UniForm: metadata/v1.metadata.json, metadata/snap-*.avro, data/*.parquet.
  3. Apache Spark 4.0 reads and verifies with spark.read.format("delta") or spark.read.format("iceberg"), checking row count, content hash, and schema hash. Scripts: spark-reads-df/ and spark-reads-iceberg/.

Planned coverage

7,744 scripts across Delta Lake and Iceberg UniForm writes. Pass/fail counts appear after the first run.

Core DML

INSERT (167 scripts), UPDATE (177), DELETE (167), MERGE (317). Every supported column type and null pattern, each with and without deletion vectors and Change Data Feed (CDF).
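A minimal sketch of how the "with and without" axes multiply into separate scripts, assuming a hypothetical generator that crosses operation, column type, null pattern, deletion vectors, and CDF. The type and null-pattern lists and the script naming scheme are illustrative assumptions, not the plan's actual lists, and the per-operation counts above are not produced by this exact product. Only the two table properties, `delta.enableDeletionVectors` and `delta.enableChangeDataFeed`, are real Delta settings.

```python
from itertools import product

# Illustrative dimensions only (assumption); the plan's real lists are not shown.
OPERATIONS = ("INSERT", "UPDATE", "DELETE", "MERGE")
COLUMN_TYPES = ("BIGINT", "STRING", "DOUBLE", "DATE", "TIMESTAMP")
NULL_PATTERNS = ("no_nulls", "all_nulls", "alternating_nulls")


def tblproperties(dv: bool, cdf: bool) -> str:
    """Render the two feature flags for one case as a TBLPROPERTIES clause."""
    return (
        "TBLPROPERTIES ("
        f"'delta.enableDeletionVectors' = '{str(dv).lower()}', "
        f"'delta.enableChangeDataFeed' = '{str(cdf).lower()}')"
    )


def dml_cases():
    """Yield (hypothetical script name, TBLPROPERTIES clause) for every combination."""
    for op, col_type, nulls, dv, cdf in product(
        OPERATIONS, COLUMN_TYPES, NULL_PATTERNS, (False, True), (False, True)
    ):
        name = f"{op}_{col_type}_{nulls}_dv{int(dv)}_cdf{int(cdf)}".lower()
        yield name, tblproperties(dv, cdf)


if __name__ == "__main__":
    cases = list(dml_cases())
    print(len(cases))   # 240
    print(cases[0][0])  # insert_bigint_no_nulls_dv0_cdf0
```

With these illustrative lists the product yields 240 cases; each feature flag becomes its own script rather than a branch inside one script, so a failure points at exactly one combination.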

Schema evolution

ADD/DROP/RENAME/REORDER column (64 scripts), type widening (61), column mapping (57), generated columns (60), default values (82), CHECK constraints (48), nested struct evolution (43).

Performance features

Partitioning (117 scripts), Z-Order (110), statistics correctness (104), predicate pushdown (56). Layout choices that the reader must honour.
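Statistics correctness matters because readers use per-file stats to skip files. A sketch of one possible check, assuming the stats have already been parsed from the `stats` JSON of a Delta `add` action (`numRecords`, `minValues`, `maxValues`, `nullCount`) into Python values comparable with the file's rows. The function name and tolerance rules are assumptions: missing stats are accepted, but a recorded min above the true minimum, a max below the true maximum, or a wrong count is flagged, since any of these could let a reader skip a file that holds matching rows.

```python
def stats_violations(recorded: dict, rows: list, columns: list) -> list:
    """Flag recorded per-file stats that could make a data-skipping reader drop needed rows.

    `recorded` mirrors the Delta add-action stats keys (numRecords, minValues,
    maxValues, nullCount), already decoded into Python values (assumption).
    Missing entries are tolerated; wrong ones are reported.
    """
    problems = []
    if recorded.get("numRecords") not in (None, len(rows)):
        problems.append(f"numRecords: recorded {recorded['numRecords']}, actual {len(rows)}")
    for col in columns:
        values = [row.get(col) for row in rows]
        non_null = [v for v in values if v is not None]
        actual_nulls = len(values) - len(non_null)
        rec_nulls = recorded.get("nullCount", {}).get(col)
        if rec_nulls is not None and rec_nulls != actual_nulls:
            problems.append(f"{col}.nullCount: recorded {rec_nulls}, actual {actual_nulls}")
        if not non_null:
            continue  # all-null column: no min/max to check
        rec_min = recorded.get("minValues", {}).get(col)
        rec_max = recorded.get("maxValues", {}).get(col)
        # Stats must bound the data: looser bounds are safe, tighter ones are not.
        if rec_min is not None and rec_min > min(non_null):
            problems.append(f"{col}.min: recorded {rec_min!r} exceeds actual {min(non_null)!r}")
        if rec_max is not None and rec_max < max(non_null):
            problems.append(f"{col}.max: recorded {rec_max!r} below actual {max(non_null)!r}")
    return problems


if __name__ == "__main__":
    rows = [{"id": 1, "region": "eu"}, {"id": 5, "region": None}, {"id": 3, "region": "us"}]
    bad = {"numRecords": 3, "minValues": {"id": 2}, "maxValues": {"id": 5}, "nullCount": {"id": 0}}
    print(stats_violations(bad, rows, ["id", "region"]))
    # ['id.min: recorded 2 exceeds actual 1']
```

The min/max check is a bound rather than an equality, so a writer that records looser bounds (for example, shortened string prefixes) still passes, while one that records a bound tighter than the data fails.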

Advanced features

Change Data Feed (100 scripts), deletion vectors (76), identity columns (93), in-commit timestamps (77), row tracking (56). Time travel and RESTORE (173). VACUUM (71).
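Time travel, RESTORE, and deletion vectors all depend on resolving which rows are visible at a given version. A minimal sketch of that resolution under a heavily simplified, hypothetical log model: each commit is a list of `AddFile`/`RemoveFile` actions, and a deletion vector is modelled as a set of deleted row positions on the `AddFile`. Real Delta logs also carry protocol and metadata actions, checkpoints, and on-disk deletion vector descriptors, none of which appear here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AddFile:
    path: str
    num_records: int
    # Simplified stand-in for a deletion vector: positions of deleted rows.
    deleted_rows: frozenset = frozenset()


@dataclass(frozen=True)
class RemoveFile:
    path: str


def live_files(commits: list, version: int) -> dict:
    """Replay commits 0..version and return the files still live, keyed by path."""
    files = {}
    for actions in commits[: version + 1]:
        # Apply removes before adds so that a remove + re-add of the same path
        # (e.g. attaching a deletion vector) leaves the new entry in place.
        for action in actions:
            if isinstance(action, RemoveFile):
                files.pop(action.path, None)
        for action in actions:
            if isinstance(action, AddFile):
                files[action.path] = action
    return files


def logical_row_count(commits: list, version: int) -> int:
    """Physical rows in live files minus rows hidden by deletion vectors."""
    return sum(f.num_records - len(f.deleted_rows) for f in live_files(commits, version).values())


if __name__ == "__main__":
    commits = [
        [AddFile("a.parquet", 100), AddFile("b.parquet", 50)],                     # v0: INSERT
        [RemoveFile("a.parquet"), AddFile("a.parquet", 100, frozenset({3, 7}))],  # v1: DELETE via DV
        [RemoveFile("b.parquet")],                                                # v2: DELETE whole file
    ]
    print([logical_row_count(commits, v) for v in range(3)])  # [150, 148, 98]
```

Passing an older `version` is the time-travel read; a RESTORE test would check that the restored table's logical count and content match what this resolution gives for the target version.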

What "pass" means

A test only passes when all three independent verifications agree.

  1. Logical row count. Visible rows after deletion-vector and snapshot resolution must match.
  2. Order-independent content hash. Each row is hashed by value, the per-row hashes are XOR-combined, and the totals must match. Catches single-cell drift even across millions of rows.
  3. Schema hash. Column names, types, nullability, and field IDs must match. Catches column-mapping and type-widening drift.
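The three checks can be sketched in Python as follows. This is an illustration under assumptions, not the harness's code: `repr()` stands in for a canonical value encoding (a real harness must normalise values such as timestamps and decimals identically on both sides), rows are plain tuples, and schema fields are `(name, type, nullable, field_id)` tuples compared in column order, so a reordered column also counts as schema drift.

```python
import hashlib
import json


def row_hash(row: tuple) -> int:
    """64-bit hash of one row's values; repr() stands in for a canonical encoding."""
    digest = hashlib.sha256(repr(row).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def content_hash(rows) -> int:
    """Order-independent content hash: XOR of the per-row hashes."""
    acc = 0
    for row in rows:
        acc ^= row_hash(row)
    return acc


def schema_hash(fields) -> str:
    """Hash of (name, type, nullable, field_id) tuples, in column order."""
    canonical = json.dumps([list(f) for f in fields], separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def verify(expected_rows, actual_rows, expected_fields, actual_fields) -> list:
    """Return the names of failed checks; an empty list means pass."""
    failed = []
    if len(actual_rows) != len(expected_rows):
        failed.append("row_count")
    if content_hash(actual_rows) != content_hash(expected_rows):
        failed.append("content_hash")
    if schema_hash(actual_fields) != schema_hash(expected_fields):
        failed.append("schema_hash")
    return failed


if __name__ == "__main__":
    fields = [("id", "bigint", False, 1), ("name", "string", True, 2)]
    rows = [(1, "a"), (2, "b"), (3, None)]
    print(verify(rows, list(reversed(rows)), fields, fields))  # []
    print(verify(rows, [(1, "a"), (2, "B"), (3, None)], fields, fields))  # ['content_hash']
```

One property of XOR combining worth knowing: identical rows cancel in pairs, so `content_hash([r, r])` is `0`. A table in which two copies of one row were replaced by two copies of another would keep both its row count and its XOR hash. Summing the 64-bit row hashes modulo 2^64 is a common order-independent alternative that avoids this cancellation.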

On failure, the record includes the expected and actual row counts, the hash mismatch with a sample of diverging rows, a field-level schema diff, and the full reader exception.
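A sketch of how such a failure record could be assembled, assuming rows are hashable tuples and schema fields are `(name, type, nullable, field_id)` tuples. `FailureRecord`, `diff_rows`, `diff_schema`, and `build_failure` are hypothetical names. Diverging rows are found with a multiset difference (`collections.Counter`), so duplicated rows are accounted for, and the sample is capped at a few rows per side.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FailureRecord:
    script: str
    expected_row_count: int
    actual_row_count: int
    missing_rows: list = field(default_factory=list)     # expected but not read back
    unexpected_rows: list = field(default_factory=list)  # read back but not expected
    schema_diff: list = field(default_factory=list)
    reader_exception: Optional[str] = None


def diff_rows(expected_rows, actual_rows, limit: int = 5):
    """Multiset difference in both directions, capped at `limit` rows per side."""
    expected, actual = Counter(expected_rows), Counter(actual_rows)
    missing = list((expected - actual).elements())[:limit]
    unexpected = list((actual - expected).elements())[:limit]
    return missing, unexpected


def diff_schema(expected_fields, actual_fields) -> list:
    """Field-level diff keyed by column name; fields are (name, type, nullable, field_id)."""
    expected = {f[0]: f for f in expected_fields}
    actual = {f[0]: f for f in actual_fields}
    diffs = []
    for name in sorted(expected.keys() | actual.keys()):
        if name not in actual:
            diffs.append(f"missing column {name}")
        elif name not in expected:
            diffs.append(f"unexpected column {name}")
        else:
            labels = ("type", "nullable", "field_id")
            for label, want, got in zip(labels, expected[name][1:], actual[name][1:]):
                if want != got:
                    diffs.append(f"{name}.{label}: expected {want!r}, got {got!r}")
    return diffs


def build_failure(script, expected_rows, actual_rows, expected_fields, actual_fields, exc=None):
    """Collect everything needed to debug one failed script into a single record."""
    missing, unexpected = diff_rows(expected_rows, actual_rows)
    return FailureRecord(
        script=script,
        expected_row_count=len(expected_rows),
        actual_row_count=len(actual_rows),
        missing_rows=missing,
        unexpected_rows=unexpected,
        schema_diff=diff_schema(expected_fields, actual_fields),
        reader_exception=None if exc is None else repr(exc),
    )
```

Keying the schema diff by column name makes a type-widening or nullability change show up as a single readable line instead of an opaque hash mismatch.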

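```sql
-- df-sql/01_basic_data_files.sql
CREATE DELTA TABLE basic_data_files (
    id BIGINT, order_number STRING, ...
) LOCATION '${TABLE_PATH}'
TBLPROPERTIES ('delta.enableDeletionVectors' = 'true');

INSERT INTO basic_data_files
WITH row_data AS (...)
SELECT ... FROM row_data;
```

```python
# spark-reads-df/verify_01_basic_data_files.py
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Assumed to be provided by the test harness (not shown here): table_path,
# expected_hash, expected_schema_hash, and the content_hash()/schema_hash() helpers.
df = spark.read.format("delta").load(table_path)
assert df.count() == 372                               # logical row count
assert content_hash(df) == expected_hash               # order-independent content hash
assert schema_hash(df.schema) == expected_schema_hash  # names, types, nullability, field IDs
```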

If we wrote it, Spark can read it

Or it shows up as a failure on the conformance dashboard. No other outcomes.