What does the read conformance plan cover?

The plan covers 7,528 hand-written read scripts across Delta Lake and native Apache Iceberg. Coverage spans file and metadata layouts, the full type system, deletion vectors, column mapping, type widening, generated and identity columns, Iceberg position and equality deletes, and partition transforms.

What does a passing read script verify?

Every script asserts the exact row count after read, specific cell values, aggregates (MIN, MAX, SUM, COUNT DISTINCT) against known totals, and schema shape (column names, types, nullability, field IDs).

Why are some read scripts skipped?

A subset of read scripts cannot run because Spark OSS does not implement the matching write-path feature. Without the writer producing the table, there is nothing to read against.

Does the plan cover time travel?

Yes. Reads after INSERT, UPDATE, DELETE, and MERGE; VERSION AS OF and TIMESTAMP AS OF; Iceberg snapshot-id reads; and reads after RESTORE all have dedicated scripts with explicit expected values.

Can DeltaForge read Spark-written Delta tables with deletion vectors and column mapping?

Yes. Dedicated read scripts cover deletion vectors (Roaring bitmap), column mapping in both name and id mode, and type widening, each asserting exact row counts, specific cell values, and aggregates against expected values computed outside the engine.

Read Spark-Written Delta and Iceberg Tables

Planned coverage

7,528 scripts across Delta Lake and native Iceberg reads. Pass/fail counts appear after the first run.

File and metadata layouts

Parquet variants (Snappy, Zstd, Gzip), Delta log V1/V2, multipart checkpoints, Iceberg manifest formats V1/V2/V3.

Type system

All numeric types, including high-precision decimal; temporal types (date, timestamp, timestamp-ntz, INT96 legacy, nanosecond V3); strings; binary; and complex types at any nesting depth.

Spark-specific features

Deletion vectors (Roaring bitmap), column mapping (name and id mode), type widening, generated columns, identity columns, Iceberg position deletes, equality deletes, partition transforms.
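
As a rough illustration of what a deletion-vector or position-delete read has to get right, the sketch below drops rows by their position within a data file. A plain Python set stands in for the Roaring bitmap here, and the function and variable names are hypothetical; a real reader decodes the bitmap from the table's delete metadata.

```python
def apply_position_deletes(rows, deleted_positions):
    """Return only the rows whose position in the data file is not marked deleted."""
    return [row for pos, row in enumerate(rows) if pos not in deleted_positions]


# Five rows in one data file; positions 1 and 3 are marked deleted.
data_file_rows = ["a", "b", "c", "d", "e"]
deleted = {1, 3}  # stand-in for a decoded Roaring bitmap

visible = apply_position_deletes(data_file_rows, deleted)
```

A read script for this case would then assert the exact count and contents of the surviving rows.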

DML and time travel

Reads after INSERT, UPDATE, DELETE, and MERGE. VERSION AS OF and TIMESTAMP AS OF. Iceberg snapshot-id reads. Reads after RESTORE.

Why some scripts are skipped

A subset of read scripts cannot run because Spark OSS does not implement the matching write-path feature. Examples include row tracking, identity columns, and in-commit timestamps. Without the writer producing the table, there is nothing to read against. These rows are tagged skip_cause: "spark_oss_limitation" so they are excluded from the executable pass-rate denominator.
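
The effect of the skip tag on the pass rate can be sketched as follows. This is a minimal illustration, not the plan's actual reporting code: the ScriptResult record, its field names, the status values, and the sample results are all assumptions made for the example. Only the skip_cause value comes from the text above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ScriptResult:
    name: str                          # hypothetical script name
    status: str                        # assumed values: "pass", "fail", "skip"
    skip_cause: Optional[str] = None


def executable_pass_rate(results: List[ScriptResult]) -> Optional[float]:
    """Pass rate over scripts that could run.

    Scripts skipped with skip_cause "spark_oss_limitation" are removed from
    the denominator. Skips with any other cause stay in it (an assumption).
    """
    executable = [
        r for r in results
        if not (r.status == "skip" and r.skip_cause == "spark_oss_limitation")
    ]
    if not executable:
        return None  # nothing executable, so no meaningful rate
    passed = sum(1 for r in executable if r.status == "pass")
    return passed / len(executable)


# Made-up results, for illustration only.
results = [
    ScriptResult("basic_data_files", "pass"),
    ScriptResult("example_failing_script", "fail"),
    ScriptResult("row_tracking_read", "skip", "spark_oss_limitation"),
]
rate = executable_pass_rate(results)
```

With these three made-up results, the skipped script is ignored and the rate is computed over the two scripts that ran.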

What "pass" means

Each verification script is hand-written with explicit expected values, not derived from engine output.

ROW_COUNT. Every script asserts the exact row count after reading.
VALUE. Specific cells: "the value of order_number for id = 1 must be ORD10001". Catches per-cell drift.
Aggregates. MIN, MAX, SUM, COUNT DISTINCT against known totals. Catches statistical drift that single-cell checks would miss.
Schema shape. Column names, types, nullability, and field IDs verified on every script.
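
The four kinds of check can be sketched in Python against a small in-memory table. This is only an approximation of the idea: the helper names, the three sample rows, and the schema record layout (including the field_id values) are assumptions for illustration, not the plan's assertion engine.

```python
def check_row_count(rows, expected):
    """ROW_COUNT: the read must return exactly the expected number of rows."""
    return len(rows) == expected


def check_value(rows, column, expected, where):
    """VALUE: exactly one row matches the predicate and its cell equals expected."""
    matches = [r[column] for r in rows if where(r)]
    return matches == [expected]


def check_aggregates(rows, column, expected):
    """Aggregates: MIN, MAX, SUM, COUNT DISTINCT over non-null values."""
    values = [r[column] for r in rows if r[column] is not None]
    actual = {
        "min": min(values),
        "max": max(values),
        "sum": sum(values),
        "count_distinct": len(set(values)),
    }
    return actual == expected


def check_schema(actual, expected):
    """Schema shape: names, types, nullability, and field IDs must all match."""
    return actual == expected


# Hypothetical rows as returned by a reader.
rows = [
    {"id": 1, "order_number": "ORD10001"},
    {"id": 2, "order_number": "ORD10002"},
    {"id": 3, "order_number": "ORD10003"},
]
schema = [
    {"name": "id", "type": "BIGINT", "nullable": True, "field_id": 1},
    {"name": "order_number", "type": "STRING", "nullable": True, "field_id": 2},
]

checks = {
    "row_count": check_row_count(rows, 3),
    "value": check_value(rows, "order_number", "ORD10001", lambda r: r["id"] == 1),
    "aggregates": check_aggregates(
        rows, "id", {"min": 1, "max": 3, "sum": 6, "count_distinct": 3}
    ),
    "schema": check_schema(schema, list(schema)),
}
all_passed = all(checks.values())
```

Each check fails independently, so a report can say which kind of drift occurred rather than only that the script failed.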

Expected values are derived from the deterministic generator formulas, so the test catches drift from any direction: reader bug, type-coercion bug, file-skipping bug, or generator regression.
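
To show how expected values can come from generator formulas rather than from engine output, here is a small Python sketch. The formula for order_number is an assumption chosen only so that id = 1 yields ORD10001, and the assumption that ids run from 1 to the row count is likewise illustrative; the real generator formulas are not shown in this plan.

```python
ROW_COUNT = 372  # row count asserted in the sample script


def order_number(row_id: int) -> str:
    # Hypothetical generator formula: id = 1 maps to "ORD10001".
    return f"ORD{10000 + row_id}"


def expected_values(n: int) -> dict:
    """Derive expectations in closed form, without reading any table.

    Assumes ids are the consecutive integers 1..n.
    """
    return {
        "row_count": n,
        "min_id": 1,
        "max_id": n,
        "sum_id": n * (n + 1) // 2,            # sum of 1..n
        "count_distinct_order_number": n,       # formula is one-to-one in id
        "order_number_for_id_1": order_number(1),
    }


expected = expected_values(ROW_COUNT)
```

Because nothing here touches the reader, a mismatch between these values and a read result points either at the reader or at a change in the generator, which is the property the paragraph above describes.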

# df-reads-spark/01_basic_data_files.sql
CREATE DELTA TABLE basic_data_files (
    id BIGINT, order_number STRING, ...
) LOCATION '${TABLE_PATH}';

ASSERT ROW_COUNT = 372
SELECT * FROM basic_data_files;

ASSERT VALUE order_number = 'ORD10001' WHERE id = 1
SELECT id, order_number FROM basic_data_files;

ASSERT VALUE max_id = 372
SELECT MAX(id) AS max_id FROM basic_data_files;

Read Conformance Test Plan