Apache Spark 4.0 writes a Delta Lake or native Iceberg table. DeltaForge reads it back. Each script asserts row count, specific cell values, and aggregates computed outside the engine before the test runs.
7,528 scripts across Delta Lake and native Iceberg reads. Pass/fail counts appear after the first run.
Parquet variants (Snappy, Zstd, Gzip), Delta log V1/V2, multipart checkpoints, Iceberg manifest formats V1/V2/V3.
All numeric types including high-precision decimal, temporal types (date, timestamp, timestamp-ntz, INT96 legacy, nanosecond V3), strings, binary, complex types to any nesting depth.
Deletion vectors (Roaring bitmap), column mapping (name and id mode), type widening, generated columns, identity columns, Iceberg position deletes, equality deletes, partition transforms.
Reads after INSERT, UPDATE, DELETE, and MERGE. VERSION AS OF and TIMESTAMP AS OF. Iceberg snapshot-id reads. Reads after RESTORE.
Scripts marked skip_cause: "spark_oss_limitation" are excluded from the executable pass-rate denominator.
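The effect on the pass rate can be sketched in a few lines of Python. This is an illustration only, not DeltaForge's reporting code: the record layout (name, status, skip_cause) and the function name executable_pass_rate are assumptions made for the example.

```python
from typing import Optional, TypedDict


class ScriptResult(TypedDict):
    # Hypothetical record layout, not DeltaForge's actual report schema.
    name: str
    status: str                # "pass", "fail", or "skip"
    skip_cause: Optional[str]  # "spark_oss_limitation" for the excluded scripts


def executable_pass_rate(results: list[ScriptResult]) -> float:
    """Pass rate over executable scripts only.

    Scripts tagged with skip_cause "spark_oss_limitation" are dropped from
    both the numerator and the denominator.
    """
    executable = [
        r for r in results if r.get("skip_cause") != "spark_oss_limitation"
    ]
    if not executable:
        return 0.0
    passed = sum(1 for r in executable if r["status"] == "pass")
    return passed / len(executable)


# Made-up sample results, used only to exercise the function.
results: list[ScriptResult] = [
    {"name": "a.sql", "status": "pass", "skip_cause": None},
    {"name": "b.sql", "status": "fail", "skip_cause": None},
    {"name": "c.sql", "status": "skip", "skip_cause": "spark_oss_limitation"},
]
rate = executable_pass_rate(results)  # one pass out of two executable scripts
```

Leaving the skipped scripts out of the denominator means the rate reflects only reads that could actually be attempted.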
Each verification script is hand-written with explicit expected values, not derived from engine output.
Expected values are derived from the deterministic generator formulas, so the test catches drift from any direction: reader bug, type-coercion bug, file-skipping bug, or generator regression.
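To illustrate expected values that come from generator formulas rather than from engine output, the sketch below recomputes the three checks for a 372-row table. The real generator is not shown here, so the formula (order_number is 'ORD' followed by 10000 + id) and the names generate_row and expected_values are assumptions, chosen only to agree with the sample value 'ORD10001' for id = 1.

```python
ROW_COUNT = 372  # table size used in the sample script


def generate_row(row_id: int) -> dict:
    """Hypothetical deterministic generator: the same id always yields the same row."""
    # Assumed formula, consistent with 'ORD10001' for id = 1.
    return {"id": row_id, "order_number": f"ORD{10000 + row_id}"}


def expected_values(row_count: int) -> dict:
    """Derive expectations from the formula alone, without reading any table."""
    rows = [generate_row(i) for i in range(1, row_count + 1)]
    return {
        "row_count": len(rows),
        "order_number_for_id_1": rows[0]["order_number"],
        "max_id": max(r["id"] for r in rows),
    }


expected = expected_values(ROW_COUNT)
```

Because nothing in this computation touches the engine under test, a mismatch points to the reader, the written files, or the generator itself, and never to a circular comparison.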
# df-reads-spark/01_basic_data_files.sql
CREATE DELTA TABLE basic_data_files (
    id BIGINT, order_number STRING, ...
) LOCATION '${TABLE_PATH}';

ASSERT ROW_COUNT = 372
SELECT * FROM basic_data_files;

ASSERT VALUE order_number = 'ORD10001' WHERE id = 1
SELECT id, order_number FROM basic_data_files;

ASSERT VALUE max_id = 372
SELECT MAX(id) AS max_id FROM basic_data_files;
A script either passes every assertion or shows up as a failure on the conformance dashboard. No other outcomes.
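The pass-or-fail rule can be illustrated with a small Python sketch over in-memory rows. It only imitates two of the assertion kinds from the sample script. The helper names, the matching semantics of the WHERE filter, and the stand-in rows are all assumptions, not DeltaForge's implementation.

```python
from typing import Any, Callable

Row = dict[str, Any]
Check = Callable[[list[Row]], bool]


def row_count_is(expected: int) -> Check:
    """Imitates ASSERT ROW_COUNT = <expected>."""
    return lambda rows: len(rows) == expected


def value_is(column: str, expected: Any, where: Row) -> Check:
    """Imitates ASSERT VALUE <column> = <expected> WHERE <filter>.

    Assumed semantics: at least one row must match the filter, and every
    matching row must carry the expected value.
    """
    def check(rows: list[Row]) -> bool:
        matches = [
            r for r in rows if all(r.get(k) == v for k, v in where.items())
        ]
        return bool(matches) and all(r.get(column) == expected for r in matches)
    return check


def outcome(rows: list[Row], checks: list[Check]) -> str:
    """A script passes only if every check holds; anything else is a failure."""
    return "pass" if all(check(rows) for check in checks) else "fail"


# Stand-in rows for illustration; real rows would come from the engine's read.
rows = [{"id": i, "order_number": f"ORD{10000 + i}"} for i in range(1, 373)]
checks = [row_count_is(372), value_is("order_number", "ORD10001", {"id": 1})]

good = outcome(rows, checks)
bad = outcome(rows[:-1], checks)  # one row missing, so the row count check fails
```

A single failed check is enough to fail the script, which keeps the result binary.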