If you searched the error text, you have probably already found the forum answers: "you ran VACUUM, the data is gone." That is true often enough to be unhelpful, because it skips the part you actually need: which of the two retention clocks expired, what is still recoverable, and how to configure the table so the next audit query does not die the same way. Let us do it properly.
The exact error: "Cannot time travel Delta table to version X"
Time travel to version X needs two independent things to still exist on storage:
- The transaction log entries in
_delta_log/that let the reader reconstruct version X's snapshot (the JSON commits, or a checkpoint that covers them). - The Parquet data files that snapshot references. Old files stay on disk after
UPDATE,DELETE,MERGE, andOPTIMIZEprecisely so old versions remain readable, until something deletes them.
Each can disappear on its own schedule, which gives you two distinct errors for the same underlying problem:
-- Failure mode 1: the LOG entries are gone (log retention expired)
Cannot time travel Delta table to version 1. Available versions: [12, 47]
-- Failure mode 2: the log is intact but the DATA FILES are gone (VACUUM)
Error: referenced Parquet file not found:
s3://acme-lake/demo/delta-time-travel-vacuum-boundary/activity_log/
part-00000-8f3a...c91d.snappy.parquet
A file referenced in the transaction log cannot be found.
VersionNotFoundException and FileNotFoundException; DeltaForge reports the missing version range or the exact missing file path.Failure mode 1 means version X can no longer even be described: log cleanup removed its commit entries, so the reader only offers the versions it can still reconstruct. Failure mode 2 is sneakier. DESCRIBE HISTORY still lists version X, planning succeeds, and the query fails at scan time when it reaches for a file VACUUM deleted. The log remembers everything; the data only exists inside the retention window.
Why can't I time travel my Delta table to an old version?
Because one of two retention clocks expired, and they are configured separately:
delta.deletedFileRetentionDuration- Default
interval 7 days. How long data files that the latest version no longer references must be kept before VACUUM is allowed to delete them. AVACUUM ... RETAIN n HOURSclause overrides this for that run, all the way down to zero. delta.logRetentionDuration- Default
interval 30 days. How long commit entries survive in_delta_log/. VACUUM never touches the log; log cleanup happens during checkpoint operations, on its own schedule. - Effective time travel window
- The smaller of the two, because a readable version needs both its log entries and its data files. With default settings the files expire first (7 days vs 30 days), which is exactly why ghost history entries, versions you can list but not read, are the most common symptom in production.
The retention math is worth stating plainly. If your compliance team expects to re-run an audit query against "the table as of last quarter", both properties must cover that window. A 7-day file retention with a 30-day log retention does not give you 30 days of time travel; it gives you 7, plus 23 days of history entries that point at nothing.
Does VACUUM break Delta Lake time travel?
Yes, for every version older than the retention period you vacuum with. That is not a defect; reclaiming storage is the whole point of VACUUM, and the files it reclaims are the ones old snapshots read. The delta-time-travel-vacuum-boundary demo in the DeltaForge demo library reproduces the entire lifecycle on a small activity log, so you can watch the boundary move. Here is the flow, condensed.
-
Build a table with real version history. The demo creates an activity log and pushes it through five versions: create (V0), an insert of 15 seed rows (V1), a duration correction via
UPDATE(V2), a bounce-recordDELETE(V3), and a second insert (V4). The block below shows the shape with a sample of the seed rows:CREATE ZONE IF NOT EXISTS external TYPE EXTERNAL COMMENT 'External and Delta tables for demo datasets'; CREATE SCHEMA IF NOT EXISTS external.delta_demos COMMENT 'Delta table management tutorial demos'; CREATE DELTA TABLE IF NOT EXISTS external.delta_demos.activity_log ( user_id INT, action VARCHAR, page VARCHAR, duration_secs INT, logged_at VARCHAR ) LOCATION 'delta-time-travel-vacuum-boundary/activity_log'; INSERT INTO external.delta_demos.activity_log VALUES (1, 'view', '/home', 3, '2026-03-20 08:01:00'), (1, 'click', '/products', 22, '2026-03-20 08:02:30'), (1, 'purchase', '/checkout', 45, '2026-03-20 08:05:00'), (2, 'view', '/home', 7, '2026-03-20 09:10:00'), (2, 'purchase', '/checkout', 52, '2026-03-20 09:15:00'); UPDATE external.delta_demos.activity_log SET duration_secs = duration_secs + 10; DELETE FROM external.delta_demos.activity_log WHERE user_id IN (1, 2) AND action = 'view';Each DML statement is a new commit in _delta_log/unders3://acme-lake/demo. The UPDATE and DELETE rewrite files; the old ones stay on disk, unreferenced but readable. -
Prove time travel works before VACUUM. Every version is queryable, and you can read the whole progression in one statement:
SELECT COUNT(*) AS v1_count FROM external.delta_demos.activity_log VERSION AS OF 1; SELECT 'V1' AS version, (SELECT COUNT(*) FROM external.delta_demos.activity_log VERSION AS OF 1) AS rows UNION ALL SELECT 'V2', (SELECT COUNT(*) FROM external.delta_demos.activity_log VERSION AS OF 2) UNION ALL SELECT 'V3', (SELECT COUNT(*) FROM external.delta_demos.activity_log VERSION AS OF 3) UNION ALL SELECT 'V4', (SELECT COUNT(*) FROM external.delta_demos.activity_log VERSION AS OF 4);In the full demo this returns 15, 15, 13, 18: the insert, the in-place correction, the delete, the second insert. The demo's ASSERT checks pin every one of these counts. -
Cross the boundary.
RETAIN 0 HOURSis the most aggressive setting and deletes every file the latest version does not reference. Always preview first withDRY RUN, which lists the files without deleting anything:VACUUM external.delta_demos.activity_log RETAIN 0 HOURS DRY RUN; VACUUM external.delta_demos.activity_log RETAIN 0 HOURS;The real run returns files_deletedandbytes_freed. After it, the Parquet files behind V0 through V3 are gone from storage. -
Observe the ghost entries. The latest version survives every VACUUM, so current queries and
VERSION AS OF 4still return all 18 rows. But the history now over-promises:SELECT COUNT(*) AS v4_count FROM external.delta_demos.activity_log VERSION AS OF 4; DESCRIBE HISTORY external.delta_demos.activity_log;DESCRIBE HISTORYstill lists all five versions with their operations and metrics. Reading V1 through V3 now fails at scan time: the log says the version happened, the files to prove it are gone.
That last step is the answer to a question that confuses nearly everyone the first time: VACUUM does not shrink DESCRIBE HISTORY, because it never deletes log files. History length is governed by delta.logRetentionDuration and trimmed during checkpoints. So "history shows the version" and "the version is readable" are simply different claims.
What is the difference between log retention and file retention?
The kv table above gives the defaults; the operational rule is: set both properties to cover your longest audit or rollback window, and never vacuum with a shorter RETAIN than that window. Both are ordinary table properties, set with a metadata-only commit:
ALTER TABLE external.delta_demos.activity_log SET TBLPROPERTIES (
'delta.deletedFileRetentionDuration' = 'interval 30 days',
'delta.logRetentionDuration' = 'interval 60 days'
);
RETAIN clause can still override it, so guard who may run VACUUM on audited tables.Two adjacent facts worth knowing. First, if the table has Change Data Feed enabled, the files under _change_data/ follow the log retention by default, not the file retention; DeltaForge's VACUUM accepts a separate CDC_RETAIN n HOURS clause to control that window independently. Second, setting retention low does not delete anything by itself; files only leave storage when a VACUUM (or auto-vacuum, where enabled) actually runs.
How do you restore a Delta table to a previous version with SQL?
When the bad version is the current one (an accidental DELETE, a wrong UPDATE, a corrupt load) you do not want time travel reads, you want a rollback. RESTORE rewinds the table by writing a new commit whose actions recreate the target version's snapshot. Nothing is erased: the history stays append-only, and the mistake itself remains inspectable.
The delta-restore-rollback demo plays this out on a 30-product inventory table. Prices get raised, five slow movers get marked discontinued, and then an operator runs the cleanup query that should have been an archive:
-- The accident: permanently removes 5 rows we still needed
DELETE FROM external.delta_demos.rollback_product_inventory
WHERE status = 'discontinued';
-- The recovery: rewind to the snapshot before the delete
RESTORE external.delta_demos.rollback_product_inventory TO VERSION 3;
-- Verify: all 30 rows are back, statuses intact
SELECT COUNT(*) AS total_products,
COUNT(*) FILTER (WHERE status = 'active') AS active_count,
COUNT(*) FILTER (WHERE status = 'discontinued') AS discontinued_count
FROM external.delta_demos.rollback_product_inventory;
DESCRIBE HISTORY external.delta_demos.rollback_product_inventory LIMIT 10;
Because RESTORE is just another commit, it is itself reversible: if you restore to the wrong version, restore again. The demo continues past the recovery, reactivating the recovered items with a clearance discount, to make the point that the table is fully live after a rollback.
One honest caveat connects this section to everything above: RESTORE reads the target version's data files, so it is subject to the same VACUUM boundary as time travel. You can restore to any version whose files still exist; you cannot restore across a boundary that VACUUM has already crossed. Rollback windows are a retention decision you make before the accident, not after.
On tooling: the open-source Delta Lake documentation exposes RESTORE through Spark SQL or the DeltaTable Python and Scala APIs, which assumes a Spark runtime is available when the incident happens. DeltaForge implements the same Delta protocol operation as a plain SQL statement against the table on object storage. DeltaForge is commercial, customer-installed software: the engine runs on your infrastructure, speaks PostgreSQL-flavored SQL, and needs no cluster to execute a rollback. The SQL engine page covers the full surface.
When the version is already gone
If VACUUM deleted the files and you need that data, no Delta command brings it back; the deletion was a physical object-store delete. The remaining options live outside the table:
- Object-store versioning or soft delete. If S3 bucket versioning, Azure soft delete, or GCS object versioning was enabled on
s3://acme-lake/demo, the deleted Parquet objects may be recoverable at the storage layer. Restore them to their original paths and the old snapshots become readable again. - Upstream replay. If the table is built by a pipeline from durable sources, rebuild the historical state by replaying the loads.
- Independent backups. A snapshot taken by your backup tooling is unaffected by VACUUM.
And the prevention list, in priority order: set delta.deletedFileRetentionDuration to your real audit window; keep delta.logRetentionDuration at least as long; run VACUUM ... DRY RUN before any manual vacuum; and treat RETAIN 0 HOURS as a demolition tool that also risks concurrent readers, not a default.
Both demos referenced here, delta-time-travel-vacuum-boundary and delta-restore-rollback, ship in the DeltaForge demo library with automated ASSERT verification, so you can run the whole lifecycle against your own object store and watch every number land.