If you searched the error text, you have probably already found the forum answers: "you ran VACUUM, the data is gone." That is true often enough to be unhelpful, because it skips the part you actually need: which of the two retention clocks expired, what is still recoverable, and how to configure the table so the next audit query does not die the same way. Let us do it properly.

The exact error: "Cannot time travel Delta table to version X"

Time travel to version X needs two independent things to still exist on storage:

  • The transaction log entries in _delta_log/ that let the reader reconstruct version X's snapshot (the JSON commits, or a checkpoint that covers them).
  • The Parquet data files that snapshot references. Old files stay on disk after UPDATE, DELETE, MERGE, and OPTIMIZE precisely so old versions remain readable, until something deletes them.

Each can disappear on its own schedule, which gives you two distinct errors for the same underlying problem:

-- Failure mode 1: the LOG entries are gone (log retention expired)
Cannot time travel Delta table to version 1. Available versions: [12, 47]

-- Failure mode 2: the log is intact but the DATA FILES are gone (VACUUM)
Error: referenced Parquet file not found:
  s3://acme-lake/demo/delta-time-travel-vacuum-boundary/activity_log/
  part-00000-8f3a...c91d.snappy.parquet
A file referenced in the transaction log cannot be found.
Two failure modes. Spark-based clients phrase them as VersionNotFoundException and FileNotFoundException; DeltaForge reports the missing version range or the exact missing file path.

Failure mode 1 means version X can no longer even be described: log cleanup removed its commit entries, so the reader only offers the versions it can still reconstruct. Failure mode 2 is sneakier. DESCRIBE HISTORY still lists version X, planning succeeds, and the query fails at scan time when it reaches for a file VACUUM deleted. The log remembers everything; the data only exists inside the retention window.

Why can't I time travel my Delta table to an old version?

Because one of two retention clocks expired, and they are configured separately:

delta.deletedFileRetentionDuration
Default interval 7 days. How long data files that the latest version no longer references must be kept before VACUUM is allowed to delete them. A VACUUM ... RETAIN n HOURS clause overrides this for that run, all the way down to zero.
delta.logRetentionDuration
Default interval 30 days. How long commit entries survive in _delta_log/. VACUUM never touches the log; log cleanup happens during checkpoint operations, on its own schedule.
Effective time travel window
The smaller of the two, because a readable version needs both its log entries and its data files. With default settings the files expire first (7 days vs 30 days), which is exactly why ghost history entries, versions you can list but not read, are the most common symptom in production.

The retention math is worth stating plainly. If your compliance team expects to re-run an audit query against "the table as of last quarter", both properties must cover that window. A 7-day file retention with a 30-day log retention does not give you 30 days of time travel; it gives you 7, plus 23 days of history entries that point at nothing.

Does VACUUM break Delta Lake time travel?

Yes, for every version older than the retention period you vacuum with. That is not a defect; reclaiming storage is the whole point of VACUUM, and the files it reclaims are the ones old snapshots read. The delta-time-travel-vacuum-boundary demo in the DeltaForge demo library reproduces the entire lifecycle on a small activity log, so you can watch the boundary move. Here is the flow, condensed.

  1. Build a table with real version history. The demo creates an activity log and pushes it through five versions: create (V0), an insert of 15 seed rows (V1), a duration correction via UPDATE (V2), a bounce-record DELETE (V3), and a second insert (V4). The block below shows the shape with a sample of the seed rows:

    CREATE ZONE IF NOT EXISTS external TYPE EXTERNAL
        COMMENT 'External and Delta tables for demo datasets';
    
    CREATE SCHEMA IF NOT EXISTS external.delta_demos
        COMMENT 'Delta table management tutorial demos';
    
    CREATE DELTA TABLE IF NOT EXISTS external.delta_demos.activity_log (
        user_id        INT,
        action         VARCHAR,
        page           VARCHAR,
        duration_secs  INT,
        logged_at      VARCHAR
    ) LOCATION 'delta-time-travel-vacuum-boundary/activity_log';
    
    INSERT INTO external.delta_demos.activity_log VALUES
        (1, 'view',     '/home',       3,  '2026-03-20 08:01:00'),
        (1, 'click',    '/products',   22, '2026-03-20 08:02:30'),
        (1, 'purchase', '/checkout',   45, '2026-03-20 08:05:00'),
        (2, 'view',     '/home',       7,  '2026-03-20 09:10:00'),
        (2, 'purchase', '/checkout',   52, '2026-03-20 09:15:00');
    
    UPDATE external.delta_demos.activity_log
    SET duration_secs = duration_secs + 10;
    
    DELETE FROM external.delta_demos.activity_log
    WHERE user_id IN (1, 2) AND action = 'view';
    Each DML statement is a new commit in _delta_log/ under s3://acme-lake/demo. The UPDATE and DELETE rewrite files; the old ones stay on disk, unreferenced but readable.
  2. Prove time travel works before VACUUM. Every version is queryable, and you can read the whole progression in one statement:

    SELECT COUNT(*) AS v1_count
    FROM external.delta_demos.activity_log VERSION AS OF 1;
    
    SELECT 'V1' AS version,
           (SELECT COUNT(*) FROM external.delta_demos.activity_log VERSION AS OF 1) AS rows
    UNION ALL
    SELECT 'V2',
           (SELECT COUNT(*) FROM external.delta_demos.activity_log VERSION AS OF 2)
    UNION ALL
    SELECT 'V3',
           (SELECT COUNT(*) FROM external.delta_demos.activity_log VERSION AS OF 3)
    UNION ALL
    SELECT 'V4',
           (SELECT COUNT(*) FROM external.delta_demos.activity_log VERSION AS OF 4);
    In the full demo this returns 15, 15, 13, 18: the insert, the in-place correction, the delete, the second insert. The demo's ASSERT checks pin every one of these counts.
  3. Cross the boundary. RETAIN 0 HOURS is the most aggressive setting and deletes every file the latest version does not reference. Always preview first with DRY RUN, which lists the files without deleting anything:

    VACUUM external.delta_demos.activity_log RETAIN 0 HOURS DRY RUN;
    
    VACUUM external.delta_demos.activity_log RETAIN 0 HOURS;
    The real run returns files_deleted and bytes_freed. After it, the Parquet files behind V0 through V3 are gone from storage.
  4. Observe the ghost entries. The latest version survives every VACUUM, so current queries and VERSION AS OF 4 still return all 18 rows. But the history now over-promises:

    SELECT COUNT(*) AS v4_count
    FROM external.delta_demos.activity_log VERSION AS OF 4;
    
    DESCRIBE HISTORY external.delta_demos.activity_log;
    DESCRIBE HISTORY still lists all five versions with their operations and metrics. Reading V1 through V3 now fails at scan time: the log says the version happened, the files to prove it are gone.

That last step is the answer to a question that confuses nearly everyone the first time: VACUUM does not shrink DESCRIBE HISTORY, because it never deletes log files. History length is governed by delta.logRetentionDuration and trimmed during checkpoints. So "history shows the version" and "the version is readable" are simply different claims.

What is the difference between log retention and file retention?

The kv table above gives the defaults; the operational rule is: set both properties to cover your longest audit or rollback window, and never vacuum with a shorter RETAIN than that window. Both are ordinary table properties, set with a metadata-only commit:

ALTER TABLE external.delta_demos.activity_log SET TBLPROPERTIES (
    'delta.deletedFileRetentionDuration' = 'interval 30 days',
    'delta.logRetentionDuration' = 'interval 60 days'
);
Thirty days of guaranteed file survival, sixty days of log history. From this point, a default VACUUM respects the 30-day floor. An explicit RETAIN clause can still override it, so guard who may run VACUUM on audited tables.

Two adjacent facts worth knowing. First, if the table has Change Data Feed enabled, the files under _change_data/ follow the log retention by default, not the file retention; DeltaForge's VACUUM accepts a separate CDC_RETAIN n HOURS clause to control that window independently. Second, setting retention low does not delete anything by itself; files only leave storage when a VACUUM (or auto-vacuum, where enabled) actually runs.

How do you restore a Delta table to a previous version with SQL?

When the bad version is the current one (an accidental DELETE, a wrong UPDATE, a corrupt load) you do not want time travel reads, you want a rollback. RESTORE rewinds the table by writing a new commit whose actions recreate the target version's snapshot. Nothing is erased: the history stays append-only, and the mistake itself remains inspectable.

The delta-restore-rollback demo plays this out on a 30-product inventory table. Prices get raised, five slow movers get marked discontinued, and then an operator runs the cleanup query that should have been an archive:

-- The accident: permanently removes 5 rows we still needed
DELETE FROM external.delta_demos.rollback_product_inventory
WHERE status = 'discontinued';

-- The recovery: rewind to the snapshot before the delete
RESTORE external.delta_demos.rollback_product_inventory TO VERSION 3;

-- Verify: all 30 rows are back, statuses intact
SELECT COUNT(*) AS total_products,
       COUNT(*) FILTER (WHERE status = 'active') AS active_count,
       COUNT(*) FILTER (WHERE status = 'discontinued') AS discontinued_count
FROM external.delta_demos.rollback_product_inventory;

DESCRIBE HISTORY external.delta_demos.rollback_product_inventory LIMIT 10;
The verification returns 30 total, 25 active, 5 discontinued: the pre-delete state, with the earlier price changes preserved. The history shows the delete, then the restore, as two ordinary commits.

Because RESTORE is just another commit, it is itself reversible: if you restore to the wrong version, restore again. The demo continues past the recovery, reactivating the recovered items with a clearance discount, to make the point that the table is fully live after a rollback.

One honest caveat connects this section to everything above: RESTORE reads the target version's data files, so it is subject to the same VACUUM boundary as time travel. You can restore to any version whose files still exist; you cannot restore across a boundary that VACUUM has already crossed. Rollback windows are a retention decision you make before the accident, not after.

On tooling: the open-source Delta Lake documentation exposes RESTORE through Spark SQL or the DeltaTable Python and Scala APIs, which assumes a Spark runtime is available when the incident happens. DeltaForge implements the same Delta protocol operation as a plain SQL statement against the table on object storage. DeltaForge is commercial, customer-installed software: the engine runs on your infrastructure, speaks PostgreSQL-flavored SQL, and needs no cluster to execute a rollback. The SQL engine page covers the full surface.

When the version is already gone

If VACUUM deleted the files and you need that data, no Delta command brings it back; the deletion was a physical object-store delete. The remaining options live outside the table:

  • Object-store versioning or soft delete. If S3 bucket versioning, Azure soft delete, or GCS object versioning was enabled on s3://acme-lake/demo, the deleted Parquet objects may be recoverable at the storage layer. Restore them to their original paths and the old snapshots become readable again.
  • Upstream replay. If the table is built by a pipeline from durable sources, rebuild the historical state by replaying the loads.
  • Independent backups. A snapshot taken by your backup tooling is unaffected by VACUUM.

And the prevention list, in priority order: set delta.deletedFileRetentionDuration to your real audit window; keep delta.logRetentionDuration at least as long; run VACUUM ... DRY RUN before any manual vacuum; and treat RETAIN 0 HOURS as a demolition tool that also risks concurrent readers, not a default.

Both demos referenced here, delta-time-travel-vacuum-boundary and delta-restore-rollback, ship in the DeltaForge demo library with automated ASSERT verification, so you can run the whole lifecycle against your own object store and watch every number land.