Zero-Exposure Data Protection

Raw Data Never Reaches Your Lake

Delta Forge pseudonymises data in-memory during read, before it ever touches your data lake. Click a column, choose an algorithm, and every write to Delta format is already protected. No unprotected data. No post-hoc masking. No exposure window.

Why This Changes Everything

Traditional tools mask data after it's already in your lake. Delta Forge never lets it get there unprotected.

The Problem With Everyone Else

  • Raw data lands in the lake first, then gets masked
  • An exposure window exists between ingest and masking
  • Masking pipelines can fail silently
  • Compliance audits question the gap
  • Complex ETL chains just to protect data

The Delta Forge Approach

  • Transforms happen in-memory during read, before any write
  • Data written to Delta format is already pseudonymised
  • Zero exposure window: raw data never persists
  • Click a column, choose an algorithm - that's it
  • No separate masking pipeline, no post-hoc processing

How It Works

Three clicks to protect any column, applied in-memory, every time

1. Select a Column

Browse your table's columns and click the one containing sensitive data: names, emails, SSNs, patient IDs, anything.

2. Choose an Algorithm

Pick from five production-ready transforms: hash, encrypt, redact, generalize, or tokenize. Set scope and parameters.

3. Protected on Every Read

From this moment, every read transforms the data in-memory. What gets written to Delta format is already pseudonymised. Raw values are never persisted outside the source.

See It In Action

Watch how data transforms in real-time

Input
  email: john.doe@example.com
  ssn: 123-45-6789
  age: 47

Transform: keyed_hash

Output
  email: a3f8c2...e91b
  ssn: 7d4e1a...c82f
  age: b29c8f...1a3e
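
As a rough illustration of the keyed_hash demo above, the following Python sketch applies HMAC-SHA256 to each field using only the standard library. The key, the function name, and the hex output encoding are assumptions made for this example; they are not taken from Delta Forge, and the digests will not match the truncated values shown in the demo.

```python
import hashlib
import hmac

# Hypothetical demo key; a real deployment would load this from a key store.
DEMO_KEY = b"demo-key-not-for-production"


def keyed_hash(value: str, key: bytes = DEMO_KEY) -> str:
    """Return the hex HMAC-SHA256 digest of a value under the given key."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


record = {"email": "john.doe@example.com", "ssn": "123-45-6789", "age": "47"}

# Column names stay the same; only the values are replaced.
protected = {column: keyed_hash(value) for column, value in record.items()}
```

Because the digest depends on the key, the same value hashed under a different key yields an unrelated pseudonym, which is what makes the result hard to reverse by guessing inputs without the key.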

Five Production-Ready Transforms

Purpose-built transforms for every data protection scenario

keyed_hash

Irreversible HMAC-SHA256 or Argon2id hashing. Perfect for patient IDs, SSNs, and identifiers that never need recovery.

encrypt

Reversible AES-256-SIV deterministic encryption. Same plaintext produces same ciphertext, enabling joins on encrypted values.

redact

Display masking for emails, phones, credit cards, and SSNs. Show partial context while protecting sensitive data.

generalize

Reduce precision for k-anonymity. Convert dates to years, ages to ranges, zip codes to regions.

tokenize

Generate consistent UUID tokens for entity linking. Same value always produces same token within scope.
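
One way to picture the tokenize transform is to derive the UUID from a keyed MAC of the value, so that the same input always maps to the same token within a scope. The sketch below is an assumption about how such a scheme could work, not Delta Forge's actual implementation; the function name, the scope label, and the choice of HMAC-SHA256 are all hypothetical.

```python
import hashlib
import hmac
import uuid


def tokenize(value: str, key: bytes, scope: str = "person") -> uuid.UUID:
    """Derive a deterministic UUID token from a value, a key, and a scope label."""
    # The NUL separator keeps ("ab", "c") and ("a", "bc") from colliding.
    message = scope.encode("utf-8") + b"\x00" + value.encode("utf-8")
    digest = hmac.new(key, message, hashlib.sha256).digest()
    # Take the first 16 bytes of the MAC; version=4 only sets the
    # UUID version and variant bits so the result is well-formed.
    return uuid.UUID(bytes=digest[:16], version=4)


key = b"hypothetical-token-key"
token_a = tokenize("patient-0042", key)
token_b = tokenize("patient-0042", key)
token_other_scope = tokenize("patient-0042", key, scope="session-17")
```

Here `token_a` and `token_b` are equal, while `token_other_scope` differs, which mirrors the "same value, same token within scope" behaviour described above.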

Smart Redaction Modes

Purpose-built masking for every data type

full

"Sensitive data" → "[REDACTED]"

partial

"Jonathan" → "J******n"

email

"john.doe@example.com" → "j***@example.com"

phone

"555-123-4567" → "***-***-4567"

card

"4111-1111-1111-1234" → "****-****-****-1234"

ssn

"123-45-6789" → "***-**-6789"

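The redaction modes above can be approximated in a few lines of Python. This is a minimal sketch that reproduces the listed examples; the function names and the handling of edge cases (very short strings, values without an "@") are assumptions, and the product's real rules may differ.

```python
def redact_full(value: str) -> str:
    """Replace the whole value with a fixed marker."""
    return "[REDACTED]"


def redact_partial(value: str) -> str:
    """Keep the first and last character, mask everything in between."""
    if len(value) <= 2:
        return "*" * len(value)
    return value[0] + "*" * (len(value) - 2) + value[-1]


def redact_email(value: str) -> str:
    """Keep the first character of the local part and the full domain."""
    local, sep, domain = value.partition("@")
    if not sep or not local:
        return redact_full(value)
    return local[0] + "***@" + domain


def redact_keep_last4(value: str) -> str:
    """Mask every digit except the last four, keeping separators in place."""
    total_digits = sum(ch.isdigit() for ch in value)
    seen = 0
    out = []
    for ch in value:
        if ch.isdigit():
            seen += 1
            out.append(ch if seen > total_digits - 4 else "*")
        else:
            out.append(ch)
    return "".join(out)


# The phone, card, and ssn examples all follow the same keep-last-four rule.
MODES = {
    "full": redact_full,
    "partial": redact_partial,
    "email": redact_email,
    "phone": redact_keep_last4,
    "card": redact_keep_last4,
    "ssn": redact_keep_last4,
}


def redact(value: str, mode: str) -> str:
    """Dispatch to the redaction function registered for the mode."""
    return MODES[mode](value)
```

For example, `redact("555-123-4567", "phone")` masks the first six digits and leaves the separators and the final four digits untouched.
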
Generalization for K-Anonymity

Reduce data precision while maintaining analytical utility

year

"1985-06-15" → "1985"

month

"1985-06-15" → "1985-06"

quarter

"1985-06-15" → "1985Q2"

decade

"1985-06-15" → "1980s"

age_range

47 → "40-49"

zip3

"90210" → "902XX"

zip2

"90210" → "90XXX"

round

52,750 → 53,000
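
The generalization levels above map naturally onto small pure functions. The Python sketch below reproduces the listed examples; the function names, the ten-year age bucket, and the rounding step of 1,000 are assumptions chosen to match those examples rather than documented parameters.

```python
import math
from datetime import date


def generalize_date(value: str, level: str) -> str:
    """Reduce an ISO date (YYYY-MM-DD) to year, month, quarter, or decade."""
    d = date.fromisoformat(value)
    if level == "year":
        return f"{d.year:04d}"
    if level == "month":
        return f"{d.year:04d}-{d.month:02d}"
    if level == "quarter":
        return f"{d.year:04d}Q{(d.month - 1) // 3 + 1}"
    if level == "decade":
        return f"{d.year // 10 * 10}s"
    raise ValueError(f"unknown level: {level}")


def age_range(age: int, width: int = 10) -> str:
    """Bucket an age into a fixed-width range such as 40-49."""
    low = age // width * width
    return f"{low}-{low + width - 1}"


def zip_prefix(zip_code: str, keep: int) -> str:
    """Keep the leading digits of a zip code and mask the rest with X."""
    return zip_code[:keep] + "X" * (len(zip_code) - keep)


def round_to(value: float, step: int = 1000) -> int:
    """Round half up to the nearest multiple of step."""
    return math.floor(value / step + 0.5) * step
```

Coarser levels place more records in each bucket, which is what helps a dataset reach a given k in k-anonymity, at the cost of analytical detail.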

Linkability Scopes

Control how pseudonyms relate across records

Transaction Scope

  • Unique pseudonym per record
  • No linkability between rows
  • Maximum privacy protection
  • Use case: One-time identifiers

Relationship Scope

  • Consistent within context/session
  • Same value → same pseudonym in session
  • Cross-session values differ
  • Use case: Session analytics

Person Scope

  • Globally consistent pseudonyms
  • Same value → same output everywhere
  • Full linkability across dataset
  • Use case: Longitudinal studies

SQL-Native Commands

Manage pseudonymisation rules with familiar SQL syntax

-- Create a runtime transformation rule
CREATE PSEUDONYMISATION RULE ON healthcare.patients (ssn)
TRANSFORM keyed_hash
SCOPE person
PRIORITY 10;

-- Apply transformation to existing data permanently
APPLY PSEUDONYMISATION ON healthcare.patients (email)
TRANSFORM redact
PARAMS (mode = 'email')
WHERE status = 'inactive';

-- View active rules
SHOW PSEUDONYMISATION RULES FOR healthcare.patients;

-- Enable/disable rules dynamically
ALTER PSEUDONYMISATION RULE ON healthcare.patients (ssn)
SET DISABLED;

Cache-First Performance

Vectorized operations with intelligent caching

Cache Hit

~100-500ns per value. Two-level caching (global + scoped) ensures consistency and speed.

HMAC-SHA256

~1-2µs per value. Production-ready cryptographic hashing with minimal overhead.

AES-SIV Encrypt

~5-10µs per value. Nonce-misuse resistant authenticated encryption.

Argon2 (Memory-Hard)

~50-100µs per value. Brute-force resistant hashing for highest security.
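
The gap between a cache hit and a fresh cryptographic transform is why a value cache pays off on columns with repeated values. The following sketch shows one way a two-level (global plus scoped) cache could be organised. The class name echoes the ValueCache component named on this page, but its methods, its key layout, and the simple size cap are assumptions for illustration only.

```python
from typing import Callable, Dict, Optional, Tuple


class ValueCache:
    """Toy two-level cache: one global table, one table keyed by scope id."""

    def __init__(self, max_entries: int = 100_000) -> None:
        self._global: Dict[Tuple[str, ...], str] = {}
        self._scoped: Dict[Tuple[str, ...], str] = {}
        self._max_entries = max_entries

    def get_or_compute(
        self,
        column: str,
        value: str,
        compute: Callable[[str], str],
        scope_id: Optional[str] = None,
    ) -> str:
        """Return the cached pseudonym, computing and storing it on a miss."""
        if scope_id is None:
            table, cache_key = self._global, (column, value)
        else:
            table, cache_key = self._scoped, (scope_id, column, value)
        hit = table.get(cache_key)
        if hit is not None:
            return hit
        result = compute(value)
        if len(table) < self._max_entries:  # crude cap instead of real eviction
            table[cache_key] = result
        return result


calls = []


def stand_in_transform(value: str) -> str:
    """Placeholder for an expensive transform; records each invocation."""
    calls.append(value)
    return value[::-1]


cache = ValueCache()
cache.get_or_compute("ssn", "123-45-6789", stand_in_transform)
cache.get_or_compute("ssn", "123-45-6789", stand_in_transform)  # global hit
cache.get_or_compute("ssn", "123-45-6789", stand_in_transform, scope_id="s1")
```

After these three lookups the transform has run twice: once for the global entry and once for the scoped entry, with the repeated global lookup served from the cache.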

Flexible Key Management

From development to production, secure key storage for every environment

In-Memory (Development)

  • Auto-generate secure random keys
  • Perfect for local development
  • Zero configuration required
  • Keys reset on restart

Environment Variables

  • Production-ready pattern
  • PSEUDONYM_KEY_{ID} format
  • Optional salt configuration
  • Container-friendly

Cloud Key Management

  • AWS Secrets Manager
  • Azure Key Vault
  • Google Secret Manager
  • HashiCorp Vault

Custom Key Store

  • Implement KeyStore trait
  • HSM integration support
  • Custom rotation policies
  • Versioned key management
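
To make the environment-variable option concrete, here is a small Python sketch of a key store that resolves keys from variables named in the PSEUDONYM_KEY_{ID} format. Only that naming format comes from this page. The class and method names, the hex encoding of the key, the upper-casing of the ID, and the 32-byte minimum are assumptions for illustration.

```python
import os
from typing import Mapping, Optional


class EnvKeyStore:
    """Resolve pseudonymisation keys from PSEUDONYM_KEY_{ID} variables."""

    PREFIX = "PSEUDONYM_KEY_"

    def __init__(self, environ: Optional[Mapping[str, str]] = None) -> None:
        # Accept an explicit mapping so the store can be tested without
        # touching the real process environment.
        self._environ = os.environ if environ is None else environ

    def get_key(self, key_id: str) -> bytes:
        """Return the key bytes for key_id, or raise if missing or too short."""
        name = f"{self.PREFIX}{key_id.upper()}"
        raw = self._environ.get(name)
        if raw is None:
            raise KeyError(f"missing environment variable {name}")
        key = bytes.fromhex(raw)  # assumption: keys are hex-encoded
        if len(key) < 32:
            raise ValueError(f"{name} must hold at least 32 bytes of key material")
        return key


store = EnvKeyStore({"PSEUDONYM_KEY_PATIENTS": "ab" * 32})
patients_key = store.get_key("patients")
```

Failing loudly on a missing or short key is a deliberate choice in this sketch: silently falling back to a generated key would produce pseudonyms that cannot be reproduced after a restart.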

Universal Compatibility

Works with any supported data source

Delta Lake

Wrap delta-rs tables transparently

Parquet

Native columnar file support

CSV / JSON

Text-based file formats

PostgreSQL

Database connector support

SQL Server

Enterprise database integration

Custom Sources

Any supported data source

GDPR Compliance

Built for European data protection requirements

Article 4(5) Definition

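Implements pseudonymisation as defined: "the processing of personal data in such a manner that the personal data can no longer be attributed to a specific data subject without the use of additional information", provided that the additional information is kept separately and protected by technical and organisational measures.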

Article 25

Data protection by design and by default. Pseudonymisation applied automatically during query execution without manual intervention.

Article 32

Security of processing through encryption and pseudonymisation. Cryptographic transforms for robust protection.

EDPB Guidelines

Aligned with European Data Protection Board Guidelines 01/2025. Scoped pseudonymisation for controlled linkability.

In-Memory Transform Architecture

Data is pseudonymised in-memory during read. Your data lake only ever sees protected values.

┌─────────────────────────────────────────────────────────────────┐
│                       SQL Query                                  │
│              SELECT * FROM patients WHERE ...                    │
└────────────────────────────┬────────────────────────────────────┘
                             │
                             v
┌─────────────────────────────────────────────────────────────────┐
│              Pseudonymisation Layer                               │
│  ┌─────────────────────────────────────────────────────────┐    │
│  │  Data Source (Delta, Parquet, CSV, Database, ...)       │    │
│  └─────────────────────────────────────────────────────────┘    │
│  ┌──────────────┐ ┌──────────────┐ ┌──────────────────────┐    │
│  │ CompiledRules│ │TransformReg. │ │     KeyStore         │    │
│  └──────────────┘ └──────────────┘ └──────────────────────┘    │
│  ┌──────────────────────────────────────────────────────────┐   │
│  │              ValueCache (scope + global)                  │   │
│  └──────────────────────────────────────────────────────────┘   │
└────────────────────────────┬────────────────────────────────────┘
                             │
                             v
┌─────────────────────────────────────────────────────────────────┐
│              Pseudonymised Results                                │
│         (Schema unchanged, values transformed)                   │
└─────────────────────────────────────────────────────────────────┘
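
The layering in the diagram can be sketched as a wrapper around an inner data source that applies per-column rules while rows stream through. The Python below is a simplified, row-at-a-time illustration with hypothetical class and method names; the real layer is described as vectorized and also involves compiled rules, a key store, and a cache.

```python
import hashlib
import hmac
from typing import Callable, Dict, Iterable, Iterator

Row = Dict[str, str]
Transform = Callable[[str], str]


class PseudonymisingProvider:
    """Wrap an inner row source and transform configured columns on read."""

    def __init__(self, inner: Iterable[Row], rules: Dict[str, Transform]) -> None:
        self._inner = inner
        self._rules = rules

    def scan(self) -> Iterator[Row]:
        """Yield rows with the same columns, transforming values that have a rule."""
        for row in self._inner:
            yield {
                column: self._rules[column](value) if column in self._rules else value
                for column, value in row.items()
            }


def hash_ssn(value: str) -> str:
    """Hypothetical rule: HMAC-SHA256 under a demo key."""
    return hmac.new(b"demo-key", value.encode("utf-8"), hashlib.sha256).hexdigest()


source_rows = [{"name": "row-1", "ssn": "123-45-6789", "age": "47"}]
provider = PseudonymisingProvider(source_rows, {"ssn": hash_ssn})
results = list(provider.scan())
```

The output rows carry the same column names as the input, which corresponds to the "schema unchanged, values transformed" note in the diagram; columns without a rule pass through untouched.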

Cryptographic Security

Industry-standard algorithms for production deployments

Hashing

  • HMAC-SHA256 (256-bit keys)
  • Argon2id memory-hard hashing
  • Configurable work factors

Encryption

  • AES-256-SIV (512-bit total key: two 256-bit halves)
  • Authenticated encryption
  • Nonce-misuse resistant

Key Derivation

  • HKDF-SHA256
  • Scope-aware key derivation
  • Hierarchical key structures
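
For readers unfamiliar with HKDF, the sketch below implements the extract-and-expand construction from RFC 5869 with SHA-256 using only the Python standard library, then uses it to derive one key per scope from a single master key. The `derive_scope_key` helper and the layout of its `info` string are assumptions for illustration and do not describe Delta Forge's actual key hierarchy.

```python
import hashlib
import hmac

HASH_LEN = hashlib.sha256().digest_size  # 32 bytes


def hkdf_extract(salt: bytes, ikm: bytes) -> bytes:
    """HKDF-Extract: condense input key material into a pseudorandom key."""
    return hmac.new(salt or b"\x00" * HASH_LEN, ikm, hashlib.sha256).digest()


def hkdf_expand(prk: bytes, info: bytes, length: int) -> bytes:
    """HKDF-Expand: stretch the pseudorandom key into `length` output bytes."""
    if length > 255 * HASH_LEN:
        raise ValueError("requested length too large for HKDF-SHA256")
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def derive_scope_key(master: bytes, scope: str, context: str = "") -> bytes:
    """Derive a 32-byte key bound to a scope and optional context (hypothetical)."""
    info = f"pseudonym|{scope}|{context}".encode("utf-8")
    return hkdf_expand(hkdf_extract(b"", master), info, 32)


master_key = b"hypothetical-master-key-material"
person_key = derive_scope_key(master_key, "person")
session_key = derive_scope_key(master_key, "relationship", "session-a")
```

Binding the scope into the `info` parameter means a key derived for one scope reveals nothing useful about the key for another, while only the master key needs to be stored and rotated.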

Caching

  • XxHash3 for cache keys
  • Thread-safe DashMap storage
  • Configurable cache limits

Stop exposing raw data in your lake

In-memory pseudonymisation on read. Click a column, choose an algorithm. Your data lake never sees unprotected values.