Delta Forge pseudonymises data in-memory during read, before it ever touches your data lake. Click a column, choose an algorithm, and every write to Delta format is already protected. No unprotected data. No post-hoc masking. No exposure window.
Traditional tools mask data after it's already in your lake. Delta Forge never lets it get there unprotected.
Three clicks to protect any column, applied in-memory, every time
Browse your table's columns and click the one containing sensitive data: names, emails, SSNs, patient IDs, anything.
Pick from five production-ready transforms: hash, encrypt, redact, generalize, or tokenize. Set scope and parameters.
From this moment, every read transforms the data in-memory. What gets written to Delta format is already pseudonymised. Raw data never leaves the source.
Purpose-built transforms for every data protection scenario
Hash
Irreversible HMAC-SHA256 or Argon2 hashing. Ideal for patient IDs, SSNs, and other identifiers that never need to be recovered.
Encrypt
Reversible, deterministic AES-256-SIV encryption. The same plaintext always produces the same ciphertext under the same key, enabling joins on encrypted values.
Redact
Display masking for emails, phone numbers, credit cards, and SSNs. Shows partial context while protecting the sensitive portion.
Generalize
Reduce precision to support k-anonymity. Convert dates to years, ages to ranges, zip codes to regions.
Tokenize
Generate consistent UUID tokens for entity linking. The same value always produces the same token within a scope.
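As a rough illustration of the hash and tokenize transforms, the Python sketch below computes an HMAC-SHA256 keyed hash and derives a UUID-shaped token from a scoped HMAC. The function names, the demo key, and the way the scope is mixed into the input are assumptions made for illustration; they are not taken from Delta Forge itself.

```python
import hashlib
import hmac
import uuid

# Hypothetical demo key; a real deployment would load this from a key store.
DEMO_KEY = b"demo-key-do-not-use-in-production"


def keyed_hash(value: str, key: bytes = DEMO_KEY) -> str:
    """Irreversible keyed hash: HMAC-SHA256 of the value, hex-encoded."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def scoped_token(value: str, scope: str, key: bytes = DEMO_KEY) -> str:
    """Deterministic UUID-shaped token: the same (scope, value) gives the same token."""
    # Length-prefix the scope so ("ab", "c") and ("a", "bc") cannot collide.
    message = f"{len(scope)}:{scope}:{value}".encode("utf-8")
    digest = hmac.new(key, message, hashlib.sha256).digest()
    # Use the first 16 bytes of the MAC as the UUID payload (version bits set to 4).
    return str(uuid.UUID(bytes=digest[:16], version=4))


if __name__ == "__main__":
    print(keyed_hash("123-45-6789"))
    print(scoped_token("123-45-6789", scope="person"))
    print(scoped_token("123-45-6789", scope="study-42"))  # differs from the line above
```

Deriving the token from a keyed MAC, rather than storing random UUIDs in a lookup table, is one possible design: it needs no shared state to stay consistent, at the cost of tying every token to the key.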
Purpose-built masking for every data type
full
"Sensitive data" → "[REDACTED]"
partial
"Jonathan" → "J******n"
email
"john.doe@example.com" → "j***@example.com"
phone
"555-123-4567" → "***-***-4567"
card
"4111-1111-1111-1234" → "****-****-****-1234"
ssn
"123-45-6789" → "***-**-6789"
Reduce data precision while maintaining analytical utility
year
"1985-06-15" → "1985"
month
"1985-06-15" → "1985-06"
quarter
"1985-06-15" → "1985Q2"
decade
"1985-06-15" → "1980s"
age_range
47 → "40-49"
zip3
"90210" → "902XX"
zip2
"90210" → "90XXX"
round
52,750 → 53,000
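The generalization modes above map naturally onto small pure functions. The Python sketch below reproduces the listed examples; the function names, the 10-year age bucket, and the rounding step of 1,000 are assumptions inferred from those examples.

```python
from datetime import date


def generalize_date(iso_date: str, mode: str) -> str:
    """Reduce an ISO date (YYYY-MM-DD) to year, month, quarter, or decade."""
    d = date.fromisoformat(iso_date)
    if mode == "year":
        return f"{d.year:04d}"
    if mode == "month":
        return f"{d.year:04d}-{d.month:02d}"
    if mode == "quarter":
        return f"{d.year:04d}Q{(d.month - 1) // 3 + 1}"
    if mode == "decade":
        return f"{d.year // 10 * 10}s"
    raise ValueError(f"unknown date mode: {mode}")


def age_range(age: int, width: int = 10) -> str:
    """Bucket an age into a fixed-width range, e.g. 40-49."""
    low = age // width * width
    return f"{low}-{low + width - 1}"


def zip_prefix(zip_code: str, keep: int) -> str:
    """Keep the first `keep` characters and replace the rest with 'X'."""
    return zip_code[:keep] + "X" * (len(zip_code) - keep)


def round_to(value: int, step: int = 1000) -> int:
    """Round a non-negative integer to the nearest multiple of `step` (ties round up)."""
    return (value + step // 2) // step * step


if __name__ == "__main__":
    print(generalize_date("1985-06-15", "quarter"))
    print(age_range(47))
    print(zip_prefix("90210", 3))
    print(round_to(52750))
```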
Control how pseudonyms relate across records
Manage pseudonymisation rules with familiar SQL syntax
-- Create a runtime transformation rule
CREATE PSEUDONYMISATION RULE ON healthcare.patients (ssn)
TRANSFORM keyed_hash
SCOPE person
PRIORITY 10;
-- Apply transformation to existing data permanently
APPLY PSEUDONYMISATION ON healthcare.patients (email)
TRANSFORM redact
PARAMS (mode = 'email')
WHERE status = 'inactive';
-- View active rules
SHOW PSEUDONYMISATION RULES FOR healthcare.patients;
-- Enable/disable rules dynamically
ALTER PSEUDONYMISATION RULE ON healthcare.patients (ssn)
SET DISABLED;
Vectorized operations with intelligent caching
~100-500ns per value. Two-level caching (global + scoped) ensures consistency and speed.
~1-2µs per value. Production-ready cryptographic hashing with minimal overhead.
~5-10µs per value. Nonce-misuse resistant authenticated encryption.
~50-100µs per value. Brute-force resistant hashing for highest security.
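To make the two-level caching idea concrete, here is a minimal Python sketch of a cache with one global table and one table per scope. The class name echoes the ValueCache component named elsewhere on this page, but the method name, the lookup policy, and the lack of eviction are assumptions for illustration only.

```python
from typing import Callable, Dict, Optional


class ValueCache:
    """Two-level cache sketch: one global table plus one table per scope."""

    def __init__(self) -> None:
        self._global: Dict[str, str] = {}
        self._scoped: Dict[str, Dict[str, str]] = {}

    def get_or_compute(self, value: str, compute: Callable[[str], str],
                       scope: Optional[str] = None) -> str:
        # Assumption: unscoped rules share the global table, scoped rules get their own.
        table = self._global if scope is None else self._scoped.setdefault(scope, {})
        if value not in table:
            table[value] = compute(value)  # the transform runs once per distinct value
        return table[value]


if __name__ == "__main__":
    calls = []

    def slow_transform(value: str) -> str:
        calls.append(value)
        return value[::-1]  # placeholder for a real cryptographic transform

    cache = ValueCache()
    for v in ["alice", "bob", "alice", "alice"]:
        cache.get_or_compute(v, slow_transform, scope="person")
    print(len(calls))  # 2: "alice" and "bob" were each transformed once
```

Caching pays off most for the slower transforms and for columns with many repeated values; it also guarantees that a repeated value maps to the same output within a scope even if the underlying transform is randomized.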
From development to production, secure key storage for every environment
Works with any supported data source
Delta Lake
Wrap delta-rs tables transparently
Parquet
Native columnar file support
CSV / JSON
Text-based file formats
PostgreSQL
Database connector support
SQL Server
Enterprise database integration
Custom Sources
Any supported data source
Built for European data protection requirements
Implements pseudonymisation as defined in GDPR Article 4(5): "the processing of personal data in such a manner that the personal data can no longer be attributed to a specific data subject without the use of additional information."
Data protection by design and by default (GDPR Article 25). Pseudonymisation is applied automatically during query execution, without manual intervention.
Security of processing through encryption and pseudonymisation (GDPR Article 32). Cryptographic transforms for robust protection.
Aligned with European Data Protection Board Guidelines 01/2025 on Pseudonymisation. Scoped pseudonymisation for controlled linkability.
Data is pseudonymised in-memory during read. Your data lake only ever sees protected values.
┌─────────────────────────────────────────────────────────────────┐
│                            SQL Query                            │
│                SELECT * FROM patients WHERE ...                 │
└────────────────────────────┬────────────────────────────────────┘
                             │
                             v
┌─────────────────────────────────────────────────────────────────┐
│                     Pseudonymisation Layer                      │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │     Data Source (Delta, Parquet, CSV, Database, ...)     │  │
│  └───────────────────────────────────────────────────────────┘  │
│  ┌──────────────┐  ┌──────────────┐  ┌───────────────────────┐  │
│  │ CompiledRules│  │TransformReg. │  │       KeyStore        │  │
│  └──────────────┘  └──────────────┘  └───────────────────────┘  │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │                ValueCache (scope + global)                │  │
│  └───────────────────────────────────────────────────────────┘  │
└────────────────────────────┬────────────────────────────────────┘
                             │
                             v
┌─────────────────────────────────────────────────────────────────┐
│                      Pseudonymised Results                      │
│             (Schema unchanged, values transformed)              │
└─────────────────────────────────────────────────────────────────┘
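The flow in the diagram can be sketched as a thin wrapper around a row source. In the Python below, the rule table, transform registry, and key store are simplified, hypothetical stand-ins for the CompiledRules, TransformRegistry, and KeyStore boxes; the real components are not described on this page, so treat every name and signature here as an assumption.

```python
import hashlib
import hmac
from typing import Callable, Dict, Iterable, Iterator

Row = Dict[str, object]
Transform = Callable[[str], str]

# Hypothetical stand-in for the KeyStore box (demo key only).
KEY_STORE = {"default": b"demo-key-do-not-use-in-production"}


def keyed_hash(value: str) -> str:
    """HMAC-SHA256 of the value under the default key, hex-encoded."""
    return hmac.new(KEY_STORE["default"], value.encode("utf-8"), hashlib.sha256).hexdigest()


def redact_full(value: str) -> str:
    return "[REDACTED]"


# Stand-in for the transform registry: transform name -> callable.
TRANSFORMS: Dict[str, Transform] = {"keyed_hash": keyed_hash, "redact": redact_full}

# Stand-in for compiled rules: column name -> transform name.
RULES: Dict[str, str] = {"ssn": "keyed_hash", "email": "redact"}


def pseudonymise(rows: Iterable[Row], rules: Dict[str, str],
                 transforms: Dict[str, Transform]) -> Iterator[Row]:
    """Yield copies of the rows with protected columns transformed in memory."""
    for row in rows:
        out = dict(row)  # same columns, same order: the schema is unchanged
        for column, name in rules.items():
            if out.get(column) is not None:
                out[column] = transforms[name](str(out[column]))
        yield out


if __name__ == "__main__":
    source = [{"id": 1, "ssn": "123-45-6789", "email": "john.doe@example.com"}]
    for protected in pseudonymise(source, RULES, TRANSFORMS):
        print(protected)
```

Because the wrapper yields transformed copies, anything downstream of it (including a writer) only ever receives protected values, while the source rows are left untouched.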
Industry-standard algorithms for production deployments
In-memory pseudonymisation on read. Click a column, choose an algorithm. Your data lake never sees unprotected values.