A lightweight, self-contained data platform that runs wherever your data lives. No Spark clusters. No Hadoop dependencies. Just three components that fit into any infrastructure.
Delta Forge keeps it simple — a control plane for governance, compute workers for queries, and your own storage for data.
The brain of the platform. Manages metadata, security and credentials so your data stays governed.
Stateless query engines that scale horizontally. Add more workers for more concurrency.
Data stays in Delta Lake format on storage you already own and control. No lock-in.
See how Delta Forge fits into real infrastructure
Run Delta Forge entirely within your network perimeter. Data never leaves your infrastructure. Deploy on bare-metal servers, VMware, or an on-premises Kubernetes cluster.
Two container images: control plane and worker. Orchestrate with Helm charts or a simple compose file.
Kubernetes HPA scales workers based on CPU and memory. KEDA support for scale-to-zero when idle.
Use MinIO, Ceph, or any NAS mount. Delta Lake tables are standard Parquet — no proprietary formats.
Bootstraps its own embedded SQLite catalog on first run. No manual database provisioning or configuration required.
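The two-image deployment above can be sketched as a minimal compose file. Image names, the environment variable, and the port are illustrative assumptions, not the published values — check the Delta Forge deployment docs for the real ones:

```yaml
services:
  control-plane:
    image: deltaforge/control-plane:latest      # hypothetical image name
    ports:
      - "8080:8080"                             # assumed control-plane port
    volumes:
      - catalog:/var/lib/deltaforge             # embedded catalog survives restarts

  worker:
    image: deltaforge/worker:latest             # hypothetical image name
    environment:
      CONTROL_PLANE_URL: http://control-plane:8080   # assumed env var
    deploy:
      replicas: 2                               # add workers for more concurrency

volumes:
  catalog:
```

Because workers are stateless, scaling is just a replica count — the same property that lets Kubernetes HPA or KEDA drive them in production.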
Deploy on Azure Kubernetes Service with native integration into ADLS Gen2, Azure Key Vault, and Entra ID. Scale compute independently from storage.
Store Delta Lake tables on ADLS Gen2 with hierarchical namespace. Native abfss:// protocol support.
Workers auto-scale based on CPU/memory demand. KEDA integration enables scale-to-zero for cost savings.
Storage credentials and secrets managed via Key Vault. Rotate keys without redeploying workers.
Authenticate users via Managed Identity or service principals. Map Azure AD groups to Delta Forge roles.
Delta Forge extends PostgreSQL-compatible SQL with powerful new commands and provides purpose-built interfaces to use them
The Delta Forge GUI, VS Code extension, and CLI — built for extended SQL including PIPELINE, VACUUM, OPTIMIZE, and time travel.
Power BI, Tableau, Looker, and Metabase connect via the PostgreSQL wire protocol or Arrow Flight SQL.
Federate queries to PostgreSQL, MySQL, and SQL Server, plus CSV, JSON, Parquet, and Excel files.
Native parsing of the healthcare formats HL7 and FHIR, plus EDI (X12, EDIFACT, TRADACOMS).
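A session using the extended commands might look like this. The syntax shown follows common Delta Lake SQL conventions and is illustrative — the table name is made up, and the exact grammar (including the `PIPELINE` command, omitted here) is defined in the Delta Forge reference, not this sketch:

```sql
-- Compact small files, then clean up versions older than 7 days
OPTIMIZE sales.orders;
VACUUM sales.orders RETAIN 168 HOURS;

-- Time travel: query the table as it existed at an earlier version
SELECT count(*) FROM sales.orders VERSION AS OF 42;
```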
Governance and compliance are not add-ons — they are part of every query
Granular RBAC with role inheritance, row-level security filters, and column-level masking. Enforce who sees what at the query engine level.
Built-in pseudonymisation engine with keyed hashing, AES encryption, and redaction transforms. Deterministic output supports joins across datasets.
Every query, credential access, and permission change is logged. Full audit trail for SOC 2, HIPAA, and GDPR compliance requirements.
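The deterministic keyed hashing behind pseudonymisation can be sketched in a few lines. This shows the general technique (HMAC-SHA256), not Delta Forge's internal implementation; the key and identifiers are made up:

```python
import hashlib
import hmac


def pseudonymise(value: str, key: bytes) -> str:
    """Keyed hashing: the same input and key always yield the same token,
    so pseudonymised columns can still be joined across datasets."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


key = b"example-key-managed-elsewhere"  # in practice, held in a secrets store

# Deterministic: two datasets map the same patient ID to the same token,
# so a join on the pseudonymised column still works.
token_a = pseudonymise("patient-12345", key)
token_b = pseudonymise("patient-12345", key)
assert token_a == token_b

# A different key produces an unrelated token, so re-keying breaks linkage.
assert pseudonymise("patient-12345", b"other-key") != token_a
```

Determinism is the property that makes pseudonymised joins possible; rotating the key deliberately severs linkability between old and new tokens.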
What sets Delta Forge apart from legacy data platforms
Purpose-built on Apache Arrow. A single worker binary replaces an entire Spark cluster — dramatically reducing infrastructure cost and operational complexity.
All data is stored as standard Delta Lake tables (Parquet files + JSON transaction log). Compatible with Databricks, Apache Spark, Trino, and any tool that reads Delta Lake.
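The "Parquet files + JSON transaction log" claim is concrete: each commit is a JSON-lines file under `_delta_log/`, and readers replay its actions to find the live data files. A simplified sketch of that replay, using only the standard library — real logs carry more actions and fields (see the Delta Lake protocol for the full schema):

```python
import json

# A minimal, simplified Delta Lake commit: one JSON action per line.
# Real commits also include protocol and commitInfo actions.
commit = "\n".join(json.dumps(action) for action in [
    {"metaData": {"id": "example-table", "format": {"provider": "parquet"}}},
    {"add": {"path": "part-00000.snappy.parquet", "size": 1024,
             "dataChange": True}},
])

# Any Delta Lake reader replays the log's actions to determine which
# Parquet files make up the current table snapshot.
actions = [json.loads(line) for line in commit.splitlines()]
live_files = [a["add"]["path"] for a in actions if "add" in a]
print(live_files)  # ['part-00000.snappy.parquet']
```

Because the log and data files are plain JSON and Parquet, any engine that implements the protocol — Databricks, Spark, Trino — sees the same table.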
Workers are accessed through the Delta Forge Desktop GUI, VS Code extension, or CLI. Purpose-built tools designed for lakehouse workflows.
The same engine runs as a desktop application for development and as a distributed cluster for production. One platform from prototyping to scale.
Control plane needs just 512 MB RAM with an embedded SQLite catalog. No external PostgreSQL, Redis, or ZooKeeper. Workers start in seconds and scale to zero when idle.
Compute workers are stateless — scale them independently from storage. Pay for compute only when queries are running. Storage costs stay predictable.
Get a guided deployment in your private data centre or Azure subscription.