
Git-Native Pipelines

Write SQL, commit to Git, and Delta Forge automatically discovers your pipelines, extracts lineage, computes execution order, and schedules runs. No DAGs. No YAML. No orchestration code.

Git-First by Design

Your pipeline code lives in Git. Delta Forge does the rest.

  • Pipelines cannot exist without Git — source control is mandatory, not optional
  • Every pipeline version, change, and decision is tracked in commit history
  • Standard developer workflow: branch, develop, commit, PR, merge, auto-deploy
  • No separate DAG definition, no YAML manifests, no Python orchestration code

How It Works

  1. Connect — Link a Git repository (GitHub, Azure DevOps, GitLab, Bitbucket)
  2. Write SQL — Create .sql files with PIPELINE and SCHEDULE declarations
  3. Commit & Push — Standard Git workflow with branches and pull requests
  4. Auto-Discovery — Delta Forge scans your repo, extracts pipelines, schedules, and lineage
  5. Execute — Pipelines run automatically on schedule with computed execution order

Declarative Pipeline SQL

Schedules, pipelines, and queries — all expressed in pure SQL

pipelines/daily_etl.sql
-- Schedule: when and how to run
SCHEDULE daily_etl
  CRON '0 6 * * *'
  TIMEZONE 'America/New_York'
  TARGET_NODES ALL
  DESCRIPTION 'Daily morning ETL batch'
  RETRIES 0
  RETRY_DELAY 60
  TIMEOUT 3600
  MAX_CONCURRENT 1
  PRIORITY 10
  CATCHUP false
  NOTIFY 'team@example.com'
  WEBHOOK 'https://hooks.slack.com/...'
  ACTIVE
;

-- Pipeline: what to run
PIPELINE my_etl_pipeline
  DESCRIPTION 'Daily ETL pipeline for customer data'
  SCHEDULE 'daily_etl'
  TAGS 'etl', 'customers', 'daily'
  SLA 4.0
  FAIL_FAST true
  DEFAULTS ($run_date = '2024-01-01')
  STATUS ACTIVE
;

-- The actual SQL that runs
CREATE DELTA TABLE IF NOT EXISTS bronze.sales
LOCATION '/lake/bronze/sales'
AS SELECT *
FROM raw.csv.sales
WHERE sale_date = $run_date
ORDER BY id;

SCHEDULE defines when

Cron expressions, timezone, retries, timeout, concurrency limits, email notifications, Slack webhooks, and priority — all in one declaration.

PIPELINE defines what

Description, schedule reference, SLA targets, tags, fail-fast mode, and parameterized defaults. The pipeline points to its schedule by name.

Pure SQL follows

No special syntax after the declarations. Write the SQL you already know — CREATE TABLE, INSERT, MERGE, any standard SQL statement.

Parameterized runs

Variables like $run_date are declared with defaults in the PIPELINE block and can be overridden at execution time.

Automatic Execution Order

Delta Forge computes the DAG from your SQL — you never define it

SQL Parsing

Parses every SQL statement to find which tables each pipeline reads and which tables it writes. No annotations needed.

Dependency Graph

Builds a directed acyclic graph automatically from the read/write relationships across all pipelines in the workspace.

Topological Sort

Uses topological sort to compute execution layers. Pipelines in the same layer have no mutual dependencies and run concurrently.

Cycle Detection

Detects circular dependencies between pipelines and surfaces warnings before execution, preventing infinite loops or deadlocks.

Example: Three pipelines, automatically ordered

  • Layer 0: Pipeline A writes bronze.sales
  • Layer 1: Pipeline B reads bronze.sales, writes silver.sales
  • Layer 2: Pipeline C reads silver.sales
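A sketch of three pipeline files whose read/write tables would produce exactly this ordering. The filenames, the minimal PIPELINE form, and the column names are illustrative assumptions; only the table names come from the example above:

```sql
-- pipelines/pipeline_a.sql  (Layer 0: reads only raw sources)
PIPELINE pipeline_a;
CREATE DELTA TABLE IF NOT EXISTS bronze.sales
AS SELECT * FROM raw.csv.sales;

-- pipelines/pipeline_b.sql  (Layer 1: reads what A writes)
PIPELINE pipeline_b;
CREATE DELTA TABLE IF NOT EXISTS silver.sales
AS SELECT * FROM bronze.sales WHERE amount > 0;

-- pipelines/pipeline_c.sql  (Layer 2: reads what B writes)
PIPELINE pipeline_c;
INSERT INTO gold.sales_summary
SELECT region, SUM(amount) FROM silver.sales GROUP BY region;
```

No ordering is declared anywhere: the layers fall out of the read/write analysis alone.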

Why Git-First Matters

A fundamentally different approach to data pipeline orchestration

vs DAG-based tools

No Python DAG code to maintain. The execution graph is computed automatically from the SQL itself — read/write analysis replaces manual dependency wiring.

vs YAML orchestrators

No manifest files, no configuration drift. The pipeline IS the SQL file. One artifact, one source of truth, zero synchronization overhead.

vs UI-defined pipelines

Code review, versioning, branching, and rollback are built in via Git. No screenshots of drag-and-drop canvases in pull requests.

vs Cloud-managed workflows

No vendor lock-in. Your pipelines are portable SQL files that live in your Git repository. Move between platforms without rewriting orchestration logic.

Pipeline Capabilities

Enterprise-grade features, declared in SQL

Approval Workflows

Add --APPROVAL REQUIRED to enforce review gates before production execution. Approval clears on source change.
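A minimal sketch of how the gate might look, assuming the --APPROVAL REQUIRED comment sits at the top of the pipeline file (the pipeline name and fields here are illustrative):

```sql
--APPROVAL REQUIRED
PIPELINE customer_pii_export
  DESCRIPTION 'Exports customer PII; requires sign-off before production runs'
  SCHEDULE 'daily_etl'
  STATUS ACTIVE
;
```

Because approval clears on source change, any new commit to this file re-triggers the review gate.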

SLA Monitoring

Declare SLA targets in hours. Delta Forge tracks execution time and alerts when pipelines exceed their SLA window.

Slack & Webhook Notifications

Built-in NOTIFY and WEBHOOK fields on schedules. Get alerts on success, failure, or SLA breach without external tooling.

Parameterized Defaults

DEFAULTS block declares variables with fallback values. Override at runtime for backfills or ad-hoc executions.
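For example, a pipeline might declare its variables like this (the pipeline name, the $region variable, and the table columns are illustrative; the DEFAULTS syntax follows the earlier example):

```sql
PIPELINE sales_backfill
  DESCRIPTION 'Reloads one day of sales'
  DEFAULTS ($run_date = '2024-01-01', $region = 'us-east')
  STATUS ACTIVE
;

-- $run_date and $region resolve to the defaults above unless
-- overridden at execution time, e.g. for a backfill of a past day.
DELETE FROM bronze.sales WHERE sale_date = $run_date;
INSERT INTO bronze.sales
SELECT * FROM raw.csv.sales
WHERE sale_date = $run_date AND region = $region;
```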

INCLUDE SCRIPT

Shared SQL modules that multiple pipelines can include. Write common logic once, reference it everywhere.
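A sketch of how a shared module might be referenced; the exact INCLUDE SCRIPT argument form and the module path are assumptions:

```sql
-- shared/cleanup.sql holds common logic, e.g. pruning old rows.
-- Pull it into this pipeline before the main statements run:
INCLUDE SCRIPT 'shared/cleanup.sql';

CREATE DELTA TABLE IF NOT EXISTS silver.sales
AS SELECT * FROM bronze.sales;
```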

Pipeline Lifecycle

STATUS field supports DRAFT and ACTIVE. Develop and test pipelines in draft mode before enabling scheduled execution.
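A draft declaration might look like this (pipeline name illustrative):

```sql
-- Discovered and editable, but excluded from scheduled execution
PIPELINE churn_model_features
  DESCRIPTION 'Feature table for churn model (under development)'
  STATUS DRAFT
;
```

Flipping STATUS to ACTIVE and committing the change is what promotes the pipeline to scheduled execution.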

Fail-Fast Mode

FAIL_FAST true stops the pipeline on the first statement error. Set to false to continue executing remaining statements.

Concurrent Execution Limits

MAX_CONCURRENT controls how many instances of a schedule can run at once. Prevent overlapping runs and resource contention.
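The two controls above combine naturally in one declaration pair; this is a minimal sketch using only fields shown in the earlier example (the names hourly_load and hourly_sales_load are illustrative):

```sql
-- Only one instance of this schedule runs at a time; a trigger that
-- fires while a run is still in progress does not start an overlap.
SCHEDULE hourly_load
  CRON '0 * * * *'
  MAX_CONCURRENT 1
  ACTIVE
;

-- Stop at the first failing statement instead of running the rest.
PIPELINE hourly_sales_load
  SCHEDULE 'hourly_load'
  FAIL_FAST true
  STATUS ACTIVE
;
```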

From Code to Production

A complete development lifecycle for SQL pipelines — built around the tools your team already knows

[Lifecycle diagram: Develop (write SQL in the Desktop GUI or VS Code; highlight a statement and run it SSMS-style with inline results) → Version Control (commit pipelines/daily_etl.sql and push to the workspace repo) → Production (approval gate that clears on source change; cron-scheduled execution on compute workers, with the Git SHA recorded for audit; execution history, statement-level results, and lineage for monitoring). Pipeline results are written as Delta tables — time travel, schema evolution, ACID transactions.]
1. Develop

Write SQL pipelines in a full-featured editor with IntelliSense, catalog browsing, and SSMS-style inline execution.

  • Open the Pipeline Designer in Desktop GUI or VS Code
  • Write sequential SQL statements separated by semicolons
  • Highlight any statement and press Run for instant results
  • Browse the data catalog sidebar for schema reference
  • Preview data lineage across statements
2. Version Control

Every pipeline is a SQL file stored in a Git repository. Each workspace has its own repo with full branching support.

  • Pipelines saved as pipelines/<name>.sql
  • Commit, push, pull, and branch from the toolbar
  • View diffs and switch branches without leaving the editor
  • Create pull requests for code review
  • Full Git history for audit and rollback
3. Production

Promote pipelines through approval gates and schedule them for automatic execution on compute workers.

  • Approval gates ensure reviewed code runs in production
  • Cron-based scheduling with timezone support
  • Pipelines execute on selected compute workers
  • Git commit SHA recorded with each execution
  • Approval clears automatically when source changes

Workspace-Centric Development

Workspaces organize pipelines, permissions, and source control into a single governed unit

[Example workspace: patient-analytics (Team visibility, 3 members, 2 pipelines). Git repository: one per workspace, with main and feature/new-transform branches. Pipelines: ingest_patient_data.sql, transform_analytics.sql. Members & RBAC: Alice (Owner), Bob (Editor), Eve (Viewer); Private/Team/Public visibility; RBAC enforced on all pipeline operations. Scheduling: cron 0 6 * * MON-FRI on compute node worker-pool-1, approval required.]

One Workspace, Everything Connected

A workspace is the unit of collaboration. It groups pipelines with their Git repository, team permissions, and execution schedules.

One Git repository per workspace

All pipelines in a workspace share a single Git repo. Branch, merge, and review changes as a team.

Team-based access control

Owners, editors, and viewers. RBAC governs who can edit pipelines, approve for production, or view results.

Pipeline-level scheduling

Each pipeline has its own cron schedule, compute node assignment, and approval gate for production.

Approval gates

Mark pipelines as requiring approval before scheduled execution. Approval clears automatically when source code changes.

Full Git Integration, Built In

All Git operations are available directly from the pipeline editor toolbar — no terminal needed

Commit & Push

Save your pipeline changes to the catalog, then commit and push to the remote Git repository — all from the editor toolbar dropdown.

Branching

Create feature branches, switch between branches, and merge changes. The toolbar shows your current branch and sync status at a glance.

Pull & Sync

Pull the latest changes from remote before starting work. Branch status indicators show clean (green), modified (yellow), or conflict (red).

Diff & History

View diffs of your changes before committing. Full commit history is available for auditing and understanding how pipelines evolved over time.

Pull Requests

Create pull requests directly from the editor for team code review. Ensure pipeline changes are reviewed before they reach production.

Audit Trail

Every scheduled execution records the Git commit SHA. Trace any production result back to the exact version of SQL that produced it.

Automatic Data Lineage

Understand how data flows through your pipelines — built in, not bolted on

Table-Level Tracking

See which tables feed which downstream tables. Lineage maps the full data flow across your pipeline from source tables to final outputs.

Fully Automatic

Lineage is derived from your pipeline SQL — zero configuration, zero manual annotation. Write your SQL and the lineage graph appears.

Native Feature

Not an add-on or third-party integration. Data lineage is built into the Delta Forge platform from day one, available on every plan.

Flow Visualization

Visualize upstream and downstream dependency graphs directly in the editor. Understand at a glance how data moves through your pipeline.

Pipeline-Aware

Tracks data flow across multi-statement SQL pipelines. Temporary tables, CTEs, and intermediate results are followed across statement boundaries.
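For example, lineage through an intermediate temporary table would be followed across statement boundaries like these (the CREATE TEMP TABLE form and all table and column names are illustrative assumptions):

```sql
-- Statement 1: intermediate result, tracked as a lineage node
CREATE TEMP TABLE recent_sales AS
SELECT id, region, amount
FROM bronze.sales
WHERE sale_date >= $run_date;

-- Statement 2: final output; lineage links silver.sales_by_region
-- back through recent_sales to bronze.sales
CREATE DELTA TABLE IF NOT EXISTS silver.sales_by_region
AS SELECT region, SUM(amount) AS total
FROM recent_sales
GROUP BY region;
```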

Impact Analysis

Understand what breaks when a source table changes. Trace downstream dependencies to assess the blast radius of schema changes before they happen.

See the pipeline workflow in action

Git-native pipelines, automatic execution order, and zero orchestration code.