Skip to content
SQL Engine

Broad SQL for Delta Lake tables

A ground-up SQL implementation with window functions, PL/pgSQL-style stored procedures, a rich function library, and a cost-based optimizer that understands Delta Lake file statistics.

Window functions with full frame and exclusion support
PL/pgSQL-style stored procedures and cursors
Property graph queries with Cypher
SELECT customer_id, SUM(amount) OVER (PARTITION BY customer_id ORDER BY order_date ROWS UNBOUNDED PRECEDING) AS running_total Parse AST + type check Optimize CBO + pruning Execute vectorized pipeline DML / DDL INSERT UPDATE MERGE DELETE Window Fns RANK LAG LEAD ROWS/RANGE frames Procedures PL/pgSQL FOR loops cursors Cypher property graph MATCH RETURN Delta Lake Tables ACID transactions, time travel, schema evolution

Function library

Common analytical functions across numeric, string, temporal, and aggregate categories

Math and numeric

Arithmetic, exponential, logarithmic, trigonometric, and hyperbolic functions. Decimal arithmetic with exact precision. Rounding, bucketing, and number-theory helpers.

String

Substring, pattern matching, regex, case conversion, padding, concatenation, encoding, hashing (md5, sha256, sha512), and array-to-string operations.

Date and time

Timestamp arithmetic, date truncation, component extraction, timezone conversion, interval operations, and calendar functions (day-of-week, quarter, week-of-year).

Aggregates

Standard aggregates plus statistical functions: variance, standard deviation, correlation, covariance, linear regression coefficients, percentiles, and mode.

Window functions

Full frame specification including ROWS, RANGE, and GROUPS modes with exclusion clauses

Ranking

ROW_NUMBER, RANK, DENSE_RANK, NTILE, PERCENT_RANK, CUME_DIST. Partition and order clauses supported on all ranking functions.

Value access

FIRST_VALUE, LAST_VALUE, NTH_VALUE, LAG (previous row), and LEAD (next row) with default-value support for boundary rows.

Frame modes

ROWS BETWEEN (physical row offsets), RANGE BETWEEN (logical value ranges, including interval-based ranges for time series), and GROUPS BETWEEN (peer group frames).

Frame exclusion

EXCLUDE NO OTHERS, EXCLUDE CURRENT ROW, EXCLUDE GROUP, and EXCLUDE TIES for precise control over which rows count in each frame.

Window query example
SELECT
    customer_id,
    order_date,
    amount,
    SUM(amount) OVER (
        PARTITION BY customer_id
        ORDER BY order_date
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS running_total,
    AVG(amount) OVER (
        PARTITION BY customer_id
        ORDER BY order_date
        RANGE BETWEEN INTERVAL '7 days' PRECEDING AND CURRENT ROW
    ) AS moving_avg_7d
FROM orders;

PL/pgSQL-style stored procedures

Procedural logic runs inside the engine, alongside your SQL, with full transaction semantics

Variables and types

Scalar variables, record types, %TYPE column references, %ROWTYPE table row types, array variables, and CONSTANT declarations with default values.

Control flow

IF/THEN/ELSIF/ELSE, CASE WHEN, LOOP/EXIT/CONTINUE, WHILE, FOR over integer ranges and query result sets, and FOREACH over arrays.

Exception handling

EXCEPTION WHEN blocks with predefined exception types, RAISE EXCEPTION/NOTICE/WARNING, SQLSTATE error codes, and SQLERRM error messages.

Cursors

Implicit cursor FOR loops, explicit cursor declaration, parameterized cursors, OPEN/FETCH/CLOSE lifecycle, and REFCURSOR return type for set-returning functions.

PL/pgSQL stored procedure example
CREATE OR REPLACE FUNCTION process_monthly_billing(
    p_month DATE,
    p_discount_threshold DECIMAL DEFAULT 1000.00
) RETURNS TABLE(
    customer_id INT,
    total_amount DECIMAL,
    discount_applied DECIMAL,
    final_amount DECIMAL
) AS $$
DECLARE
    v_customer RECORD;
    v_total DECIMAL;
    v_discount DECIMAL;
    v_processed INT := 0;
BEGIN
    FOR v_customer IN
        SELECT id, tier FROM customers WHERE status = 'active'
    LOOP
        SELECT COALESCE(SUM(amount), 0) INTO v_total
        FROM orders
        WHERE customer_id = v_customer.id
          AND date_trunc('month', order_date) = date_trunc('month', p_month);

        v_discount := CASE
            WHEN v_customer.tier = 'platinum' AND v_total > p_discount_threshold THEN v_total * 0.15
            WHEN v_customer.tier = 'gold'     AND v_total > p_discount_threshold THEN v_total * 0.10
            WHEN v_total > p_discount_threshold * 2 THEN v_total * 0.05
            ELSE 0
        END;

        customer_id      := v_customer.id;
        total_amount     := v_total;
        discount_applied := v_discount;
        final_amount     := v_total - v_discount;
        RETURN NEXT;
        v_processed := v_processed + 1;
    END LOOP;

    RAISE NOTICE 'Processed % customers for month %', v_processed, p_month;
EXCEPTION
    WHEN OTHERS THEN
        RAISE EXCEPTION 'Billing error: % - %', SQLSTATE, SQLERRM;
END;
$$ LANGUAGE plpgsql;

Broad SQL on your Delta Lake tables

Window functions, stored procedures, Cypher graph queries, and a cost-based optimizer that reads Delta statistics.