200+ functions, full PL/pgSQL procedural language, advanced window functions, and production-grade query optimization. This isn't just SQL support - it's PostgreSQL-level capability on lakehouse architecture.
A ground-up implementation delivering enterprise SQL semantics
Comprehensive function library covering every analytical need
abs, ceil, floor, round, trunc
Basic arithmetic
power, sqrt, cbrt, exp, ln, log, log10
Exponential & logarithmic
sin, cos, tan, asin, acos, atan, atan2
Trigonometric
sinh, cosh, tanh, asinh, acosh, atanh
Hyperbolic
degrees, radians, pi
Angle conversion
gcd, lcm, factorial, sign
Number theory
random, setseed
Random generation
width_bucket, div, mod
Bucketing & division
length, char_length, octet_length, bit_length
Length operations
upper, lower, initcap
Case conversion
trim, ltrim, rtrim, btrim
Whitespace handling
substring, left, right, substr
Extraction
position, strpos, locate
Search
replace, translate, overlay
Replacement
concat, concat_ws, repeat, reverse
Concatenation
lpad, rpad, format
Formatting
split_part, string_to_array, array_to_string
Splitting & joining
regexp_match, regexp_replace, regexp_split_to_array
Regular expressions
ascii, chr, encode, decode
Character encoding
md5, sha256, sha512
Hashing
now, current_timestamp, current_date, current_time
Current values
date_trunc, date_part, extract
Component extraction
date_add, date_sub, date_diff
Arithmetic
to_timestamp, to_date, to_char
Conversion
year, month, day, hour, minute, second
Component access
dayofweek, dayofyear, weekofyear, quarter
Calendar functions
age, isfinite, justify_days, justify_hours
Interval operations
timezone, at time zone
Timezone handling
count, sum, avg, min, max
Basic aggregates
stddev, stddev_pop, stddev_samp
Standard deviation
variance, var_pop, var_samp
Variance
corr, covar_pop, covar_samp
Correlation
regr_slope, regr_intercept, regr_r2
Linear regression
percentile_cont, percentile_disc
Percentiles
mode, median
Statistical mode
array_agg, string_agg, json_agg
Collection aggregates
bool_and, bool_or, every
Boolean aggregates
bit_and, bit_or, bit_xor
Bitwise aggregates
Full SQL:2011 window function specification with frame support
ROW_NUMBER() - Sequential row numberingRANK() - Ranking with gapsDENSE_RANK() - Ranking without gapsNTILE(n) - Bucket distributionPERCENT_RANK() - Relative rankCUME_DIST() - Cumulative distributionFIRST_VALUE() - First in partitionLAST_VALUE() - Last in partitionNTH_VALUE(n) - Nth row valueLAG(n) - Previous row accessLEAD(n) - Next row accessROWS BETWEEN - Physical row framesRANGE BETWEEN - Logical value framesGROUPS BETWEEN - Peer group framesUNBOUNDED PRECEDINGCURRENT ROWUNBOUNDED FOLLOWINGEXCLUDE NO OTHERS - Include allEXCLUDE CURRENT ROWEXCLUDE GROUP - Exclude peersEXCLUDE TIES - Exclude peer tiesSELECT
customer_id,
order_date,
amount,
-- Running total with frame
SUM(amount) OVER (
PARTITION BY customer_id
ORDER BY order_date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) as running_total,
-- 7-day moving average
AVG(amount) OVER (
PARTITION BY customer_id
ORDER BY order_date
RANGE BETWEEN INTERVAL '7 days' PRECEDING AND CURRENT ROW
) as moving_avg_7d,
-- Percent of customer total
amount / SUM(amount) OVER (PARTITION BY customer_id) * 100 as pct_of_total,
-- Compare to previous order
amount - LAG(amount, 1, 0) OVER (
PARTITION BY customer_id ORDER BY order_date
) as diff_from_prev
FROM orders;
Enterprise stored procedure support with complete control flow
%TYPE column type reference%ROWTYPE table row typeIF/THEN/ELSIF/ELSECASE WHEN expressionsLOOP/EXIT/CONTINUEWHILE loopsFOR loops (integer range)FOREACH array iterationFOR ... IN SELECT query loopsRETURN NEXT for set-returningEXCEPTION WHEN blocksRAISE EXCEPTIONRAISE NOTICE/WARNINGSQLSTATE error codesSQLERRM error messagesOPEN/FETCH/CLOSEFOUND attributeSCROLL cursorsREFCURSOR return typeCREATE OR REPLACE FUNCTION process_monthly_billing(
p_month DATE,
p_discount_threshold DECIMAL DEFAULT 1000.00
) RETURNS TABLE(
customer_id INT,
total_amount DECIMAL,
discount_applied DECIMAL,
final_amount DECIMAL
) AS $$
DECLARE
v_customer RECORD;
v_total DECIMAL;
v_discount DECIMAL;
v_processed INT := 0;
BEGIN
-- Loop through all active customers
FOR v_customer IN
SELECT id, tier FROM customers WHERE status = 'active'
LOOP
-- Calculate monthly total
SELECT COALESCE(SUM(amount), 0) INTO v_total
FROM orders
WHERE customer_id = v_customer.id
AND date_trunc('month', order_date) = date_trunc('month', p_month);
-- Apply tiered discount logic
v_discount := CASE
WHEN v_customer.tier = 'platinum' AND v_total > p_discount_threshold THEN v_total * 0.15
WHEN v_customer.tier = 'gold' AND v_total > p_discount_threshold THEN v_total * 0.10
WHEN v_total > p_discount_threshold * 2 THEN v_total * 0.05
ELSE 0
END;
-- Return row
customer_id := v_customer.id;
total_amount := v_total;
discount_applied := v_discount;
final_amount := v_total - v_discount;
RETURN NEXT;
v_processed := v_processed + 1;
END LOOP;
RAISE NOTICE 'Processed % customers for month %', v_processed, p_month;
EXCEPTION
WHEN OTHERS THEN
RAISE EXCEPTION 'Billing error: % - %', SQLSTATE, SQLERRM;
END;
$$ LANGUAGE plpgsql;
Cost-based optimizer with Delta Lake-aware pruning
Filters pushed to storage layer, leveraging Delta Lake file-level statistics to skip entire Parquet files that can't contain matching rows.
WHERE date > '2024-01-01' → Skip files with max_date < '2024-01-01'
Automatic detection of partition columns in WHERE clauses, eliminating entire partition directories from scan.
WHERE region = 'US' → Read only /region=US/ partition
Only requested columns are read from Parquet files. Column metadata used to skip unnecessary row groups.
SELECT name, email → Read only 2 of 50 columns
Statistics-based join order optimization. Smaller tables joined first, reducing intermediate result sizes.
A ⋈ B ⋈ C → Reorder to (A ⋈ C) ⋈ B if |C| < |B|
Constant folding, dead code elimination, and Boolean expression simplification at compile time.
WHERE 1=1 AND active → WHERE active
Correlated subqueries transformed to efficient joins. EXISTS/IN converted to semi-joins.
WHERE id IN (SELECT...) → Semi-join
Multiple join algorithms selected based on data characteristics
Default for equality joins. Build side chosen based on estimated cardinality. Supports spill-to-disk for large builds.
Optimal when inputs are pre-sorted or when sort can be reused. Handles range joins efficiently.
Used for cross joins and complex predicates that cannot use equality-based algorithms.
Optimized execution for EXISTS, NOT EXISTS, IN, and NOT IN patterns with early termination.
Comprehensive data types with full coercion rules
200+ functions. Full PL/pgSQL. Production-grade optimization.