Each query runs inside a native execution engine. Columnar batches flow through a vectorized operator pipeline using all available CPU cores. No cluster spin-up, no JVM warmup, no idle overhead.
From SQL text to result set, one coherent native pipeline
SQL is parsed to an AST, type-checked, and optimized: predicate pushdown, partition pruning, projection pruning, join reordering based on collected statistics.
The physical plan is broken into morsels (chunks of partitions). A task scheduler assigns morsels to worker threads, balancing load across cores throughout the query.
Each operator (scan, filter, hash join, hash aggregation, sort) processes a full columnar batch per call. Column-at-a-time evaluation keeps CPU caches hot and enables SIMD-friendly memory layouts.
Sort and hash aggregation spill to disk when data exceeds available memory. Spill is compressed and async so other operators continue in parallel.
Delta Lake-aware cost-based optimizer with file-level statistics
Filters on partition columns eliminate entire directory scans. File-level min/max statistics skip Parquet files that cannot contain matching rows.
The optimizer picks join order based on estimated cardinality from table statistics, reducing intermediate result size.
EXISTS and NOT EXISTS subqueries are converted to semi-joins and anti-joins with early termination, avoiding unnecessary full-table evaluation.
Constant expressions are folded at plan time. Dead predicates (1=1, WHERE false) are eliminated before any data is read.
Single-worker by default; explicit opt-in for workloads that need cross-worker parallelism
Every query runs on one worker. That worker uses all its CPU cores via the morsel scheduler. Most analytical workloads fit comfortably inside a single large VM.
For workloads where a single node is a genuine bottleneck, you can opt a query into distributed execution across multiple workers. Each worker processes its assigned partitions independently and results are gathered by a coordinator phase.
A native execution engine that keeps all your cores busy on every query.