Skip to content
Compute Engine

Columnar execution on every core

Each query runs inside a native execution engine. Columnar batches flow through a vectorized operator pipeline using all available CPU cores. No cluster spin-up, no JVM warmup, no idle overhead.

Vectorized columnar operator pipeline
Parallel morsel scheduler across cores
One worker per query by default, distributed opt-in
Parquet Scan col pruning + partition skip batch Columnar Batch id price qty date Operator Pipeline Filter (predicate pushdown) v Hash Aggregation v Sort / Top-K v Result (Arrow format) cores core 0: partition[0..3] core 1: partition[4..7] core 2: partition[8..N]

How a query executes

From SQL text to result set, one coherent native pipeline

Parse and plan

SQL is parsed to an AST, type-checked, and optimized: predicate pushdown, partition pruning, projection pruning, join reordering based on collected statistics.

Morsel scheduling

The physical plan is broken into morsels (chunks of partitions). A task scheduler assigns morsels to worker threads, balancing load across cores throughout the query.

Vectorized operators

Each operator (scan, filter, hash join, hash aggregation, sort) processes a full columnar batch per call. Column-at-a-time evaluation keeps CPU caches hot and enables SIMD-friendly memory layouts.

Spill-to-disk

Sort and hash aggregation spill to disk when data exceeds available memory. Spill is compressed and async so other operators continue in parallel.

Query optimization

Delta Lake-aware cost-based optimizer with file-level statistics

Predicate pushdown

Filters on partition columns eliminate entire directory scans. File-level min/max statistics skip Parquet files that cannot contain matching rows.

Join reordering

The optimizer picks join order based on estimated cardinality from table statistics, reducing intermediate result size.

Semi/anti joins

EXISTS and NOT EXISTS subqueries are converted to semi-joins and anti-joins with early termination, avoiding unnecessary full-table evaluation.

Expression folding

Constant expressions are folded at plan time. Dead predicates (1=1, WHERE false) are eliminated before any data is read.

Distributed execution

Single-worker by default; explicit opt-in for workloads that need cross-worker parallelism

Default: single worker

Every query runs on one worker. That worker uses all its CPU cores via the morsel scheduler. Most analytical workloads fit comfortably inside a single large VM.

Opt-in distribution

For workloads where a single node is a genuine bottleneck, you can opt a query into distributed execution across multiple workers. Each worker processes its assigned partitions independently and results are gathered by a coordinator phase.

Use modern hardware efficiently

A native execution engine that keeps all your cores busy on every query.