Benchmark

ADBC vs ODBC, measured on the same fixture

The driver bench at delta-forge-benchmarks/driver-bench drives the same SQL through the DeltaForge ODBC driver and the DeltaForge ADBC driver against the same self-provisioned DeltaForge stack and reports per-phase wall time. Public, scripted, reproducible in one command.

Headline numbers

Warm median of 3 measured iterations, 1 discarded warmup. Fixture: 1,000,000 rows x 22 mixed-type columns (BIGINT, SMALLINT, TINYINT, BOOL, DOUBLE, DECIMAL(18,4) / DECIMAL(28,8) / DECIMAL(10,2), TIMESTAMP, DATE, VARCHAR of multiple widths, MD5 hexstrings). Linux x86_64, self-provisioned DeltaForge stack on the same host.

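| Harness | Driver / mode | t_total | t_drain | rows/sec |
|---|---|---|---|---|
| C++ (unixODBC + Arrow C Stream Interface) | ODBC bound-column (SQLBindCol + SQLFetch) | 1.345 s | 1.295 s | 743 k/s |
| C++ (unixODBC + Arrow C Stream Interface) | ODBC per-cell SQLGetData (.NET / Power BI pattern) | 2.625 s | 2.580 s | 381 k/s |
| C++ (unixODBC + Arrow C Stream Interface) | ADBC (Arrow stream) | 0.419 s | 0.332 s | 2.39 M/s |
| .NET 8 (System.Data.Odbc + Apache.Arrow.Adbc) | OdbcDataReader.GetValues | 6.249 s | 6.158 s | 160 k/s |
| .NET 8 (System.Data.Odbc + Apache.Arrow.Adbc) | Apache.Arrow.Adbc | 0.457 s | 0.430 s | 2.19 M/s |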

ADBC vs ODBC bound-column

3.21x on total wall time, 3.90x on drain. This is the fastest ODBC path on Linux (the consumer pre-binds every column). ADBC still wins because there is no columnar -> row transpose, only a refcount handoff.
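To make the "transpose versus handoff" distinction concrete, here is a toy model in Python. It is not the driver code: plain Python lists stand in for Arrow column buffers, and the function names are hypothetical. It only shows that a row-wise drain touches every cell, while a columnar handoff passes the existing column objects along.

```python
# Toy model only: plain lists stand in for Arrow column buffers.
# All names here are hypothetical and not part of either driver.

def make_columns(n_rows, n_cols):
    """Build a columnar result: one list per column."""
    return [list(range(n_rows)) for _ in range(n_cols)]

def drain_columnar(columns):
    """Columnar handoff: return the same column objects, no per-cell work."""
    return columns

def drain_row_wise(columns):
    """Row-wise drain: copy every cell into a freshly built row."""
    n_rows = len(columns[0])
    rows = []
    cell_copies = 0
    for r in range(n_rows):
        row = []
        for col in columns:
            row.append(col[r])   # one copy per cell
            cell_copies += 1
        rows.append(tuple(row))
    return rows, cell_copies

columns = make_columns(1_000, 22)
handed_off = drain_columnar(columns)
rows, cell_copies = drain_row_wise(columns)

print(handed_off[0] is columns[0])  # the consumer sees the same column object
print(cell_copies)                  # rows x columns cell copies
```

At the fixture's size (1,000,000 rows x 22 columns) the row-wise path in this model would perform 22 million cell copies, which is the kind of work the drain column in the table isolates.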

ADBC vs ODBC per-cell SQLGetData

6.27x on total wall time, 7.77x on drain. SQLGetData is what .NET, Power BI, EF Core, and most ODBC libraries call under the hood; this is the apples-to-apples cost of how BI tools actually consume ODBC.

.NET ADBC vs .NET ODBC

13.68x on total wall time, 14.31x on drain. Real Apache.Arrow.Adbc on .NET 8 vs System.Data.Odbc.OdbcDataReader on the same managed runtime. This is the gap a Power BI report scan sees.
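The ratios and throughput figures can be rechecked from the headline table with a few lines of Python. This is a sketch, not part of the bench: the dictionary keys are hypothetical labels, and the timings are the rounded values printed in the table. Recomputing from rounded values can differ from the published ratios in the last digit, presumably because the published ratios were derived from unrounded timings (an assumption).

```python
# Timings copied from the headline table: (t_total, t_drain) in seconds.
# The keys are hypothetical labels chosen for this sketch.
ROWS = 1_000_000

results = {
    "cpp_odbc_bound":   (1.345, 1.295),
    "cpp_odbc_getdata": (2.625, 2.580),
    "cpp_adbc":         (0.419, 0.332),
    "dotnet_odbc":      (6.249, 6.158),
    "dotnet_adbc":      (0.457, 0.430),
}

def speedup(slow, fast):
    """Return (total ratio, drain ratio) of the slow path over the fast path."""
    slow_total, slow_drain = results[slow]
    fast_total, fast_drain = results[fast]
    return slow_total / fast_total, slow_drain / fast_drain

def rows_per_sec(name):
    """Throughput over total wall time."""
    return ROWS / results[name][0]

comparisons = [
    ("cpp_odbc_bound", "cpp_adbc"),
    ("cpp_odbc_getdata", "cpp_adbc"),
    ("dotnet_odbc", "dotnet_adbc"),
]

for slow, fast in comparisons:
    total, drain = speedup(slow, fast)
    print(f"{fast} vs {slow}: {total:.2f}x total, {drain:.2f}x drain")

for name in results:
    print(f"{name}: {rows_per_sec(name) / 1000:.0f} k rows/sec")
```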

How it runs

The bench follows the same install / setup / run pattern as the TPC-H, SSB, JOB, and TPC-DS benches in the same repo. Four steps take a host from clean to a result: install prerequisites, stage binaries, provision the stack, and run the bench.

# 1. one-shot host setup (unixODBC, cmake, build-essential, .NET 8 SDK)
./scripts/install.sh

# 2. stage engine binaries + ODBC + ADBC drivers (from a DeltaForge release)
../scripts/stage-local-bins.sh
./scripts/stage-driver-bins.sh

# 3. provision a self-contained DeltaForge stack on this host
export DELTA_FORGE_LICENSE_KEY=dfk_...
./scripts/setup-host-stack.sh

# 4. run
./scripts/run_smoke.sh   # ~30s sanity
./scripts/run_bench.sh   # ~2-5 min canonical run

Same query, same control plane

Both drivers run the same SELECT * FROM t_wide against the same self-provisioned DeltaForge stack on the same host. Sequential, not concurrent. No tuning between modes.

Per-phase timing

Each iteration reports wall time in five phases: connect / execute / bind / drain / release. The drain phase is the diagnostic one: ADBC's drain is a refcount handoff, ODBC's drain is the per-cell copy work.
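As an illustration of per-phase wall-clock timing, the sketch below records the five phase names with a small context manager. It is not the bench harness (which is C++ and .NET): the PhaseTimer class is hypothetical, the work inside each phase is a placeholder, and treating the total as the sum of the phases is an assumption made for this sketch.

```python
import time
from contextlib import contextmanager

# Phase names taken from the text; everything else is hypothetical.
PHASES = ("connect", "execute", "bind", "drain", "release")

class PhaseTimer:
    """Records wall time per named phase for one iteration."""

    def __init__(self):
        self.seconds = {}

    @contextmanager
    def phase(self, name):
        if name not in PHASES:
            raise ValueError(f"unknown phase: {name}")
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record elapsed time even if the phase body raises.
            self.seconds[name] = time.perf_counter() - start

    @property
    def total(self):
        # Assumption for this sketch: total is the sum of the phases.
        return sum(self.seconds.values())

timer = PhaseTimer()
for name in PHASES:
    with timer.phase(name):
        sum(range(10_000))  # placeholder standing in for real driver calls

for name in PHASES:
    print(f"{name}: {timer.seconds[name]:.6f} s")
print(f"total: {timer.total:.6f} s")
```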

Warm median, errored iters excluded

Reported numbers are per-phase medians across the measured iterations. Each phase's median is computed independently, so a warm-median row can combine values from different iterations rather than reproducing any single run. Any iteration that errors is excluded from the median.
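A minimal sketch of that aggregation in Python follows. The record layout, the warm_median function, and the sample timings are all made up for illustration and are not the harness's actual data format. The first iteration is discarded as warmup, errored iterations are dropped, and the median is taken separately for each phase.

```python
from statistics import median

PHASES = ("connect", "execute", "bind", "drain", "release")

def warm_median(iterations, warmups=1):
    """Per-phase median over measured, non-errored iterations."""
    measured = [it for it in iterations[warmups:] if not it.get("error")]
    if not measured:
        raise ValueError("no successful measured iterations")
    return {p: median(it["phases"][p] for it in measured) for p in PHASES}

def phases(connect, execute, bind, drain, release):
    """Helper to build one iteration's phase timings (seconds)."""
    return dict(zip(PHASES, (connect, execute, bind, drain, release)))

# Illustrative, made-up timings: one warmup, three good runs, one errored run.
iterations = [
    {"phases": phases(0.900, 0.500, 0.010, 0.800, 0.010)},  # warmup, discarded
    {"phases": phases(0.020, 0.050, 0.001, 0.330, 0.002)},
    {"phases": phases(0.030, 0.040, 0.001, 0.340, 0.002)},
    {"error": True, "phases": {}},                          # excluded
    {"phases": phases(0.025, 0.060, 0.001, 0.320, 0.002)},
]

summary = warm_median(iterations)
for name in PHASES:
    print(f"{name}: {summary[name]:.3f} s")
```

In this sample the connect median comes from the last iteration while the execute and drain medians come from the first measured one, which is why the summary row does not correspond to any single run.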

What the bench does and does not measure

Honest scope. The bench measures one workload shape and one consumption pattern per driver. Customer workloads vary.

Result-set shape sensitivity

The 22-column mixed-type fixture is one shape. Narrow integer-only results show a smaller gap; very wide decimal-heavy results show a larger one. The harness accepts --sql 'SELECT ... FROM your.table' so you can drop in a workload that matches your actual BI scan.

Linux only, today

The bench runs on Linux x86_64 because the canonical bench harness in this repo does. Windows .NET against the same drivers shows directionally identical ratios; the absolute throughput varies with the OS's TLS stack and driver-manager overhead.

One connection, one consumer

The bench drives one connection per driver, sequentially. Concurrency is not measured; the existing engine-level benches in this repo cover that for the server side.

Reproduce it on your own hardware

Clone the bench repo. Run the four steps. Read the numbers. Edit the SQL to match your real workload and re-run.