How much faster is ADBC than ODBC in this benchmark?

On the canonical 1,000,000-row, 22-column Delta fact-table fixture, the DeltaForge ADBC driver measured 3.21x faster than ODBC bound-column reads (SQLBindCol plus SQLFetch), 6.27x faster than per-cell SQLGetData, and, with both sides on .NET 8, 13.68x faster than System.Data.Odbc.OdbcDataReader. All three ratios are on total wall time. ADBC end-to-end throughput (rows divided by total wall time) reached 2.39 M rows per second versus 743 k for ODBC bound and 160 k for .NET ODBC.

How do I reproduce these numbers?

Clone delta-forge-benchmarks, run scripts/install.sh for host packages, stage-local-bins.sh plus stage-driver-bins.sh for engine and driver binaries, setup-host-stack.sh to provision the DeltaForge stack with your license key, then run_smoke.sh for a sanity pass and run_bench.sh for the canonical run.

Why is the per-cell SQLGetData number relevant?

Per-cell SQLGetData is the consumption pattern that .NET OdbcDataReader, the Power Query mashup engine, and most ODBC client libraries actually emit. The bound-column path is the fastest ODBC shape on paper but real BI tools rarely use it.

Does the benchmark cover Windows or only Linux?

The canonical run is Linux x86_64 because the bench harness in this repo standardises on Linux for TPC-H, SSB, JOB, and TPC-DS. Windows .NET against the same drivers shows directionally identical ratios.

ADBC vs ODBC Performance Benchmark for Power BI

What does the benchmark fixture look like?

1,000,000 rows across 22 mixed-type columns: BIGINT, SMALLINT, TINYINT, BOOL, DOUBLE, DECIMAL(18,4), DECIMAL(28,8), DECIMAL(10,2), TIMESTAMP, DATE, VARCHAR of multiple widths, and MD5 hexstrings. The same SELECT runs through every driver and mode against the same DeltaForge stack.

Headline numbers

Warm median of 3 measured iterations, 1 discarded warmup. Fixture: 1,000,000 rows x 22 mixed-type columns (BIGINT, SMALLINT, TINYINT, BOOL, DOUBLE, DECIMAL(18,4) / DECIMAL(28,8) / DECIMAL(10,2), TIMESTAMP, DATE, VARCHAR of multiple widths, MD5 hexstrings). Linux x86_64, self-provisioned DeltaForge stack on the same host.

Harness	Driver / mode	t_total	t_drain	rows/sec
C++ unixODBC + Arrow C Stream Interface	ODBC bound-column (`SQLBindCol` + `SQLFetch`)	1.345 s	1.295 s	743 k/s
C++ unixODBC + Arrow C Stream Interface	ODBC per-cell `SQLGetData` (.NET / Power BI pattern)	2.625 s	2.580 s	381 k/s
C++ unixODBC + Arrow C Stream Interface	ADBC (Arrow stream)	0.419 s	0.332 s	2.39 M/s
.NET 8 System.Data.Odbc + Apache.Arrow.Adbc	`OdbcDataReader.GetValues`	6.249 s	6.158 s	160 k/s
.NET 8 System.Data.Odbc + Apache.Arrow.Adbc	Apache.Arrow.Adbc	0.457 s	0.430 s	2.19 M/s
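
The speedup ratios and the rows/sec column can be recomputed from the t_total and t_drain values in the table. The following is a minimal Python sketch of that arithmetic; the dictionary keys and function names are illustrative and not part of the bench harness, and rows/sec is assumed to be rows divided by t_total, which matches the published column.

```python
# Timings in seconds, copied from the table above: (t_total, t_drain).
ROWS = 1_000_000
timings = {
    "odbc_bound": (1.345, 1.295),
    "odbc_getdata": (2.625, 2.580),
    "adbc": (0.419, 0.332),
    "dotnet_odbc": (6.249, 6.158),
    "dotnet_adbc": (0.457, 0.430),
}

def speedup(slow, fast):
    """Return (total, drain) ratios of the slow path over the fast path."""
    s_total, s_drain = timings[slow]
    f_total, f_drain = timings[fast]
    return s_total / f_total, s_drain / f_drain

def rows_per_sec(name):
    """Throughput as rows divided by total wall time (assumed definition)."""
    return ROWS / timings[name][0]

pairs = [
    ("odbc_bound", "adbc"),
    ("odbc_getdata", "adbc"),
    ("dotnet_odbc", "dotnet_adbc"),
]
for slow, fast in pairs:
    total, drain = speedup(slow, fast)
    print(f"{fast} vs {slow}: {total:.2f}x total, {drain:.2f}x drain")
```

Because the table timings are rounded to milliseconds, ratios recomputed this way can differ from the quoted ones by about 0.01, presumably because the published ratios come from unrounded measurements.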

ADBC vs ODBC bound-column

3.21x on total wall time, 3.90x on drain. This is the fastest ODBC path on Linux (the consumer pre-binds every column). ADBC still wins because the consumer receives the Arrow columnar buffers through a refcount handoff, with no columnar-to-row transpose.

ADBC vs ODBC per-cell SQLGetData

6.27x on total wall time, 7.77x on drain. SQLGetData is what .NET, Power BI, EF Core, and most ODBC libraries call under the hood; this is the apples-to-apples cost of how BI tools actually consume ODBC.
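
One way to see where the per-cell cost comes from is to count driver calls for the fixture. The sketch below is back-of-envelope arithmetic, not a measurement. It assumes an ODBC rowset size of 1 (one SQLFetch per row) and a hypothetical Arrow record-batch size of 65,536 rows; the harness's actual settings are not stated here, and the function names are illustrative.

```python
import math

ROWS = 1_000_000
COLS = 22

def odbc_bound_calls(rows, cols, rowset_size=1):
    # SQLBindCol once per column, then one SQLFetch per rowset.
    return cols + math.ceil(rows / rowset_size)

def odbc_getdata_calls(rows, cols):
    # One SQLFetch per row plus one SQLGetData per cell.
    return rows + rows * cols

def arrow_stream_calls(rows, batch_rows):
    # One get_next call on the Arrow C stream per record batch.
    return math.ceil(rows / batch_rows)

print("bound-column calls:", odbc_bound_calls(ROWS, COLS))
print("per-cell calls:    ", odbc_getdata_calls(ROWS, COLS))
print("Arrow batches:     ", arrow_stream_calls(ROWS, 65_536))  # batch size is hypothetical
```

Under these assumptions the per-cell pattern issues about 23 times as many driver calls as the bound-column pattern, while the Arrow stream needs only a handful of calls for the whole result.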

.NET ADBC vs .NET ODBC

13.68x on total wall time, 14.31x on drain. Apache.Arrow.Adbc on .NET 8 vs System.Data.Odbc.OdbcDataReader on the same managed runtime. Of the comparisons here, this is the closest to the gap a Power BI report scan sees.

How it runs

The bench follows the same install / setup / run pattern as the TPC-H, SSB, JOB, and TPC-DS benches in the same repo. Three setup steps install the host packages, stage the binaries, and provision the stack; the fourth runs the bench.

# 1. one-shot host setup (unixODBC, cmake, build-essential, .NET 8 SDK)
./scripts/install.sh

# 2. stage engine binaries + ODBC + ADBC drivers (from a DeltaForge release)
../scripts/stage-local-bins.sh
./scripts/stage-driver-bins.sh

# 3. provision a self-contained DeltaForge stack on this host
export DELTA_FORGE_LICENSE_KEY=dfk_...
./scripts/setup-host-stack.sh

# 4. run
./scripts/run_smoke.sh   # ~30s sanity
./scripts/run_bench.sh   # ~2-5 min canonical run

Same query, same control plane

Both drivers run the same SELECT * FROM t_wide against the same self-provisioned DeltaForge stack on the same host. Sequential, not concurrent. No tuning between modes.

Per-phase timing

Each iteration reports wall time in five phases: connect / execute / bind / drain / release. The drain phase is the diagnostic one: ADBC's drain is a refcount handoff, ODBC's drain is the per-cell copy work.
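
Per-phase wall time can be captured with a small timer wrapped around each step. The following is a minimal Python sketch of the idea, not the harness's own code: the PhaseTimer class is hypothetical, and the sleep call stands in for the real driver work in each phase.

```python
import time
from contextlib import contextmanager

PHASES = ("connect", "execute", "bind", "drain", "release")

class PhaseTimer:
    """Accumulates wall-clock seconds per named phase (hypothetical helper)."""

    def __init__(self):
        self.seconds = {}

    @contextmanager
    def phase(self, name):
        if name not in PHASES:
            raise ValueError(f"unknown phase: {name}")
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record elapsed time even if the phase body raises.
            elapsed = time.perf_counter() - start
            self.seconds[name] = self.seconds.get(name, 0.0) + elapsed

timer = PhaseTimer()
for name in PHASES:
    with timer.phase(name):
        time.sleep(0.001)  # stand-in for the real driver call in this phase

for name in PHASES:
    print(f"{name:8s} {timer.seconds[name]:.4f} s")
```

Timing each phase separately is what lets the drain cost be compared across drivers without connect or execute time mixed in.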

Warm median, errored iters excluded

Reported numbers are per-phase medians across the measured iterations, computed independently per phase so the warm-median row is not a single sample. Any iteration that errors is excluded from the median.
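
The per-phase median can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the harness's implementation: the record layout (one dict per iteration with an "error" field) and every timing value in the sample data are made up for the example.

```python
from statistics import median

PHASES = ("connect", "execute", "bind", "drain", "release")

def warm_median(iterations, warmups=1):
    """Median per phase over measured iterations, skipping warmups and errors."""
    measured = [it for it in iterations[warmups:] if it["error"] is None]
    if not measured:
        raise ValueError("no successful measured iterations")
    return {p: median(it[p] for it in measured) for p in PHASES}

def sample(connect, execute, drain, error=None):
    # Hypothetical record; bind and release are held constant for brevity.
    return {"connect": connect, "execute": execute, "bind": 0.001,
            "drain": drain, "release": 0.002, "error": error}

iterations = [
    sample(0.050, 0.020, 0.400),             # warmup, discarded
    sample(0.010, 0.030, 0.330),
    sample(0.012, 0.028, 0.335),
    sample(0.900, 0.900, 0.900, "timeout"),  # errored, excluded
    sample(0.011, 0.032, 0.331),
]

result = warm_median(iterations)
print(result)
```

In this sample the connect and drain medians come from the last iteration while the execute median comes from the second, which is why the warm-median row does not correspond to any single run.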

What the bench does and does not measure

Honest scope. The bench measures one workload shape and one consumption pattern per driver. Customer workloads vary.

Result-set shape sensitivity

The 22-column mixed-type fixture is one shape. Narrow integer-only results show a smaller gap; very wide decimal-heavy results show a larger one. The harness accepts --sql 'SELECT ... FROM your.table' so you can drop in a workload that matches your actual BI scan.

Linux only, today

The bench runs on Linux x86_64 because the canonical bench harness in this repo does. Windows .NET against the same drivers shows directionally identical ratios; the absolute throughput varies with the OS's TLS stack and driver-manager overhead.

One connection, one consumer

The bench drives one connection per driver, sequentially. Concurrency is not measured; the existing engine-level benches in this repo cover that for the server side.

