Does the property graph require a separate graph database?

No. The graph is projected on demand from columns in existing Delta tables. There is no second store to sync, no ETL into a graph engine, and no second copy of the data.

Which graph query language is supported?

openCypher, using MATCH, WHERE, and RETURN for read queries. Cypher stays read-only; all writes go through SQL on the underlying Delta tables.

Which graph algorithms ship as SQL table functions?

Thirty-two functions across six families: centrality, community detection, topology, pathfinding, similarity, and embeddings. Eighteen of them have GPU implementations selectable via the ON GPU Cypher hint.

Do mutations on the Delta table flow through to the graph?

Yes. INSERT, UPDATE, and DELETE on the underlying Delta tables propagate to the projection.

How do GPU and CPU paths compare?

ON GPU is strict by default: if the GPU path declines a graph or the algorithm has no GPU implementation, the query raises a clear error rather than silently falling back to CPU. The eighteen GPU-enabled algorithms produce numerically identical results to their CPU counterparts, verified against a Neo4j+GDS reference on Zachary's karate club.

Graph Analytics on the Data Lake: Cypher on Delta Tables

How the projection works

Your table rows become graph edges; referenced keys become nodes

Declare once

Tell DeltaForge which column is the source node and which is the target node. The graph is projected from those columns.
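To make the idea concrete, here is a minimal Python sketch of what "rows become edges, referenced keys become nodes" means. It is not DeltaForge code: the rows, the column names `src_id` and `dst_id`, and the `project_graph` helper are all assumptions made up for illustration.

```python
from collections import defaultdict

# Hypothetical rows from a Delta table; the column names are assumptions.
rows = [
    {"src_id": "alice", "dst_id": "bob", "amount": 40},
    {"src_id": "bob", "dst_id": "carol", "amount": 15},
    {"src_id": "alice", "dst_id": "carol", "amount": 5},
]

def project_graph(rows, source_col, target_col):
    """Treat each row as a directed edge and each referenced key as a node."""
    nodes = set()
    adjacency = defaultdict(list)
    for row in rows:
        src, dst = row[source_col], row[target_col]
        nodes.update((src, dst))
        # Keep the remaining columns as edge properties.
        props = {k: v for k, v in row.items() if k not in (source_col, target_col)}
        adjacency[src].append((dst, props))
    return nodes, dict(adjacency)

nodes, adjacency = project_graph(rows, "src_id", "dst_id")
```

The only declaration the sketch needs is the pair of column names; everything else in the row rides along as an edge property.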

Session-local projection

The graph is built on demand and held in the session, ready for Cypher queries and algorithm table functions without a round-trip to storage.

Mutations flow through

Use ordinary INSERT, UPDATE, DELETE on the underlying Delta tables. The graph rebuilds incrementally so Cypher sees the latest data.
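One way to picture incremental maintenance is a projection that applies each row-level change as a small delta instead of rebuilding from scratch. The Python sketch below is an illustration under that assumption only; the `EdgeProjection` class and its method names are hypothetical and say nothing about how DeltaForge implements it internally.

```python
from collections import Counter

class EdgeProjection:
    """Toy in-memory projection kept in step with row-level changes."""

    def __init__(self):
        self.edges = Counter()  # (source, target) -> number of backing rows

    def on_insert(self, src, dst):
        self.edges[(src, dst)] += 1

    def on_delete(self, src, dst):
        key = (src, dst)
        if self.edges[key] <= 1:
            self.edges.pop(key, None)  # last backing row gone: drop the edge
        else:
            self.edges[key] -= 1

    def on_update(self, old, new):
        # An UPDATE of a key column is a delete of the old edge plus an insert.
        self.on_delete(*old)
        self.on_insert(*new)

    def neighbors(self, src):
        return sorted(dst for (s, dst) in self.edges if s == src)

proj = EdgeProjection()
proj.on_insert("alice", "bob")
proj.on_insert("bob", "carol")
proj.on_update(("alice", "bob"), ("alice", "carol"))
proj.on_delete("bob", "carol")
```

After these four changes the projection holds a single edge from alice to carol, which is what a query issued next would see.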

Cypher stays read-only

MATCH, WHERE, RETURN for graph queries. All writes go through SQL on the Delta table, keeping a single transaction model.

32 graph algorithms as SQL table functions

Every algorithm runs inside the same SQL engine that reads your tables, not in a separately deployed query layer. Call one from a SELECT, join its output to any table, and feed the result into a MERGE or a chart query, all in one statement. Eighteen ship with GPU implementations selectable via the ON GPU Cypher hint.

Centrality (8)

graph_pagerank(), graph_articlerank(), graph_eigenvector(), graph_hits(), graph_betweenness(), graph_closeness(), graph_harmonic(), graph_degree()
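For readers new to centrality, the sketch below shows textbook power-iteration PageRank in plain Python. It illustrates the general technique only; the parameters and defaults of graph_pagerank() are not stated here, so the damping factor, iteration count, and dangling-node handling are assumptions.

```python
def pagerank(adjacency, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict of node -> list of out-neighbors."""
    nodes = set(adjacency) | {v for targets in adjacency.values() for v in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Mass held by dangling nodes (no out-edges) is spread evenly.
        dangling = sum(rank[u] for u in nodes if not adjacency.get(u))
        new_rank = {
            node: (1.0 - damping) / n + damping * dangling / n for node in nodes
        }
        for u, targets in adjacency.items():
            if targets:
                share = damping * rank[u] / len(targets)
                for v in targets:
                    new_rank[v] += share
        rank = new_rank
    return rank

graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
scores = pagerank(graph)
```

In this three-node example, `c` ends up ranked above `b` because it receives score from both `a` and `b`, and the scores sum to 1.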

Community Detection (10)

graph_louvain(), graph_leiden(), graph_labelpropagation(), graph_components(), graph_scc(), graph_kcore(), graph_lcc(), graph_triangle_count(), graph_modularity(), graph_conductance()
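As an illustration of the simplest member of this family, here is a small union-find sketch of connected components in Python. It shows the general idea of labeling nodes by component and is not the implementation behind graph_components(); the choice of the smallest node id as the label is an assumption of this sketch.

```python
def connected_components(edges):
    """Label each node with a component id using union-find (undirected)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            # Attach the larger root under the smaller one, so the label
            # of a component is its smallest node id.
            parent[max(root_u, root_v)] = min(root_u, root_v)

    return {node: find(node) for node in parent}

edges = [(1, 2), (2, 3), (4, 5)]
labels = connected_components(edges)
```

The result is one row per node with a component label, the same shape that makes algorithm output easy to join back to a table.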

Topology & Connectivity (2)

graph_bridges(), graph_articulationpoints()

Pathfinding (9)

graph_shortest_path() (Dijkstra), graph_bellmanford(), graph_deltastepping(), graph_astar(), graph_yens(), graph_bfs(), graph_dfs(), graph_mst(), graph_randomwalk()
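The following Python sketch shows Dijkstra's algorithm, the technique named for graph_shortest_path(), on a tiny weighted graph. It is a generic textbook version with a made-up `shortest_path` helper, not the function's actual signature or output format.

```python
import heapq

def shortest_path(adjacency, source, target):
    """Dijkstra over node -> list of (neighbor, non-negative weight)."""
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            break
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry
        for v, w in adjacency.get(u, []):
            candidate = d + w
            if candidate < dist.get(v, float("inf")):
                dist[v] = candidate
                prev[v] = u
                heapq.heappush(heap, (candidate, v))
    if target not in dist:
        return None, float("inf")
    # Walk the predecessor links back from the target to rebuild the path.
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1], dist[target]

graph = {"a": [("b", 4), ("c", 1)], "c": [("b", 2)], "b": [("d", 5)]}
path, cost = shortest_path(graph, "a", "d")
```

Here the direct edge from `a` to `b` costs 4, but the detour through `c` costs 3, so the path goes a, c, b, d with total cost 8.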

Similarity (2)

graph_knn(), graph_similarity() (Jaccard, Adamic-Adar, common neighbors)
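The three similarity measures named above have short standard definitions, sketched here in Python for two distinct nodes in an undirected graph. The helper names are hypothetical, and how graph_similarity() selects or returns a measure is not covered by this sketch.

```python
import math

def neighbor_sets(edges):
    """Build undirected neighbor sets from (u, v) pairs."""
    nbrs = {}
    for u, v in edges:
        nbrs.setdefault(u, set()).add(v)
        nbrs.setdefault(v, set()).add(u)
    return nbrs

def common_neighbors(nbrs, a, b):
    return len(nbrs[a] & nbrs[b])

def jaccard(nbrs, a, b):
    union = nbrs[a] | nbrs[b]
    return len(nbrs[a] & nbrs[b]) / len(union) if union else 0.0

def adamic_adar(nbrs, a, b):
    # Shared neighbors count for more when they have few connections.
    # For distinct a and b, a shared neighbor has degree >= 2, so log() > 0.
    return sum(1.0 / math.log(len(nbrs[z])) for z in nbrs[a] & nbrs[b])

edges = [("a", "x"), ("a", "y"), ("b", "x"), ("b", "y"), ("b", "z"), ("y", "z")]
nbrs = neighbor_sets(edges)
```

In this example `a` and `b` share the neighbors `x` and `y`, so the common-neighbor count is 2 and the Jaccard score is 2/3.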

Node Embeddings (1)

graph_fastrp() (Fast Random Projection embeddings)

GPU acceleration via the `ON GPU` Cypher hint

Eighteen algorithms ship with WGSL compute shaders that run on any wgpu-supported device, regardless of vendor (NVIDIA, AMD, Intel, Apple). Their results are numerically identical to the CPU path, verified against Neo4j + GDS 2.6.9 on Zachary's karate club.

GPU-accelerated (18)

PageRank, ArticleRank, Eigenvector, HITS, Betweenness, Harmonic, Louvain, Label Propagation, Connected Components, K-Core, LCC, Triangle Count, Modularity, Conductance, Bellman-Ford, Delta-Stepping, Random Walk, FastRP.

CPU-only by design (14)

Leiden refinement, Bridges, Articulation Points, A*, Yen's K-shortest paths, Closeness, SCC, Shortest Path (Dijkstra), BFS, DFS, MST, KNN, Similarity, Degree. These are sequential by construction, so a GPU offers no speedup.

Strict dispatch

ON GPU means "run on GPU or error", with no silent CPU fallback. To run a graph that falls below the GPU threshold, override it with ON GPU THRESHOLD 1. An algorithm with no GPU implementation fails with a clear message instead of producing inconsistent results.
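The "run on GPU or error" rule can be sketched as a small dispatch function. This Python example is illustrative only: the algorithm subset, the default threshold value, the assumption that the threshold counts edges, and the `GpuDispatchError` name are all hypothetical.

```python
GPU_ALGORITHMS = {"pagerank", "louvain", "bellmanford"}  # illustrative subset
DEFAULT_THRESHOLD = 10_000  # hypothetical minimum edge count for the GPU path

class GpuDispatchError(RuntimeError):
    """Raised when a GPU run was requested but cannot be honored."""

def dispatch(algorithm, edge_count, on_gpu=False, threshold=None):
    """Return the backend to run on; never fall back to CPU silently."""
    if not on_gpu:
        return "cpu"
    if algorithm not in GPU_ALGORITHMS:
        raise GpuDispatchError(f"{algorithm} has no GPU implementation")
    limit = DEFAULT_THRESHOLD if threshold is None else threshold
    if edge_count < limit:
        raise GpuDispatchError(
            f"graph has {edge_count} edges, below GPU threshold {limit}"
        )
    return "gpu"
```

With these assumptions, `dispatch("pagerank", 78, on_gpu=True)` raises because the graph is too small, while passing `threshold=1` forces the GPU path. The point of the strict design is that a caller who asked for the GPU never gets CPU results without being told.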

Storage modes

Nodes and edges live in regular Delta tables; choose the column layout that fits

Flattened

Structured columns for fixed-schema graphs. Best for analytics-heavy scans.

Hybrid

Structured edge columns with a property map for flexible attributes. Balances scan speed with schema flexibility.

JSON

Full graph structure in JSON columns for deeply nested or heterogeneous schemas.
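To compare the three layouts side by side, the Python sketch below writes the same hypothetical edge in each one and reads one attribute back. The column names (`src`, `dst`, `properties`, `edge`) are assumptions chosen for illustration, not DeltaForge's actual schema.

```python
import json

# One hypothetical edge ("alice follows bob since 2021, weight 0.8").
# All column names below are illustrative assumptions.

# Flattened: every attribute is its own structured column.
flattened_row = {"src": "alice", "dst": "bob", "since": 2021, "weight": 0.8}

# Hybrid: structured endpoint columns plus a flexible property map.
hybrid_row = {
    "src": "alice",
    "dst": "bob",
    "properties": {"since": 2021, "weight": 0.8},
}

# JSON: the whole edge is serialized into a single JSON column.
json_row = {
    "edge": json.dumps(
        {"src": "alice", "dst": "bob", "properties": {"since": 2021, "weight": 0.8}}
    )
}

def edge_weight(row):
    """Read the same attribute from any of the three layouts."""
    if "edge" in row:
        return json.loads(row["edge"])["properties"]["weight"]
    if "properties" in row:
        return row["properties"]["weight"]
    return row["weight"]
```

The flattened row needs no parsing at read time, while the JSON row must be decoded first, which is the scan-speed versus flexibility trade-off described above.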

Frequently asked questions

Short answers to the questions teams ask before running graph workloads on the lake

Can I run Cypher on Parquet files?

Yes. The graph is projected from table columns, so openCypher MATCH queries run over Parquet and Delta tables in place. Nothing is exported to Neo4j or any separate graph store.

Do I need a graph database for community detection?

No. Community detection ships as SQL table functions: graph_louvain(), graph_leiden(), graph_labelpropagation() and seven more run against the projected graph, and their output joins back to any Delta table in the same SELECT.

Can algorithm results join back to ordinary SQL tables?

Yes. Each algorithm is a SQL table function, so its output rows join with any Delta table in the same SELECT, ready for a MERGE or a dashboard query.
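The join works because algorithm output is just rows keyed by node id. The Python sketch below mimics that with a hash join over two hypothetical row sets; in SQL this would be an ordinary JOIN, and the column names here are assumptions.

```python
# Hypothetical algorithm output: one row per node with a community label.
community_rows = [
    {"node_id": "alice", "community": 0},
    {"node_id": "bob", "community": 0},
    {"node_id": "carol", "community": 1},
]

# Hypothetical rows from an ordinary table keyed by the same node id.
customer_rows = [
    {"customer_id": "alice", "region": "EMEA"},
    {"customer_id": "bob", "region": "APAC"},
    {"customer_id": "carol", "region": "EMEA"},
]

def inner_join(left, right, left_key, right_key):
    """Hash join: index the right side, then probe it with each left row."""
    index = {}
    for row in right:
        index.setdefault(row[right_key], []).append(row)
    return [
        {**l_row, **r_row}
        for l_row in left
        for r_row in index.get(l_row[left_key], [])
    ]

joined = inner_join(community_rows, customer_rows, "node_id", "customer_id")
```

Each joined row carries both the community label and the customer's region, ready for grouping or a downstream write.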

Further reading

Run Cypher on Parquet and Delta Tables Without Neo4j

Community Detection in SQL: Louvain on Delta Lake Tables

MCP Server: a knowledge graph for agents

Query the connections already in your data
