Search for "hl7 to sql" and the results have barely moved in a decade: interface-engine documentation, forum threads from 2009 recommending a hand-rolled parser, and library tutorials that end with "now write the database insert yourself." The implied workflow is always the same. Stand up Mirth Connect or Iguana, build a channel, write a transformer per message type, design a staging schema, and maintain all of it. That is the right shape for moving live messages between clinical systems. It is a lot of machinery when the actual goal is to answer questions about messages you already have on disk.
This article shows the other path: pointing a SQL engine directly at the files. Every statement below comes from two demos in the DeltaForge demo library, hl7-patient-admin and hl7-lab-orders-results, which ship with real-world ADT, ORM, and ORU messages from EPIC, MegaReg, Folio3, Ritten, and other systems, spanning HL7 v2.3 through v2.6. You can run both demos yourself and reproduce every result.
How do you convert HL7 messages to SQL tables?
Start with what an HL7v2 message actually looks like. This is the opening of a real ADT^A01 admission from the demo data, sent by an EPIC system:
MSH|^~\&|EPIC|HOSPITAL|LAB|HOSPITAL|20230315140000||ADT^A01^ADT_A01|MSG00001|P|2.5.1|||AL|NE
EVN|A01|20230315140000|||JDOE^DOE^JOHN^M^DR^^^STAFF
PID|1||123456789^^^HOSP^MR~987-65-4321^^^SSN^SS||SMITH^JOHN^DAVID^JR^MR||19800515|M||2106-3^White^HL70005|123 MAIN ST^^CHICAGO^IL^60601^USA^HOME||...
Pipes delimit fields, carets delimit components, tildes delimit repetitions, and the segment grammar varies by version and by sending system. This is exactly the kind of format that pushes people toward middleware. The DeltaForge alternative is one DDL statement:
CREATE ZONE IF NOT EXISTS external
TYPE EXTERNAL
COMMENT 'External tables for file-backed data';
CREATE SCHEMA IF NOT EXISTS external.hl7_demos
COMMENT 'HL7 v2 message-backed external tables';
CREATE EXTERNAL TABLE IF NOT EXISTS external.hl7_demos.adt_messages
USING HL7
LOCATION 's3://acme-lake/demo/hl7-patient-admin/*.hl7'
OPTIONS (
file_metadata = '{"columns":["df_file_name","df_row_number"]}'
);
With no further configuration, each message becomes one row with three kinds of columns: the MSH header fields (msh_1 through msh_21) as plain text columns, df_message_json holding the complete parsed message as JSON, and df_message_id, a unique hash per message. The header columns alone already answer real questions:
SELECT
df_file_name,
msh_3 AS sending_app,
msh_4 AS sending_facility,
msh_9 AS message_type,
msh_12 AS hl7_version,
msh_10 AS message_control_id
FROM external.hl7_demos.adt_messages
ORDER BY df_file_name;
Against the demo data this returns eight rows: six A01 admissions, one A08 demographics update, and one A03 discharge, with sending_app values like EPIC and MegaReg and hl7_version ranging from 2.3 to 2.6. That last detail matters: five different HL7 versions coexist in one table because the version identifier is just another column (MSH-12), not a parser mode you have to configure per feed.
What happens at the storage layer
An external table copies nothing and converts nothing. The catalog stores the location glob and the parsing options; the .hl7 files stay where they are, on local disk or in an object store. At query time the engine lists the files matching the glob, parses each message, decodes HL7 escape sequences (\T\, \S\, \E\) into their literal characters, and streams the result to the query as Arrow record batches. Drop a new file into the directory and the next query sees it; there is no refresh step and no load job.
Two honest consequences follow from that design. First, every query re-reads and re-parses the files, which is ideal for a message archive that keeps growing but wasteful for a heavy dashboard hammering the same data; in that case run the external table query once and materialize the result into a Delta table, then point the dashboard at the Delta table. Second, fields keep their HL7 encoding: pid_5 comes back as SMITH^JOHN^DAVID^JR^MR with the component separators intact, and splitting that into last and first name is ordinary SQL string work, applied where you want it rather than baked into an ingestion pipeline you cannot revisit.
How do you turn ADT messages into patient tables?
Header columns describe messages. Patient analytics needs the fields inside PID (patient identification) and PV1 (patient visit). The hl7_config option's materialized_paths setting promotes any segment field to a first-class column, named by segment and field number:
CREATE EXTERNAL TABLE IF NOT EXISTS external.hl7_demos.adt_materialized
USING HL7
LOCATION 's3://acme-lake/demo/hl7-patient-admin/*.hl7'
OPTIONS (
hl7_config = '{
"materialized_paths": [
"pid_3", "pid_5", "pid_7", "pid_8", "pid_11",
"pv1_2", "pv1_3", "pv1_7",
"evn_1", "evn_2"
]
}',
file_metadata = '{"columns":["df_file_name","df_row_number"]}'
);
Now the ADT feed reads like a patient table:
SELECT
df_file_name,
pid_5 AS patient_name,
pid_7 AS date_of_birth,
pid_8 AS gender,
pid_11 AS address,
pv1_2 AS patient_class
FROM external.hl7_demos.adt_materialized
ORDER BY pid_5;
Because the version identifier is a column, confirming that one table really is absorbing five protocol versions is a one-line aggregate:
SELECT
msh_12 AS hl7_version,
COUNT(*) AS message_count
FROM external.hl7_demos.adt_messages
GROUP BY msh_12
ORDER BY msh_12;
Materialized paths cover the fields you query constantly. Everything else, including next-of-kin (NK1), allergies (AL1), diagnoses (DG1), guarantor (GT1), insurance (IN1), and even site-specific Z-segments, is preserved in df_message_json, where every segment appears with its fields as arrays. Nothing is dropped just because you did not materialize it:
SELECT
df_file_name,
msh_9 AS message_type,
df_message_json
FROM external.hl7_demos.adt_messages
ORDER BY df_file_name
LIMIT 3;
One behavior worth knowing before you rely on it: when a segment repeats within a message, the default mode materializes the first occurrence into the column, and all occurrences remain in the JSON. A comprehensive metabolic panel with fourteen OBX result segments materializes the first analyte and keeps the other thirteen reachable through df_message_json.
Walkthrough: lab orders and results end to end
The hl7-lab-orders-results demo ships in the DeltaForge demo library and exercises the order-to-result workflow with three ORM^O01 order messages and five ORU^R01 result messages from multiple lab systems. The full flow is three steps.
-
Create two tables over the same directory. The glob pattern does the message-type routing that a channel filter would do in an interface engine:
orm*.hl7selects only orders, while*.hl7takes everything, with OBX observation fields materialized for the result rows.CREATE EXTERNAL TABLE IF NOT EXISTS external.hl7_demos.lab_orders USING HL7 LOCATION 's3://acme-lake/demo/hl7-lab-orders-results/orm*.hl7' OPTIONS ( file_metadata = '{"columns":["df_file_name","df_row_number"]}' ); CREATE EXTERNAL TABLE IF NOT EXISTS external.hl7_demos.lab_results USING HL7 LOCATION 's3://acme-lake/demo/hl7-lab-orders-results/*.hl7' OPTIONS ( hl7_config = '{ "materialized_paths": [ "pid_3", "pid_5", "obr_4", "obx_2", "obx_3", "obx_5", "obx_6", "obx_7", "obx_8" ] }', file_metadata = '{"columns":["df_file_name","df_row_number"]}' );Orders and results as separate tables from one directory. OBR-4 is the test ordered; OBX 2 through 8 carry value type, observation ID, value, units, reference range, and abnormal flag. -
Query the clinical signal. OBX-8 carries the abnormal flag the lab system set (H for high, L for low, N for normal), so surfacing every out-of-range result is a WHERE clause, not a transformer script:
SELECT df_file_name, pid_5 AS patient_name, obx_3 AS test_id, obx_5 AS value, obx_6 AS units, obx_7 AS reference_range, obx_8 AS abnormal_flag FROM external.hl7_demos.lab_results WHERE obx_8 IS NOT NULL AND obx_8 <> '' AND obx_8 <> 'N';Against the demo data this finds one flagged result: a glucose of 182 mg/dl against a 70-105 reference range, flagged H. -
Verify the load. The same SQL that does analytics does the sanity checking; the demo asserts these counts automatically when you run it through the demo runner:
SELECT 'lab_orders' AS table_name, COUNT(*) AS row_count FROM external.hl7_demos.lab_orders UNION ALL SELECT 'lab_results_total' AS table_name, COUNT(*) AS row_count FROM external.hl7_demos.lab_results UNION ALL SELECT 'lab_results_oru_only' AS table_name, COUNT(*) AS row_count FROM external.hl7_demos.lab_results WHERE msh_9 LIKE 'ORU%';Expected: 3 orders, 8 total messages, 5 ORU results. The ORM rows in lab_results simply have NULL OBX columns.
Can you query HL7v2 files without Mirth or Iguana?
For analytics, yes, and that is exactly what the tables above do. But the comparison deserves a precise answer rather than a sales answer, because interface engines and analytical engines solve different problems.
Mirth Connect (now NextGen Connect) and iNTERFACEWARE's Iguana are interface engines. Their documented job is real-time message movement: listening on MLLP, acknowledging receipt, transforming messages between system dialects, and routing them to destination systems through configured channels. If you need to deliver live ADT events from one EHR to another with delivery guarantees, that is their territory, and DeltaForge does not compete there: it has no MLLP listener, sends no ACKs, and routes nothing.
What interface engines are not built to be is a query layer. Asking "how many emergency admissions did each facility send last quarter" through a channel architecture means transforming messages into a staging database first and maintaining that pipeline forever. DeltaForge starts where the messages come to rest:
- Interface engine (Mirth, Iguana)
- Real-time routing and transformation between clinical systems. Channels, transformers, ACK handling. Analytics requires building and maintaining a separate staging database.
- Custom HL7 ETL tool or script
- A parser library plus hand-written inserts into a schema you design up front. Every new segment, field, or HL7 version is a code change, and the raw message is usually discarded after extraction.
- DeltaForge external tables
- SQL directly over the message files. Schema decisions are per-table options you can change by recreating the table, the full message is always retained as JSON, and no per-message code exists to maintain.
The two approaches compose naturally: a common deployment lets the interface engine handle live delivery while archiving every message to object storage, and DeltaForge external tables make that archive queryable. DeltaForge is commercial software you install in your own environment, which matters here for a non-marketing reason: the PHI in those messages stays inside your network boundary, queried in place rather than uploaded to someone's cloud service.
What is the easiest way to analyze HL7 data?
If your messages already exist as files, the shortest path from "directory of .hl7 files" to "answered question" is the one this article walked: one external table, optionally a list of materialized paths, then plain SQL. There is no schema to design before you have seen the data, no parser to write, and no pipeline to babysit; the first SELECT against the header columns tells you what systems, versions, and message types you are holding, and you refine from there.
Because DeltaForge speaks PostgreSQL-flavored SQL, everything downstream of the external table is ordinary database work. HL7v2 analytics stops being a specialty: admissions by ward is a GROUP BY, abnormal results is a WHERE clause, and linking lab results to admissions joins on the patient identifier in pid_3. BI tools connect over the native ODBC and ADBC drivers, and hot query paths can be materialized into Delta tables on the same engine. Both demos used here, hl7-patient-admin and hl7-lab-orders-results, ship in the DeltaForge demo library with their data files, setup scripts, and asserted queries, so the fastest way to evaluate the claim is to run them against your own installation and then swap the demo glob for the path to your own message archive.