HL7 to SQL: Query HL7v2 Without an Interface Engine

Q: How do you convert HL7 messages to SQL tables?

Run CREATE EXTERNAL TABLE ... USING HL7 over a glob of .hl7 files. DeltaForge parses each message at query time and exposes the MSH header fields as columns, plus df_message_json holding the complete parsed message. An optional materialized_paths setting promotes any segment field, such as PID-5 or OBX-5, to a first-class column. There is no conversion job, no staging database, and no interface engine channel to build; the files themselves become a queryable table.

Q: Can you query HL7v2 files without Mirth or Iguana?

Yes, for analytics over messages at rest. Mirth Connect and Iguana are interface engines: they listen on MLLP, acknowledge, transform, and route live messages between clinical systems. DeltaForge does none of that. It is the analytics layer: point an external table at archived .hl7 files on disk or object storage and query them with SQL. Many teams run both, with the engine moving messages and DeltaForge querying the archive it leaves behind.

Q: How do you turn ADT messages into patient tables?

Create an HL7 external table with materialized_paths listing the PID and PV1 fields you need, for example pid_5 for patient name, pid_7 for date of birth, pid_8 for gender, and pv1_2 for patient class. Each ADT message becomes one row with those fields as columns, alongside the MSH header and the full message JSON. A single table can mix A01 admissions, A08 updates, and A03 discharges across HL7 versions 2.3 through 2.6.

Q: What is the easiest way to analyze HL7 data?

If your messages already exist as files, the shortest path is an external table over the raw .hl7 files and plain SQL on top: no per-message code, no schema design up front, no middleware. DeltaForge speaks PostgreSQL-flavored SQL, so grouping admissions by ward, flagging abnormal lab results, or joining HL7 data to other tables works like any other SQL workload, and BI tools connect through ODBC or ADBC.

Search for "hl7 to sql" and the results have barely moved in a decade: interface-engine documentation, forum threads from 2009 recommending a hand-rolled parser, and library tutorials that end with "now write the database insert yourself." The implied workflow is always the same. Stand up Mirth Connect or Iguana, build a channel, write a transformer per message type, design a staging schema, and maintain all of it. That is the right shape for moving live messages between clinical systems. It is a lot of machinery when the actual goal is to answer questions about messages you already have on disk.

This article shows the other path: pointing a SQL engine directly at the files. Every statement below comes from two demos in the DeltaForge demo library, hl7-patient-admin and hl7-lab-orders-results, which ship with real-world ADT, ORM, and ORU messages from EPIC, MegaReg, Folio3, Ritten, and other systems, spanning HL7 v2.3 through v2.6. You can run both demos yourself and reproduce every result.

How do you convert HL7 messages to SQL tables?

Start with what an HL7v2 message actually looks like. This is the opening of a real ADT^A01 admission from the demo data, sent by an EPIC system:

MSH|^~\&|EPIC|HOSPITAL|LAB|HOSPITAL|20230315140000||ADT^A01^ADT_A01|MSG00001|P|2.5.1|||AL|NE
EVN|A01|20230315140000|||JDOE^DOE^JOHN^M^DR^^^STAFF
PID|1||123456789^^^HOSP^MR~987-65-4321^^^SSN^SS||SMITH^JOHN^DAVID^JR^MR||19800515|M||2106-3^White^HL70005|123 MAIN ST^^CHICAGO^IL^60601^USA^HOME||...

An ADT^A01 admission message (abridged). The full message in the demo adds PV1 visit, AL1 allergy, DG1 diagnosis, GT1 guarantor, and IN1 insurance segments.

Pipes delimit fields, carets delimit components, tildes delimit repetitions, and the segment grammar varies by version and by sending system. This is exactly the kind of format that pushes people toward middleware. The DeltaForge alternative is one DDL statement:

CREATE ZONE IF NOT EXISTS external
    TYPE EXTERNAL
    COMMENT 'External tables for file-backed data';

CREATE SCHEMA IF NOT EXISTS external.hl7_demos
    COMMENT 'HL7 v2 message-backed external tables';

CREATE EXTERNAL TABLE IF NOT EXISTS external.hl7_demos.adt_messages
USING HL7
LOCATION 's3://acme-lake/demo/hl7-patient-admin/*.hl7'
OPTIONS (
    file_metadata = '{"columns":["df_file_name","df_row_number"]}'
);

One external table over a glob of .hl7 files. The optional file_metadata setting adds source-file traceability columns.

With no further configuration, each message becomes one row with three kinds of columns: the MSH header fields (msh_1 through msh_21) as plain text columns, df_message_json holding the complete parsed message as JSON, and df_message_id, a unique hash per message. The header columns alone already answer real questions:

SELECT
    df_file_name,
    msh_3 AS sending_app,
    msh_4 AS sending_facility,
    msh_9 AS message_type,
    msh_12 AS hl7_version,
    msh_10 AS message_control_id
FROM external.hl7_demos.adt_messages
ORDER BY df_file_name;

Header overview: one row per message, with sending system, message type, and protocol version as columns.

Against the demo data this returns eight rows: six A01 admissions, one A08 demographics update, and one A03 discharge, with sending_app values like EPIC and MegaReg and hl7_version ranging from 2.3 to 2.6. That last detail matters: five different HL7 versions coexist in one table because the version identifier is just another column (MSH-12), not a parser mode you have to configure per feed.

What happens at the storage layer

An external table copies nothing and converts nothing. The catalog stores the location glob and the parsing options; the .hl7 files stay where they are, on local disk or in an object store. At query time the engine lists the files matching the glob, parses each message, decodes HL7 escape sequences (\T\, \S\, \E\) into their literal characters, and streams the result to the query as Arrow record batches. Drop a new file into the directory and the next query sees it; there is no refresh step and no load job.

Two honest consequences follow from that design. First, every query re-reads and re-parses the files, which is ideal for a message archive that keeps growing but wasteful for a heavy dashboard hammering the same data; in that case run the external table query once and materialize the result into a Delta table, then point the dashboard at the Delta table. Second, fields keep their HL7 encoding: pid_5 comes back as SMITH^JOHN^DAVID^JR^MR with the component separators intact, and splitting that into last and first name is ordinary SQL string work, applied where you want it rather than baked into an ingestion pipeline you cannot revisit.

How do you turn ADT messages into patient tables?

Header columns describe messages. Patient analytics needs the fields inside PID (patient identification) and PV1 (patient visit). The hl7_config option's materialized_paths setting promotes any segment field to a first-class column, named by segment and field number:

CREATE EXTERNAL TABLE IF NOT EXISTS external.hl7_demos.adt_materialized
USING HL7
LOCATION 's3://acme-lake/demo/hl7-patient-admin/*.hl7'
OPTIONS (
    hl7_config = '{
        "materialized_paths": [
            "pid_3", "pid_5", "pid_7", "pid_8", "pid_11",
            "pv1_2", "pv1_3", "pv1_7",
            "evn_1", "evn_2"
        ]
    }',
    file_metadata = '{"columns":["df_file_name","df_row_number"]}'
);

The same files, now with patient ID, name, birth date, gender, address, patient class, location, and attending physician as columns.

Now the ADT feed reads like a patient table:

SELECT
    df_file_name,
    pid_5 AS patient_name,
    pid_7 AS date_of_birth,
    pid_8 AS gender,
    pid_11 AS address,
    pv1_2 AS patient_class
FROM external.hl7_demos.adt_materialized
ORDER BY pid_5;

Patient demographics across all eight messages: names, birth dates, genders, addresses, and inpatient/outpatient/emergency class from PV1-2.

Because the version identifier is a column, confirming that one table really is absorbing five protocol versions is a one-line aggregate:

SELECT
    msh_12 AS hl7_version,
    COUNT(*) AS message_count
FROM external.hl7_demos.adt_messages
GROUP BY msh_12
ORDER BY msh_12;

Against the demo data: versions 2.3, 2.3.1, 2.5, 2.5.1, and 2.6, all parsed by one table with no version-specific configuration.

Materialized paths cover the fields you query constantly. Everything else, including next-of-kin (NK1), allergies (AL1), diagnoses (DG1), guarantor (GT1), insurance (IN1), and even site-specific Z-segments, is preserved in df_message_json, where every segment appears with its fields as arrays. Nothing is dropped just because you did not materialize it:

SELECT
    df_file_name,
    msh_9 AS message_type,
    df_message_json
FROM external.hl7_demos.adt_messages
ORDER BY df_file_name
LIMIT 3;

The full parsed message rides along with every row; JSON functions reach any segment without changing the table definition.

One behavior worth knowing before you rely on it: when a segment repeats within a message, the default mode materializes the first occurrence into the column, and all occurrences remain in the JSON. A comprehensive metabolic panel with fourteen OBX result segments materializes the first analyte and keeps the other thirteen reachable through df_message_json.

Walkthrough: lab orders and results end to end

The hl7-lab-orders-results demo ships in the DeltaForge demo library and exercises the order-to-result workflow with three ORM^O01 order messages and five ORU^R01 result messages from multiple lab systems. The full flow is three steps.

Create two tables over the same directory. The glob pattern does the message-type routing that a channel filter would do in an interface engine: orm*.hl7 selects only orders, while *.hl7 takes everything, with OBX observation fields materialized for the result rows.

CREATE EXTERNAL TABLE IF NOT EXISTS external.hl7_demos.lab_orders
USING HL7
LOCATION 's3://acme-lake/demo/hl7-lab-orders-results/orm*.hl7'
OPTIONS (
    file_metadata = '{"columns":["df_file_name","df_row_number"]}'
);

CREATE EXTERNAL TABLE IF NOT EXISTS external.hl7_demos.lab_results
USING HL7
LOCATION 's3://acme-lake/demo/hl7-lab-orders-results/*.hl7'
OPTIONS (
    hl7_config = '{
        "materialized_paths": [
            "pid_3", "pid_5",
            "obr_4",
            "obx_2", "obx_3", "obx_5", "obx_6", "obx_7", "obx_8"
        ]
    }',
    file_metadata = '{"columns":["df_file_name","df_row_number"]}'
);

Orders and results as separate tables from one directory. OBR-4 is the test ordered; OBX 2 through 8 carry value type, observation ID, value, units, reference range, and abnormal flag.

Query the clinical signal. OBX-8 carries the abnormal flag the lab system set (H for high, L for low, N for normal), so surfacing every out-of-range result is a WHERE clause, not a transformer script:
```
SELECT
    df_file_name,
    pid_5 AS patient_name,
    obx_3 AS test_id,
    obx_5 AS value,
    obx_6 AS units,
    obx_7 AS reference_range,
    obx_8 AS abnormal_flag
FROM external.hl7_demos.lab_results
WHERE obx_8 IS NOT NULL AND obx_8 <> '' AND obx_8 <> 'N';
```
Against the demo data this finds one flagged result: a glucose of 182 mg/dl against a 70-105 reference range, flagged H.

Verify the load. The same SQL that does analytics does the sanity checking; the demo asserts these counts automatically when you run it through the demo runner:

SELECT 'lab_orders' AS table_name, COUNT(*) AS row_count
FROM external.hl7_demos.lab_orders
UNION ALL
SELECT 'lab_results_total' AS table_name, COUNT(*) AS row_count
FROM external.hl7_demos.lab_results
UNION ALL
SELECT 'lab_results_oru_only' AS table_name, COUNT(*) AS row_count
FROM external.hl7_demos.lab_results
WHERE msh_9 LIKE 'ORU%';

Expected: 3 orders, 8 total messages, 5 ORU results. The ORM rows in lab_results simply have NULL OBX columns.

Can you query HL7v2 files without Mirth or Iguana?

For analytics, yes, and that is exactly what the tables above do. But the comparison deserves a precise answer rather than a sales answer, because interface engines and analytical engines solve different problems.

Mirth Connect (now NextGen Connect) and iNTERFACEWARE's Iguana are interface engines. Their documented job is real-time message movement: listening on MLLP, acknowledging receipt, transforming messages between system dialects, and routing them to destination systems through configured channels. If you need to deliver live ADT events from one EHR to another with delivery guarantees, that is their territory, and DeltaForge does not compete there: it has no MLLP listener, sends no ACKs, and routes nothing.

What interface engines are not built to be is a query layer. Asking "how many emergency admissions did each facility send last quarter" through a channel architecture means transforming messages into a staging database first and maintaining that pipeline forever. DeltaForge starts where the messages come to rest:

Interface engine (Mirth, Iguana): Real-time routing and transformation between clinical systems. Channels, transformers, ACK handling. Analytics requires building and maintaining a separate staging database.
Custom HL7 ETL tool or script: A parser library plus hand-written inserts into a schema you design up front. Every new segment, field, or HL7 version is a code change, and the raw message is usually discarded after extraction.
DeltaForge external tables: SQL directly over the message files. Schema decisions are per-table options you can change by recreating the table, the full message is always retained as JSON, and no per-message code exists to maintain.

The two approaches compose naturally: a common deployment lets the interface engine handle live delivery while archiving every message to object storage, and DeltaForge external tables make that archive queryable. DeltaForge is commercial software you install in your own environment, which matters here for a non-marketing reason: the PHI in those messages stays inside your network boundary, queried in place rather than uploaded to someone's cloud service.

What is the easiest way to analyze HL7 data?

If your messages already exist as files, the shortest path from "directory of .hl7 files" to "answered question" is the one this article walked: one external table, optionally a list of materialized paths, then plain SQL. There is no schema to design before you have seen the data, no parser to write, and no pipeline to babysit; the first SELECT against the header columns tells you what systems, versions, and message types you are holding, and you refine from there.

Because DeltaForge speaks PostgreSQL-flavored SQL, everything downstream of the external table is ordinary database work. HL7v2 analytics stops being a specialty: admissions by ward is a GROUP BY, abnormal results is a WHERE clause, and linking lab results to admissions joins on the patient identifier in pid_3. BI tools connect over the native ODBC and ADBC drivers, and hot query paths can be materialized into Delta tables on the same engine. Both demos used here, hl7-patient-admin and hl7-lab-orders-results, ship in the DeltaForge demo library with their data files, setup scripts, and asserted queries, so the fastest way to evaluate the claim is to run them against your own installation and then swap the demo glob for the path to your own message archive.