Skip to Content
This documentation is provided with the HEAT environment and is relevant for this HEAT instance only.
Bulk AnalyticsCatalog and SQL

Bulk analytics catalog and SQL

After system-bulk-tabular-writer completes, it registers heatBulkWriterCatalogV1 JSON. Downstream system-bulk-analytics-query and external tools use this catalog to discover database names, table names, column types, and safe SQL identifiers.

Catalog top-level fields

FieldPurpose
catalogVersionAlways heatBulkWriterCatalogV1
bulkWriterNodeInstanceIdWriter node instance id
dataSourceNameHEAT Bulk Analytics DB
analyticsDatabasePostgreSQL database name (bulk_ni_{id})
analyticsHostCluster-internal host (default heat-bulk-analytics-postgres)
joinKeysShared columns for joining tables
tablesMap of table name → table metadata
outputsProcessedThisRunOutputs ingested this run
outputsSkippedAlreadyIngestedSkipped (ledger hit)
outputsFailedRead or ingest failures
failedOutputIdsOutput ids that failed

Table layout

Each bulk writer owns one PostgreSQL database: bulk_ni_{nodeInstanceId}.

Table patternContents
csv_{schemaKey}CSV rows sharing the same header set (schemaKey is an 8-character hash of sorted column names)
json_{schemaKey}JSON tabular payloads (one JSONB payload column plus metadata)
_ingested_outputsSystem ledger: which output ids were ingested, row counts, schema keys

Every data table includes HEAT metadata columns:

ColumnPostgres typePurpose
heat_session_iduuidSession that owned the source output
heat_node_output_idintegerSource NodeOutput.Id
heat_source_session_iduuidSame as heat_session_id for CSV ingest (join key)

Display hints such as KB_BIO and KB_VEHICLE are inferred from CSV column prefixes when present (catalog displayHint on tables).

Column typing

CSV data columns use conservative type inference when a table is first created. The writer samples non-empty cell values from the first ingested file for that schema:

valueKindPostgres typeInference rule (simplified)
integerbigintAll sampled values parse as integers
numberdouble precisionAll sampled values parse as floats
booleanbooleanAll sampled values are true/false/yes/no
timestamptimestamptzAll sampled values are ISO-8601 timestamps
uuiduuidAll sampled values are GUIDs
stringtextMixed types, labels, or ambiguity

Null sentinels (N/A, empty, null, none, -) are ignored during inference and stored as SQL NULL on insert.

Important: types are fixed at CREATE TABLE IF NOT EXISTS. Re-ingesting into an existing all-TEXT table does not alter column types. Drop bulk_ni_{id} and re-run the writer to pick up new inference rules.

Column catalog entry

Each column in tables[].columns[] includes:

{ "name": "KB_VEHICLE_SPEED", "postgresType": "double precision", "valueKind": "number", "nullable": true, "sampleValue": 12.5 }
FieldUse for downstream tools
valueKindPortable type for UI binding, chart axes, and API consumers
postgresTypeExact Postgres type for SQL generation
nullableWhether SQL NULL is expected
sampleValueQuick preview (strings truncated)

Query result JSON from system-bulk-analytics-query returns typed cell values in rows (numbers and booleans as JSON numbers/booleans, not strings).

Join keys

Default catalog joinKeys:

["heat_session_id", "heat_node_output_id", "heat_source_session_id"]

Use these to correlate rows across csv_* tables or to tie analytics rows back to core session metadata.

Example join:

SELECT a.heat_session_id, a."KB_VEHICLE_SPEED" AS speed, b."KB_BIO_HEART_RATE" AS heart_rate FROM "csv_vehicle_hash" AS a JOIN "csv_bio_hash" AS b ON a.heat_session_id = b.heat_session_id WHERE a."KB_VEHICLE_SPEED" > 50 LIMIT 1000

Replace table names with values from your catalog tables keys and use each table’s qualifiedFrom for quoting.

SQL patterns (system-bulk-analytics-query)

Ingest ledger

SELECT COUNT(*) AS total_ingested FROM "_ingested_outputs"
SELECT table_name, SUM(row_count) AS row_count FROM "_ingested_outputs" GROUP BY table_name ORDER BY table_name

Range filter on typed numeric column

When valueKind is number or integer:

SELECT heat_session_id, "KB_VEHICLE_SPEED" FROM "csv_a1b2c3d4" WHERE "KB_VEHICLE_SPEED" > 50 ORDER BY "KB_VEHICLE_SPEED" DESC LIMIT 500

No CAST is required for correctly inferred types.

Time window

When valueKind is timestamp:

SELECT heat_session_id, "KB_VEHICLE_CAPTURED_DATE_TIME (IS0 8601)" FROM "csv_a1b2c3d4" WHERE "KB_VEHICLE_CAPTURED_DATE_TIME (IS0 8601)" >= TIMESTAMPTZ '2026-05-27T00:00:00Z' AND "KB_VEHICLE_CAPTURED_DATE_TIME (IS0 8601)" < TIMESTAMPTZ '2026-05-28T00:00:00Z' LIMIT 10000

Columns that stayed string

Mixed labels (N/A, gear names, and similar) remain text. Use CAST or regex guards:

SELECT * FROM "csv_a1b2c3d4" WHERE "KB_VEHICLE_PHYS_GEAR" ~ '^[0-9]+$'

Security constraints

The query node accepts SELECT-only SQL (including WITH … SELECT). DDL, DML, COPY, and multiple statements are rejected. Use maxRows (default 10000, cap 100000) to limit result size.

Using the catalog in Arbex

bulk-analytics-arbex-sample reads query output shape:

{ "tables": { "ingest_summary": { "columns": ["total_ingested"], "rows": [[237]] } } }

Arbex scripts can map columns + rows into heat.dataservice value channels and layout builders. See the script embedded in the shipped template and Arbex API reference.