Bulk analytics catalog and SQL
After system-bulk-tabular-writer completes, it registers heatBulkWriterCatalogV1 JSON. Downstream system-bulk-analytics-query and external tools use this catalog to discover database names, table names, column types, and safe SQL identifiers.
Catalog top-level fields
| Field | Purpose |
|---|---|
catalogVersion | Always heatBulkWriterCatalogV1 |
bulkWriterNodeInstanceId | Writer node instance id |
dataSourceName | HEAT Bulk Analytics DB |
analyticsDatabase | PostgreSQL database name (bulk_ni_{id}) |
analyticsHost | Cluster-internal host (default heat-bulk-analytics-postgres) |
joinKeys | Shared columns for joining tables |
tables | Map of table name → table metadata |
outputsProcessedThisRun | Outputs ingested this run |
outputsSkippedAlreadyIngested | Skipped (ledger hit) |
outputsFailed | Read or ingest failures |
failedOutputIds | Output ids that failed |
Table layout
Each bulk writer owns one PostgreSQL database: bulk_ni_{nodeInstanceId}.
| Table pattern | Contents |
|---|---|
csv_{schemaKey} | CSV rows sharing the same header set (schemaKey is an 8-character hash of sorted column names) |
json_{schemaKey} | JSON tabular payloads (one JSONB payload column plus metadata) |
_ingested_outputs | System ledger: which output ids were ingested, row counts, schema keys |
Every data table includes HEAT metadata columns:
| Column | Postgres type | Purpose |
|---|---|---|
heat_session_id | uuid | Session that owned the source output |
heat_node_output_id | integer | Source NodeOutput.Id |
heat_source_session_id | uuid | Same as heat_session_id for CSV ingest (join key) |
Display hints such as KB_BIO and KB_VEHICLE are inferred from CSV column prefixes when present (catalog displayHint on tables).
Column typing
CSV data columns use conservative type inference when a table is first created. The writer samples non-empty cell values from the first ingested file for that schema:
valueKind | Postgres type | Inference rule (simplified) |
|---|---|---|
integer | bigint | All sampled values parse as integers |
number | double precision | All sampled values parse as floats |
boolean | boolean | All sampled values are true/false/yes/no |
timestamp | timestamptz | All sampled values are ISO-8601 timestamps |
uuid | uuid | All sampled values are GUIDs |
string | text | Mixed types, labels, or ambiguity |
Null sentinels (N/A, empty, null, none, -) are ignored during inference and stored as SQL NULL on insert.
Important: types are fixed at CREATE TABLE IF NOT EXISTS. Re-ingesting into an existing all-TEXT table does not alter column types. Drop bulk_ni_{id} and re-run the writer to pick up new inference rules.
Column catalog entry
Each column in tables[].columns[] includes:
{
"name": "KB_VEHICLE_SPEED",
"postgresType": "double precision",
"valueKind": "number",
"nullable": true,
"sampleValue": 12.5
}| Field | Use for downstream tools |
|---|---|
valueKind | Portable type for UI binding, chart axes, and API consumers |
postgresType | Exact Postgres type for SQL generation |
nullable | Whether SQL NULL is expected |
sampleValue | Quick preview (strings truncated) |
Query result JSON from system-bulk-analytics-query returns typed cell values in rows (numbers and booleans as JSON numbers/booleans, not strings).
Join keys
Default catalog joinKeys:
["heat_session_id", "heat_node_output_id", "heat_source_session_id"]Use these to correlate rows across csv_* tables or to tie analytics rows back to core session metadata.
Example join:
SELECT
a.heat_session_id,
a."KB_VEHICLE_SPEED" AS speed,
b."KB_BIO_HEART_RATE" AS heart_rate
FROM "csv_vehicle_hash" AS a
JOIN "csv_bio_hash" AS b
ON a.heat_session_id = b.heat_session_id
WHERE a."KB_VEHICLE_SPEED" > 50
LIMIT 1000Replace table names with values from your catalog tables keys and use each table’s qualifiedFrom for quoting.
SQL patterns (system-bulk-analytics-query)
Ingest ledger
SELECT COUNT(*) AS total_ingested FROM "_ingested_outputs"SELECT table_name, SUM(row_count) AS row_count
FROM "_ingested_outputs"
GROUP BY table_name
ORDER BY table_nameRange filter on typed numeric column
When valueKind is number or integer:
SELECT heat_session_id, "KB_VEHICLE_SPEED"
FROM "csv_a1b2c3d4"
WHERE "KB_VEHICLE_SPEED" > 50
ORDER BY "KB_VEHICLE_SPEED" DESC
LIMIT 500No CAST is required for correctly inferred types.
Time window
When valueKind is timestamp:
SELECT heat_session_id, "KB_VEHICLE_CAPTURED_DATE_TIME (IS0 8601)"
FROM "csv_a1b2c3d4"
WHERE "KB_VEHICLE_CAPTURED_DATE_TIME (IS0 8601)" >= TIMESTAMPTZ '2026-05-27T00:00:00Z'
AND "KB_VEHICLE_CAPTURED_DATE_TIME (IS0 8601)" < TIMESTAMPTZ '2026-05-28T00:00:00Z'
LIMIT 10000Columns that stayed string
Mixed labels (N/A, gear names, and similar) remain text. Use CAST or regex guards:
SELECT *
FROM "csv_a1b2c3d4"
WHERE "KB_VEHICLE_PHYS_GEAR" ~ '^[0-9]+$'Security constraints
The query node accepts SELECT-only SQL (including WITH … SELECT). DDL, DML, COPY, and multiple statements are rejected. Use maxRows (default 10000, cap 100000) to limit result size.
Using the catalog in Arbex
bulk-analytics-arbex-sample reads query output shape:
{
"tables": {
"ingest_summary": {
"columns": ["total_ingested"],
"rows": [[237]]
}
}
}Arbex scripts can map columns + rows into heat.dataservice value channels and layout builders. See the script embedded in the shipped template and Arbex API reference.