Bulk analytics

Bulk analytics is a platform pipeline for sessions that produce many tabular node outputs (for example hundreds of csv-input uploads across training runs). Instead of loading every blob into memory or exporting data manually, HEAT can:

1. Discover matching node output IDs from core Postgres.
2. Ingest CSV/JSON rows into a dedicated HEAT Bulk Analytics DB PostgreSQL instance.
3. Query consolidated tables with read-only SQL.
4. Publish query results to downstream nodes (for example system-arbex-js dashboards).

All bulk analytics node templates run on the system-utils platform runner. They are intended for operator and static analytics pipelines, not for trainee-facing dashboard paths built with integrator dashboard-v2 templates alone.

When to use bulk analytics

Scenario	Bulk analytics	Alternative
Thousands of CSV outputs from the same upload node across many sessions	Yes	—
SQL aggregation, filtering, and grouping over consolidated telemetry	Yes	—
One CSV file turned into a Next dashboard channel	No	system-tabular-to-dataservice or tabular-to-dataservice
Small ad-hoc SELECT over a single in-memory CSV	No	tabular-query (SQLite, all columns as TEXT)
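
The TEXT-only caveat in the last row matters as soon as a query aggregates or range-filters numeric data. The sketch below uses Python's standard sqlite3 module directly (not the tabular-query node itself) with a hypothetical `samples` table and `speed` column to show how a TEXT column compares values as strings unless the query casts them.

```python
import sqlite3


def text_vs_numeric_max(values):
    """Return (MAX over a TEXT column, MAX after CAST to INTEGER)."""
    conn = sqlite3.connect(":memory:")
    try:
        # Hypothetical table: every value is stored as TEXT.
        conn.execute("CREATE TABLE samples (speed TEXT)")
        conn.executemany(
            "INSERT INTO samples (speed) VALUES (?)", [(v,) for v in values]
        )
        text_max = conn.execute("SELECT MAX(speed) FROM samples").fetchone()[0]
        numeric_max = conn.execute(
            "SELECT MAX(CAST(speed AS INTEGER)) FROM samples"
        ).fetchone()[0]
        return text_max, numeric_max
    finally:
        conn.close()


# String comparison ranks '9' above '100'; the cast restores numeric order.
print(text_vs_numeric_max(["9", "10", "100"]))  # → ('9', 100)
```

With typed columns in the analytics database, the cast in the second query should not be necessary.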

Typical source data: simulation capture CSV (vehicle, bio, and similar KB_* schemas), repeated ingest sessions, or any template whose upload node stores tabular blobs in HEAT Managed Object Store.

Architecture

Core Postgres holds session, node, and output metadata. node-output-query queries it in read-only mode and emits a JSON list of outputIds.

system-bulk-tabular-writer fetches each output blob from object storage and inserts rows into HEAT Bulk Analytics DB. Each writer node instance gets its own database: bulk_ni_{nodeInstanceId}.
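
As a minimal sketch of the naming rule above, the helper below builds the database name from a node instance ID. The function names and the sample ID `42` are hypothetical, and the source does not say whether the platform normalizes IDs; the quoting helper is included only because PostgreSQL requires double quotes around identifiers that contain characters such as hyphens.

```python
def writer_database_name(node_instance_id: str) -> str:
    """Build the per-writer database name: bulk_ni_{nodeInstanceId}."""
    return f"bulk_ni_{node_instance_id}"


def quote_pg_identifier(name: str) -> str:
    """Quote a PostgreSQL identifier, doubling any embedded double quote."""
    return '"' + name.replace('"', '""') + '"'


# Hypothetical node instance ID, for illustration only.
print(quote_pg_identifier(writer_database_name("42")))  # → "bulk_ni_42"
```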

system-bulk-analytics-query runs validated SELECT-only SQL against that database and emits tabular JSON for downstream processing.
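
The source does not describe how the query node validates SQL. The sketch below is one rough way to approximate a SELECT-only check, assuming a single statement per request; the function name and keyword list are illustrative, not the platform's implementation. It is deliberately conservative and will also reject harmless queries that mention a write keyword inside a string literal.

```python
import re

# Keywords that indicate a write or DDL statement (illustrative list).
_FORBIDDEN = re.compile(
    r"\b(insert|update|delete|drop|alter|create|truncate|grant|revoke|copy)\b",
    re.IGNORECASE,
)


def is_select_only(sql: str) -> bool:
    """Rough check: one statement, starts with SELECT or WITH, no write keywords."""
    statement = sql.strip().rstrip(";").strip()
    if not statement or ";" in statement:
        return False  # empty input, or more than one statement
    if not re.match(r"(select|with)\b", statement, re.IGNORECASE):
        return False
    return _FORBIDDEN.search(statement) is None


print(is_select_only("SELECT count(*) FROM t;"))  # → True
print(is_select_only("SELECT 1; DROP TABLE t"))   # → False
```

A keyword filter like this is only a first line of defense; a SQL parser combined with a read-only database role gives a stronger guarantee.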

Tools and node templates

Node template	Role	Detail page
`node-output-query`	Discover all matching node output IDs (no pagination in v1)	node-output-query
`system-bulk-tabular-writer`	Ingest CSV/JSON into analytics Postgres; emit `heatBulkWriterCatalogV1`	system-bulk-tabular-writer
`system-bulk-analytics-query`	Read-only SQL against the writer database	system-bulk-analytics-query
`system-arbex-js`	Optional: turn query JSON into `$heat-dataservice` and layout hints	system-arbex-js

Shipped session templates

Import these from Cluster Manager or use them as starting points when authoring templates:

Template name	Graph
`bulk-analytics-ingest-reference`	`node-output-query` → `system-bulk-tabular-writer` → `system-bulk-analytics-query` (ingest summary SQL)
`bulk-analytics-arbex-sample`	Same ingest path, multi-query SQL, then `system-arbex-js` bar chart dashboard

See Workflow for step-by-step configuration.

Infrastructure

The standard Kubernetes deployment includes:

Component	Purpose
`heat-bulk-analytics-postgres` StatefulSet	Dedicated PostgreSQL 15 for derivative analytics (32Gi PVC in base manifests)
DataSource `HEAT Bulk Analytics DB`	Registered automatically on core-api startup (no manual Cluster Manager step)
Platform key `bulk_analytics.enabled`	Default `true`; gates retention cleanup of writer databases on session delete

Core API and system-utils do not block startup when analytics Postgres is down. Ingest and query nodes fail the node instance with a clear message if the database is unreachable when they run.

See Operations for deployment checks and troubleshooting.

Key contracts

Artifact	Description
`heatBulkWriterCatalogV1`	Writer output: database name, table list, column types, join keys, ingest stats
Tabular query JSON	Query output: `{ "tables": { "name": { "columns": [], "rows": [] } } }`
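
A downstream consumer might unpack the tabular query JSON as sketched below. The source shows only the outer shape, so this assumes `columns` is a list of column-name strings and each entry in `rows` is an array in column order; the table name `ingest_summary` and its columns are hypothetical.

```python
import json


def rows_as_dicts(payload: str, table: str) -> list:
    """Turn one table of the tabular query JSON into a list of row dicts."""
    doc = json.loads(payload)
    entry = doc["tables"][table]
    columns = entry["columns"]  # assumed: list of column-name strings
    # Assumed: each row is an array whose values follow the column order.
    return [dict(zip(columns, row)) for row in entry["rows"]]


# Hypothetical payload for illustration only.
sample = json.dumps(
    {
        "tables": {
            "ingest_summary": {
                "columns": ["output_id", "row_count"],
                "rows": [["a", 10], ["b", 5]],
            }
        }
    }
)
print(rows_as_dicts(sample, "ingest_summary"))
# → [{'output_id': 'a', 'row_count': 10}, {'output_id': 'b', 'row_count': 5}]
```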

Column typing uses conservative inference (integer, number, boolean, timestamp, uuid, string) so SQL range filters and aggregates work without casting. See Catalog and SQL.
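
To illustrate what conservative inference can look like, the sketch below assigns a column the first type that every non-empty value satisfies and falls back to string otherwise. Only the type names come from the text above; the check order, the matching rules, and the treatment of empty values as nulls are assumptions, not the writer's actual algorithm.

```python
import re
from datetime import datetime

_NUMBER = re.compile(r"[+-]?(\d+(\.\d*)?|\.\d+)([eE][+-]?\d+)?")
_UUID = re.compile(r"[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}")


def _is_timestamp(value: str) -> bool:
    try:
        datetime.fromisoformat(value)
    except ValueError:
        return False
    return True


# Checked in order; the first type that fits every value wins (assumed order).
_CHECKS = [
    ("integer", lambda v: re.fullmatch(r"[+-]?\d+", v) is not None),
    ("number", lambda v: _NUMBER.fullmatch(v) is not None),
    ("boolean", lambda v: v.lower() in ("true", "false")),
    ("timestamp", _is_timestamp),
    ("uuid", lambda v: _UUID.fullmatch(v) is not None),
]


def infer_column_type(values) -> str:
    """Return the narrowest type all non-empty values share, else 'string'."""
    present = [v.strip() for v in values if v is not None and v.strip() != ""]
    if not present:
        return "string"  # nothing to infer from
    for name, check in _CHECKS:
        if all(check(v) for v in present):
            return name
    return "string"


print(infer_column_type(["1", "2", "-3"]))  # → integer
print(infer_column_type(["1", "2.5"]))      # → number
print(infer_column_type(["1", "abc"]))      # → string
```

A single non-conforming value demotes the whole column to string, which is what makes the inference conservative.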

Quick start (lab cluster)

1. Confirm that the heat-bulk-analytics-postgres pod is Running in the heat namespace.
2. Restart core-api after deployment so that setup registers HEAT Bulk Analytics DB.
3. Ensure that upstream sessions exist with a node named csv-input (or adjust sourceNodeName in the reference template).
4. Create a session from bulk-analytics-ingest-reference (or from your own graph with the same three nodes).
5. Monitor the writer node's StatusDetails for ingest progress (Ingesting output …).
6. When the query node succeeds, inspect its tabular JSON output, or chain bulk-analytics-arbex-sample for a dashboard.