This documentation is provided with the HEAT environment and is relevant for this HEAT instance only.

Bulk analytics

Bulk analytics is a platform pipeline for sessions that produce many tabular node outputs (for example, hundreds of csv-input uploads across training runs). Instead of loading every blob into memory or exporting data by hand, HEAT can:

  1. Discover matching node output IDs from core Postgres.
  2. Ingest CSV/JSON rows into a dedicated HEAT Bulk Analytics DB PostgreSQL instance.
  3. Query consolidated tables with read-only SQL.
  4. Publish query results to downstream nodes (for example system-arbex-js dashboards).

All bulk analytics node templates run on the system-utils platform runner. They are intended for operator and static analytics pipelines, not for trainee-facing dashboard paths built solely from integrator dashboard-v2 templates.

When to use bulk analytics

Scenario | Bulk analytics | Alternative
Thousands of CSV outputs from the same upload node across many sessions | Yes | -
SQL aggregation, filtering, and grouping over consolidated telemetry | Yes | -
One CSV file turned into a Next dashboard channel | No | system-tabular-to-dataservice or tabular-to-dataservice
Small ad-hoc SELECT over a single in-memory CSV | No | tabular-query (SQLite, all columns as TEXT)

Typical source data: simulation capture CSV (vehicle, bio, and similar KB_* schemas), repeated ingest sessions, or any template whose upload node stores tabular blobs in HEAT Managed Object Store.

Architecture

Core Postgres holds session, node, and output metadata. node-output-query reads it read-only and emits a JSON list of outputIds.
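The discovery step can be pictured as a read-only query that joins node metadata to outputs and returns the matching IDs as a JSON list. The sketch below uses an in-memory SQLite database as a stand-in for core Postgres; the tables `node_instances` and `node_outputs`, their columns, and the `discover_output_ids` helper are hypothetical and do not describe the real core schema. The `source_node_name` argument roughly plays the role of the reference template's sourceNodeName setting.

```python
import json
import sqlite3


def discover_output_ids(conn: sqlite3.Connection, source_node_name: str) -> str:
    """Return a JSON list of output IDs produced by nodes named source_node_name.

    Hypothetical schema: node_instances(id, session_id, name) and
    node_outputs(id, node_instance_id). Like node-output-query v1, this
    returns every match in one list (no pagination); IDs are sorted for
    stable output.
    """
    rows = conn.execute(
        """
        SELECT o.id
        FROM node_outputs AS o
        JOIN node_instances AS n ON n.id = o.node_instance_id
        WHERE n.name = ?
        ORDER BY o.id
        """,
        (source_node_name,),
    ).fetchall()
    return json.dumps([row[0] for row in rows])


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE node_instances (id TEXT PRIMARY KEY, session_id TEXT, name TEXT);
        CREATE TABLE node_outputs (id TEXT PRIMARY KEY, node_instance_id TEXT);
        INSERT INTO node_instances VALUES
            ('ni-1', 's-1', 'csv-input'), ('ni-2', 's-2', 'csv-input'), ('ni-3', 's-2', 'other');
        INSERT INTO node_outputs VALUES ('out-a', 'ni-1'), ('out-b', 'ni-2'), ('out-c', 'ni-3');
        """
    )
    print(discover_output_ids(conn, "csv-input"))
```

Opening core Postgres read-only (for example, with a read-only role) is what keeps this step from ever modifying session data; the SQL itself only selects.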

system-bulk-tabular-writer fetches each output blob from object storage and inserts rows into HEAT Bulk Analytics DB. Each writer node instance gets its own database: bulk_ni_{nodeInstanceId}.
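A minimal sketch of the per-instance naming and row ingest, assuming the node instance ID is used verbatim in the database name and that each CSV blob's rows land in one table. `writer_database_name`, `quote_ident`, and `ingest_csv` are hypothetical helpers; SQLite stands in for HEAT Bulk Analytics DB, and a CSV string stands in for a blob fetched from object storage.

```python
import csv
import io
import sqlite3


def writer_database_name(node_instance_id: str) -> str:
    # One database per writer node instance: bulk_ni_{nodeInstanceId}.
    return f"bulk_ni_{node_instance_id}"


def quote_ident(name: str) -> str:
    # Quote an SQL identifier: wrap in double quotes, double any embedded quote.
    # Needed when a name contains characters such as '-'.
    return '"' + name.replace('"', '""') + '"'


def ingest_csv(conn: sqlite3.Connection, table: str, csv_text: str) -> int:
    """Create `table` from the CSV header if absent, then insert every data row.

    Assumes the first row is a header. Values are inserted as text strings;
    type inference is a separate step. Returns the number of rows inserted.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    cols = ", ".join(quote_ident(c) for c in header)
    conn.execute(f"CREATE TABLE IF NOT EXISTS {quote_ident(table)} ({cols})")
    placeholders = ", ".join("?" for _ in header)
    rows = [row for row in reader if row]  # csv.reader yields [] for blank lines
    conn.executemany(
        f"INSERT INTO {quote_ident(table)} ({cols}) VALUES ({placeholders})", rows
    )
    conn.commit()
    return len(rows)
```

Usage: `ingest_csv(sqlite3.connect(":memory:"), "vehicle", "t,speed\n0,1.5\n1,2.0\n")` inserts two rows. Giving each writer its own database keeps ingests from different graphs isolated and lets cleanup drop a whole database at once.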

system-bulk-analytics-query runs validated SELECT-only SQL against that database and emits tabular JSON for downstream processing.
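The exact validation rules of system-bulk-analytics-query are not described here. As an illustration only, a conservative SELECT-only gate might accept a single statement that starts with SELECT or WITH and contains no data-modifying keyword. `is_read_only_select` and its keyword list are hypothetical. A keyword filter like this is deliberately coarse (it also rejects, for example, a column named `copy`), so a read-only database role or read-only transaction is a stronger guarantee than text checks alone.

```python
import re

# Hypothetical deny-list for illustration; the real validator's rules may differ.
# "into" blocks PostgreSQL's SELECT ... INTO, which creates a new table.
FORBIDDEN = {
    "insert", "update", "delete", "merge", "drop", "alter", "create",
    "truncate", "grant", "revoke", "copy", "into",
}


def strip_sql_comments(sql: str) -> str:
    # Remove /* block */ comments first, then -- line comments.
    sql = re.sub(r"/\*.*?\*/", " ", sql, flags=re.DOTALL)
    return re.sub(r"--[^\n]*", " ", sql)


def is_read_only_select(sql: str) -> bool:
    """Coarse check: one statement, starts with SELECT/WITH, no write keywords."""
    body = strip_sql_comments(sql).strip().rstrip(";").strip()
    if not body or ";" in body:
        return False  # empty input or more than one statement
    words = re.findall(r"[a-z_]+", body.lower())
    if not words or words[0] not in ("select", "with"):
        return False
    return not any(word in FORBIDDEN for word in words)
```

Checking whole words rather than substrings avoids rejecting harmless identifiers such as `updated_at`.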

Tools and node templates

Node template | Role | Detail page
node-output-query | Discover all matching node output IDs (no pagination in v1) | node-output-query
system-bulk-tabular-writer | Ingest CSV/JSON into analytics Postgres; emit heatBulkWriterCatalogV1 | system-bulk-tabular-writer
system-bulk-analytics-query | Read-only SQL against the writer database | system-bulk-analytics-query
system-arbex-js | Optional: turn query JSON into $heat-dataservice and layout hints | system-arbex-js

Shipped session templates

Import these from Cluster Manager or use them as starting points when authoring templates:

Template name | Graph
bulk-analytics-ingest-reference | node-output-query → system-bulk-tabular-writer → system-bulk-analytics-query (ingest summary SQL)
bulk-analytics-arbex-sample | Same ingest path, multi-query SQL, then a system-arbex-js bar chart dashboard

See Workflow for step-by-step configuration.

Infrastructure

The standard Kubernetes deployment includes:

Component | Purpose
heat-bulk-analytics-postgres StatefulSet | Dedicated PostgreSQL 15 instance for derivative analytics (32Gi PVC in the base manifests)
DataSource HEAT Bulk Analytics DB | Registered automatically on core-api startup (no manual Cluster Manager step)
Platform key bulk_analytics.enabled | Default true; gates retention cleanup of writer databases when a session is deleted
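Retention cleanup can be sketched as follows: when a session is deleted and `bulk_analytics.enabled` is true, drop the writer database of each of the session's writer node instances. The function name, its inputs, and the use of `DROP DATABASE IF EXISTS` are assumptions for illustration; the sketch only builds the statements and does not connect to PostgreSQL.

```python
from typing import Iterable, List


def quote_ident(name: str) -> str:
    # PostgreSQL identifier quoting: wrap in double quotes, double embedded quotes.
    return '"' + name.replace('"', '""') + '"'


def cleanup_statements(
    bulk_analytics_enabled: bool, writer_node_instance_ids: Iterable[str]
) -> List[str]:
    """Return DROP DATABASE statements for a deleted session's writer databases.

    Returns an empty list when the platform key is disabled, mirroring the
    documented gate on retention cleanup. Database names follow the
    bulk_ni_{nodeInstanceId} convention.
    """
    if not bulk_analytics_enabled:
        return []
    return [
        f"DROP DATABASE IF EXISTS {quote_ident('bulk_ni_' + node_id)}"
        for node_id in writer_node_instance_ids
    ]
```

In PostgreSQL, `DROP DATABASE` cannot run inside a transaction block and fails while other sessions are connected to the target database, so cleanup code would typically issue each statement on its own autocommit connection.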

core-api and system-utils do not block startup when analytics Postgres is down. If the database is unreachable when an ingest or query node runs, that node instance fails with a clear error message.
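The fail-at-run-time behavior can be illustrated with a TCP reachability probe that turns a connection error into a readable message instead of blocking startup. `probe_database` and its message text are hypothetical; a real node would connect with a PostgreSQL driver rather than a raw socket.

```python
import socket
from typing import Optional


def probe_database(host: str, port: int, timeout: float = 3.0) -> Optional[str]:
    """Return None if a TCP connection succeeds, else a human-readable error.

    Only checks that something is listening; it does not authenticate or
    run SQL.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return None
    except OSError as exc:  # connection refused, timeout, DNS failure, ...
        return (
            f"HEAT Bulk Analytics DB unreachable at {host}:{port}: {exc}. "
            "Check the heat-bulk-analytics-postgres pod."
        )
```

Usage: `probe_database("heat-bulk-analytics-postgres", 5432)`, where the host name assumes a Service with that name and 5432 is PostgreSQL's default port. A node would call this when it runs and fail its instance with the returned message.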

See Operations for deployment checks and troubleshooting.

Key contracts

Artifact | Description
heatBulkWriterCatalogV1 | Writer output: database name, table list, column types, join keys, ingest stats
Tabular query JSON | Query output: { "tables": { "name": { "columns": [], "rows": [] } } }
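The tabular query JSON shape can be produced directly from a DB-API cursor: column names come from `cursor.description`, and each result row becomes one entry in `rows`. This sketch assumes each row is an array in column order (the contract shows only `"rows": []`, so row-as-array is an assumption). `to_tabular_json` is a hypothetical helper, and SQLite stands in for the analytics database.

```python
import json
import sqlite3
from typing import Dict


def to_tabular_json(results: Dict[str, sqlite3.Cursor]) -> str:
    """Serialize named query results as {"tables": {name: {"columns", "rows"}}}."""
    tables = {}
    for name, cursor in results.items():
        columns = [d[0] for d in cursor.description]  # first field is the column name
        rows = [list(r) for r in cursor.fetchall()]
        tables[name] = {"columns": columns, "rows": rows}
    return json.dumps({"tables": tables})
```

Usage: run `SELECT run, AVG(speed) AS avg_speed FROM t GROUP BY run` and pass `{"summary": cursor}`; the output then has a single `summary` table whose columns are `run` and `avg_speed`. Keying results by name lets one query node emit several tables for a multi-query dashboard.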

Column typing uses conservative inference (integer, number, boolean, timestamp, uuid, string) so SQL range filters and aggregates work without casting. See Catalog and SQL.
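One way to implement conservative inference is to assign a column a specific type only when every non-empty value matches it, trying the narrowest type first and falling back to `string`. The check order, the accepted boolean spellings, and the ISO 8601-style timestamp pattern below are assumptions, not the writer's documented rules; `infer_column_type` is hypothetical.

```python
import re
from datetime import datetime
from typing import Callable, Iterable, List, Tuple

_INT = re.compile(r"[+-]?\d+")
_NUM = re.compile(r"[+-]?(\d+(\.\d*)?|\.\d+)([eE][+-]?\d+)?")
_BOOL = re.compile(r"(?i)true|false")
_TS = re.compile(r"\d{4}-\d{2}-\d{2}([T ]\d{2}:\d{2}(:\d{2}(\.\d+)?)?(Z|[+-]\d{2}:?\d{2})?)?")
_UUID = re.compile(r"[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}")


def _is_timestamp(v: str) -> bool:
    if not _TS.fullmatch(v):
        return False
    try:
        datetime.strptime(v[:10], "%Y-%m-%d")  # reject impossible dates like 2024-13-01
        return True
    except ValueError:
        return False


# Checked in order; the first type that matches every non-empty value wins.
_CHECKS: List[Tuple[str, Callable[[str], bool]]] = [
    ("integer", lambda v: bool(_INT.fullmatch(v))),
    ("number", lambda v: bool(_NUM.fullmatch(v))),
    ("boolean", lambda v: bool(_BOOL.fullmatch(v))),
    ("timestamp", _is_timestamp),
    ("uuid", lambda v: bool(_UUID.fullmatch(v))),
]


def infer_column_type(values: Iterable[str]) -> str:
    present = [v.strip() for v in values if v.strip()]  # ignore empty cells
    if not present:
        return "string"
    for type_name, matches in _CHECKS:
        if all(matches(v) for v in present):
            return type_name
    return "string"
```

Under this sketch, a column mixing `1` and `2.5` becomes `number`, so a filter such as `speed > 2` compares numerically without a cast, while a single unparseable value keeps the whole column as `string`.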

Quick start (lab cluster)

  1. Confirm that the heat-bulk-analytics-postgres pod is Running in the heat namespace.
  2. Restart core-api after deploying so that its startup setup registers HEAT Bulk Analytics DB.
  3. Ensure upstream sessions exist with a node named csv-input (or adjust sourceNodeName in the reference template).
  4. Create a session from bulk-analytics-ingest-reference (or your own graph with the same three nodes).
  5. Monitor the writer node's StatusDetails for ingest progress (Ingesting output …).
  6. When the query node succeeds, inspect its tabular JSON output or chain bulk-analytics-arbex-sample for a dashboard.
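For step 6, one quick way to inspect a query node's tabular JSON output, assuming it has been saved to a file or fetched as a string, is to load it and summarize each table. `summarize_tabular_json` is a hypothetical helper that relies only on the documented `{ "tables": { ... } }` shape plus the assumption that each row is an array aligned with `columns`.

```python
import json
from typing import Dict


def summarize_tabular_json(payload: str) -> Dict[str, int]:
    """Return {table_name: row_count}, raising ValueError on misaligned rows."""
    doc = json.loads(payload)
    summary = {}
    for name, table in doc["tables"].items():
        width = len(table["columns"])
        for i, row in enumerate(table["rows"]):
            if len(row) != width:
                raise ValueError(
                    f"table {name!r} row {i} has {len(row)} values, expected {width}"
                )
        summary[name] = len(table["rows"])
    return summary
```

An unexpectedly empty table here usually points back to the discovery or ingest steps (for example, a sourceNodeName that matches no upstream node) rather than to the SQL.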