Bulk analytics
Bulk analytics is a platform pipeline for sessions that produce many tabular node outputs (for example hundreds of csv-input uploads across training runs). Instead of loading every blob into memory or exporting data manually, HEAT can:
- Discover matching node output IDs from core Postgres.
- Ingest CSV/JSON rows into a dedicated HEAT Bulk Analytics DB PostgreSQL instance.
- Query consolidated tables with read-only SQL.
- Publish query results to downstream nodes (for example system-arbex-js dashboards).
All bulk analytics node templates run on the system-utils platform runner. They are intended for operator and static analytics pipelines, not trainee-facing dashboard paths built with integrator dashboard-v2 templates alone.
When to use bulk analytics
| Scenario | Bulk analytics | Alternative |
|---|---|---|
| Thousands of CSV outputs from the same upload node across many sessions | Yes | — |
| SQL aggregation, filtering, and grouping over consolidated telemetry | Yes | — |
| One CSV file turned into a Next dashboard channel | No | system-tabular-to-dataservice or tabular-to-dataservice |
| Small ad-hoc SELECT over a single in-memory CSV | No | tabular-query (SQLite, all columns as TEXT) |
Typical source data: simulation capture CSV (vehicle, bio, and similar KB_* schemas), repeated ingest sessions, or any template whose upload node stores tabular blobs in HEAT Managed Object Store.
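The table notes that tabular-query keeps every column as TEXT in SQLite. The sketch below uses Python's built-in sqlite3 module to illustrate why that matters for range filters: a TEXT column is compared lexically, so a numeric filter silently drops rows unless each query casts. The table name samples and column name speed are made up for illustration and are not part of any HEAT schema.

```python
import sqlite3


def range_filter_demo():
    """Compare a range filter on a TEXT column with and without a cast."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table: every value is stored as TEXT.
    conn.execute("CREATE TABLE samples (speed TEXT)")
    conn.executemany(
        "INSERT INTO samples VALUES (?)", [("9",), ("10",), ("100",)]
    )

    # TEXT affinity: the literal 5 is compared as the string '5',
    # so '10' and '100' sort before it and are filtered out.
    as_text = [
        row[0]
        for row in conn.execute(
            "SELECT speed FROM samples WHERE speed > 5 ORDER BY rowid"
        )
    ]

    # An explicit cast restores numeric comparison.
    as_int = [
        row[0]
        for row in conn.execute(
            "SELECT CAST(speed AS INTEGER) FROM samples "
            "WHERE CAST(speed AS INTEGER) > 5 ORDER BY rowid"
        )
    ]
    conn.close()
    return as_text, as_int
```

Typed columns in the bulk analytics database are meant to avoid this per-query casting.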
Architecture
Core Postgres holds session, node, and output metadata. node-output-query queries it in read-only mode and emits a JSON list of outputIds.
system-bulk-tabular-writer fetches each output blob from object storage and inserts rows into HEAT Bulk Analytics DB. Each writer node instance gets its own database: bulk_ni_{nodeInstanceId}.
system-bulk-analytics-query runs validated SELECT-only SQL against that database and emits tabular JSON for downstream processing.
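As a rough illustration of the two conventions above, the following sketch derives a writer database name from the bulk_ni_{nodeInstanceId} pattern and applies a deliberately conservative SELECT-only check to a single statement. Both function names are hypothetical, and the check is not HEAT's actual validator: the real validation rules, any normalization of the node instance ID, and the handling of multi-query SQL are not described here.

```python
import re

# Keywords that indicate a write or DDL statement. This list is an
# assumption for illustration, not HEAT's rule set.
_FORBIDDEN = re.compile(
    r"\b(insert|update|delete|drop|alter|create|truncate|grant|revoke|copy)\b",
    re.IGNORECASE,
)


def writer_database_name(node_instance_id: str) -> str:
    """Apply the bulk_ni_{nodeInstanceId} naming pattern as plain formatting."""
    return f"bulk_ni_{node_instance_id}"


def is_select_only(sql: str) -> bool:
    """Return True if sql looks like one read-only SELECT statement.

    Conservative on purpose: a forbidden keyword anywhere in the text,
    even inside a string literal, rejects the statement.
    """
    statement = sql.strip().rstrip(";").strip()
    if not statement or ";" in statement:
        return False  # empty input or more than one statement
    if not re.match(r"(?is)^(select|with)\b", statement):
        return False  # must start with SELECT or a WITH clause
    return _FORBIDDEN.search(statement) is None
```

A keyword scan like this trades false rejections for simplicity; a production validator would more likely parse the statement.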
Tools and node templates
| Node template | Role | Detail page |
|---|---|---|
| node-output-query | Discover all matching node output IDs (no pagination in v1) | node-output-query |
| system-bulk-tabular-writer | Ingest CSV/JSON into analytics Postgres; emit heatBulkWriterCatalogV1 | system-bulk-tabular-writer |
| system-bulk-analytics-query | Read-only SQL against the writer database | system-bulk-analytics-query |
| system-arbex-js | Optional: turn query JSON into $heat-dataservice and layout hints | system-arbex-js |
Shipped session templates
Import these from Cluster Manager or use them as starting points when authoring templates:
| Template name | Graph |
|---|---|
| bulk-analytics-ingest-reference | node-output-query → system-bulk-tabular-writer → system-bulk-analytics-query (ingest summary SQL) |
| bulk-analytics-arbex-sample | Same ingest path, multi-query SQL, then system-arbex-js bar chart dashboard |
See Workflow for step-by-step configuration.
Infrastructure
The standard Kubernetes deployment includes:
| Component | Purpose |
|---|---|
| heat-bulk-analytics-postgres StatefulSet | Dedicated PostgreSQL 15 for derivative analytics (32Gi PVC in base manifests) |
| DataSource HEAT Bulk Analytics DB | Registered automatically on core-api startup (no manual Cluster Manager step) |
| Platform key bulk_analytics.enabled | Default true; gates retention cleanup of writer databases on session delete |
Core API and system-utils do not block startup when analytics Postgres is down. Ingest and query nodes fail the node instance with a clear message if the database is unreachable when they run.
See Operations for deployment checks and troubleshooting.
Key contracts
| Artifact | Description |
|---|---|
| heatBulkWriterCatalogV1 | Writer output: database name, table list, column types, join keys, ingest stats |
| Tabular query JSON | Query output: { "tables": { "name": { "columns": [], "rows": [] } } } |
Column typing uses conservative inference (integer, number, boolean, timestamp, uuid, string) so SQL range filters and aggregates work without casting. See Catalog and SQL.
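To make "conservative inference" concrete, here is a minimal sketch of one way such typing could work: a column receives a specific type only when every non-empty value agrees, and otherwise falls back to string. Only the six type names come from the list above. The function names, the matching rules, and the integer-to-number widening are assumptions for illustration, not the writer's documented behavior.

```python
import re
from datetime import datetime

_INTEGER = re.compile(r"[+-]?\d+")
_NUMBER = re.compile(r"[+-]?(\d+\.\d*|\.\d+|\d+)([eE][+-]?\d+)?")
_UUID = re.compile(
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-"
    r"[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)


def infer_value_type(text: str) -> str:
    """Classify one raw cell value (assumed rules, most specific first)."""
    s = text.strip()
    if s.lower() in ("true", "false"):
        return "boolean"
    if _INTEGER.fullmatch(s):
        return "integer"
    if _NUMBER.fullmatch(s):
        return "number"
    if _UUID.fullmatch(s):
        return "uuid"
    try:
        datetime.fromisoformat(s)  # ISO 8601 such as 2024-01-01T10:00:00
        return "timestamp"
    except ValueError:
        return "string"


def infer_column_type(values) -> str:
    """Pick a column type only when all non-empty values agree."""
    kinds = {infer_value_type(v) for v in values if v.strip() != ""}
    if not kinds:
        return "string"  # nothing to infer from
    if len(kinds) == 1:
        return kinds.pop()
    if kinds == {"integer", "number"}:
        return "number"  # widen mixed numerics instead of giving up
    return "string"  # any other disagreement falls back to string
```

Falling back to string on any disagreement keeps ingest from failing on a stray value, at the cost of needing a cast for that column in SQL.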
Quick start (lab cluster)
- Confirm heat-bulk-analytics-postgres pod is Running in the heat namespace.
- Restart core-api after deploy so setup registers HEAT Bulk Analytics DB.
- Ensure upstream sessions exist with a node named csv-input (or adjust sourceNodeName in the reference template).
- Create a session from bulk-analytics-ingest-reference (or your own graph with the same three nodes).
- Monitor the writer node StatusDetails for ingest progress (Ingesting output …).
- When the query node succeeds, inspect its tabular JSON output or chain bulk-analytics-arbex-sample for a dashboard.
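For the last step, a small helper like the one below can make the query node's tabular JSON easier to inspect by turning each table into a list of row dictionaries. It follows the { "tables": { name: { "columns": [], "rows": [] } } } shape from Key contracts and assumes that columns is a list of column-name strings and that each row is a list in the same order. If the real output carries richer column objects, the helper would need adapting. The function name is hypothetical.

```python
import json


def tables_to_records(payload):
    """Convert tabular query JSON into {table_name: [row_dict, ...]}.

    Accepts either a JSON string or an already-parsed dict.
    Assumes columns are plain name strings and rows are positional lists.
    """
    doc = json.loads(payload) if isinstance(payload, str) else payload
    records = {}
    for name, table in doc.get("tables", {}).items():
        columns = table.get("columns", [])
        # Pair each positional value with its column name.
        records[name] = [dict(zip(columns, row)) for row in table.get("rows", [])]
    return records
```

For example, a table with columns ["session_id", "row_count"] and a row ["s1", 10] becomes {"session_id": "s1", "row_count": 10}. Those column names are illustrative only.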
Related
- Workflow
- Catalog and SQL
- Operations
- Data sources (HEAT Bulk Analytics DB)
- System Utils