Skip to Content
This documentation is provided with the HEAT environment and is relevant for this HEAT instance only.
RunnersSystem Utilssystem-bulk-analytics-query

system-bulk-analytics-query

Platform reserved (system-utils only). Connects to HEAT Bulk Analytics DB, runs one or more read-only SELECT statements, and registers tabular JSON for downstream nodes (for example system-arbex-js).

The analytics database target comes from configuration or from a parent system-bulk-tabular-writer catalog. Use explicit parameters when the upstream node is not a bulk writer (for example a static pipeline that queries a database populated by an earlier ingest session).

When to use it

Use after bulk ingest when you need SQL aggregation or sampling over consolidated CSV/JSON tables without exporting the full analytics database.

node-output-query → system-bulk-tabular-writer → system-bulk-analytics-query → (optional) system-arbex-js

Standalone (no bulk-writer parent):

system-bulk-analytics-query (analyticsDatabase / bulkWriterNodeInstanceId + tableName or sql)

Shipped reference: bulk-analytics-ingest-reference (query + writer + SQL). Full pipeline with Arbex: bulk-analytics-arbex-sample.

Target database

Provide one of the following when there is no parent bulk writer:

KeyPurpose
bulkWriterNodeInstanceIdResolves to bulk_ni_{id} (the database created by that writer node instance).
analyticsDatabaseExplicit database name (must match bulk_ni_{nodeInstanceId} pattern).

When both are set, they must agree. If neither is set, the node reads analyticsDatabase from the parent heatBulkWriterCatalogV1 output.

Query configuration

Provide either sql or tableName or queries[].

KeyDefaultPurpose
sqlSingle read-only SELECT.
tableNameWhen sql is omitted, runs SELECT * FROM "{tableName}".
queries[]Array of { sql?, tableName?, outputTableName?, maxRows? }. Each entry needs sql or tableName.
outputTableNameresult or tableNameLabel for single-query tabular output.
maxRows10000Per-query row cap (max 100000).
outputFormatautoauto (CSV for one query, JSON for queries[]), json, or csv (single query only).

Example (explicit database + table, no bulk-writer parent):

{ "bulkWriterNodeInstanceId": 42, "tableName": "csv_a1b2c3d4", "maxRows": 100 }

Example (explicit database + custom SQL):

{ "analyticsDatabase": "bulk_ni_42", "sql": "SELECT heat_session_id, COUNT(*) AS row_count FROM \"csv_a1b2c3d4\" GROUP BY heat_session_id", "outputTableName": "session_counts", "maxRows": 5000 }

Example (parent bulk writer supplies database; only SQL needed):

{ "sql": "SELECT COUNT(*) AS total FROM \"_ingested_outputs\"", "outputTableName": "ingest_summary", "maxRows": 1 }

Use qualifiedFrom values from a writer catalog when building SQL manually. CSV columns are stored with inferred Postgres types (bigint, double precision, boolean, timestamptz, uuid, or text); use the catalog valueKind field for portable typing in downstream tools. Range filters and aggregates work natively on typed columns without CAST.

Output format

outputFormatWhen to useArtifact
auto (default)Single sql / tableName queryCSV (text/csv)
autoqueries[] with multiple entriesJSON tabular envelope
csvForce CSV export for one queryCSV
jsonDownstream nodes (for example system-arbex-js) that read tables.columns / tables.rowsJSON

CSV output is a standard header row plus data rows, suitable for download or spreadsheet import. Set "outputFormat": "json" when a child node expects the JSON envelope (see bulk-analytics-arbex-sample).

JSON output shape

{ "tables": { "session_counts": { "columns": ["heat_session_id", "row_count"], "rows": [ ["11111111-1111-1111-1111-111111111111", 42] ] } } }

Security and limitations

  • Only SELECT / WITH … SELECT statements are accepted. Writes, DDL, COPY, and multiple statements are rejected.
  • analyticsDatabase must match the bulk_ni_{nodeInstanceId} naming convention.
  • Requires a reachable HEAT Bulk Analytics DB (analytics Postgres pod and registered DataSource).
  • Large scans can be slow; use maxRows and narrow SELECT lists.
  • Node fails with ProcessingFailed when analytics Postgres is unavailable or SQL is invalid.