This documentation is provided with the HEAT environment and is relevant for this HEAT instance only.

Audio Utils Runner

The Audio Utils runner (`audio-utils-runner`) processes captured or uploaded audio. It normalizes input to PCM WAV, emits JSON metrics, runs speech-to-text, and combines metrics with transcripts for voice analysis.

Node template selection

| Template | Purpose | Limitations | Details |
| --- | --- | --- | --- |
| `convert-to-pcm` | Decode WAV/AIFF/AIFC uploads and emit PCM WAV plus a per-file acceptance report. | Non-audio blobs are skipped (node does not fail the whole batch); resample/bit-depth changes need numpy/librosa in the image. | convert-to-pcm |
| `audio-metrics` | Compute JSON audio metrics (duration, levels, and related fields) per accepted file. | Tolerates mixed non-audio upstream blobs; marks invalid items with `parseError`. | audio-metrics |
| `generate-transcript` | Speech-to-text (Whisper, faster-whisper, whisperx, wav2vec) keyed by content hash. | Heavy CPU/GPU and model dependencies; needs `audio-metrics` and PCM map inputs; air-gap sites must bundle models. | generate-transcript |
| `voice-analysis` | Merge metrics and transcript signals into a combined JSON analysis document. | Expects wired upstream parents; evolving output schema; not a dashboard publisher. | voice-analysis |
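
The tolerant behavior described for `audio-metrics` (a bad item is flagged with `parseError` rather than failing the batch) can be illustrated with a minimal sketch. This is not the runner's implementation: it uses only the Python standard library `wave` module, handles WAV input only, and every field name other than `parseError` (for example `durationSeconds` and `peakDbfs`) is hypothetical.

```python
import io
import math
import struct
import wave


def audio_metrics(name: str, blob: bytes) -> dict:
    """Return metrics for one blob, or a record flagged with parseError."""
    try:
        with wave.open(io.BytesIO(blob), "rb") as wav:
            frames = wav.getnframes()
            rate = wav.getframerate()
            channels = wav.getnchannels()
            width = wav.getsampwidth()
            raw = wav.readframes(frames)
    except (wave.Error, EOFError) as exc:
        # Invalid or non-audio input: flag the item instead of raising,
        # so one bad blob does not fail the whole batch.
        return {"file": name, "parseError": str(exc) or type(exc).__name__}

    # Field names below are hypothetical, chosen for illustration only.
    metrics = {
        "file": name,
        "durationSeconds": frames / rate if rate else 0.0,
        "sampleRateHz": rate,
        "channels": channels,
        "bitDepth": width * 8,
    }
    if width == 2 and raw:
        # 16-bit little-endian PCM: report the peak level relative to full scale.
        count = len(raw) // 2
        samples = struct.unpack(f"<{count}h", raw[: count * 2])
        peak = max(abs(s) for s in samples)
        metrics["peakDbfs"] = 20 * math.log10(peak / 32768) if peak else None
    return metrics


def batch_metrics(blobs: dict[str, bytes]) -> list[dict]:
    """Process every blob; invalid ones appear in the output with parseError."""
    return [audio_metrics(name, blob) for name, blob in blobs.items()]
```

Calling `batch_metrics` on a mix of WAV files and unrelated blobs returns one record per input, so downstream nodes can filter on the presence of `parseError` instead of handling an exception.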

Typical pipeline

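Reading the template table as a dependency graph suggests one plausible run order. The sketch below is an inference from the stated inputs of each template, not a documented wiring: the `PARENTS` mapping and the `run_order` helper are hypothetical, and the edge from `convert-to-pcm` to `audio-metrics` is assumed from the phrase "per accepted file".

```python
from graphlib import TopologicalSorter

# Each template maps to the templates it is assumed to consume from.
PARENTS = {
    "convert-to-pcm": set(),
    "audio-metrics": {"convert-to-pcm"},  # assumed from "per accepted file"
    "generate-transcript": {"convert-to-pcm", "audio-metrics"},
    "voice-analysis": {"audio-metrics", "generate-transcript"},
}


def run_order(parents: dict[str, set[str]]) -> list[str]:
    """Return the templates in an order where every parent runs first."""
    return list(TopologicalSorter(parents).static_order())


print(" -> ".join(run_order(PARENTS)))
# prints: convert-to-pcm -> audio-metrics -> generate-transcript -> voice-analysis
```

Under these assumptions the order is fully determined, because each template depends on the one before it.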