Audio Utils Runner
The Audio Utils runner (audio-utils-runner) processes captured or uploaded audio. It normalizes input to PCM WAV, emits JSON metrics, runs speech-to-text, and combines metrics with transcripts for voice analysis.
Node template selection
| Template | Purpose | Limitations | Details |
|---|---|---|---|
| convert-to-pcm | Decode WAV/AIFF/AIFC uploads and emit PCM WAV plus a per-file acceptance report. | Non-audio blobs are skipped (node does not fail the whole batch); resample/bit-depth changes need numpy/librosa in the image. | convert-to-pcm |
| audio-metrics | Compute JSON audio metrics (duration, levels, and related fields) per accepted file. | Tolerates mixed non-audio upstream blobs; marks invalid items with parseError. | audio-metrics |
| generate-transcript | Speech-to-text (Whisper, faster-whisper, whisperx, wav2vec) keyed by content hash. | Heavy CPU/GPU and model dependencies; needs audio-metrics and PCM map inputs; air-gap sites must bundle models. | generate-transcript |
| voice-analysis | Merge metrics and transcript signals into a combined JSON analysis document. | Expects wired upstream parents; evolving output schema; not a dashboard publisher. | voice-analysis |
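
To make the audio-metrics row more concrete, the following is a minimal sketch of per-file metrics with tolerant error handling. It is not the runner's actual implementation. It assumes 16-bit PCM WAV input and uses only the Python standard library. The function names (`compute_metrics`, `metrics_report`) and the field names `durationSeconds`, `peakDbfs`, and `rmsDbfs` are hypothetical; only `parseError` comes from the table above.

```python
import array
import io
import json
import math
import wave


def _dbfs(value: float):
    """Convert a linear 16-bit amplitude to dBFS; None for digital silence."""
    if value <= 0:
        return None
    return round(20 * math.log10(value / 32768.0), 2)


def compute_metrics(blob: bytes) -> dict:
    """Return metrics for one blob, or a dict with parseError if it is not usable.

    Assumption: only 16-bit PCM WAV is handled in this sketch.
    Field names other than parseError are hypothetical.
    """
    try:
        with wave.open(io.BytesIO(blob), "rb") as wav:
            if wav.getsampwidth() != 2:
                return {"parseError": "only 16-bit PCM is handled in this sketch"}
            rate = wav.getframerate()
            frames = wav.getnframes()
            # readframes returns samples in native byte order.
            raw = wav.readframes(frames)
    except (wave.Error, EOFError) as exc:
        # Non-audio or truncated input: mark the item instead of raising.
        return {"parseError": str(exc) or type(exc).__name__}

    if rate <= 0:
        return {"parseError": "invalid sample rate"}

    # Drop a trailing odd byte so the buffer holds whole 16-bit samples.
    raw = raw[: len(raw) - (len(raw) % 2)]
    samples = array.array("h")
    samples.frombytes(raw)

    peak = max((abs(s) for s in samples), default=0)
    rms = math.sqrt(sum(s * s for s in samples) / len(samples)) if samples else 0.0

    return {
        "durationSeconds": frames / rate,
        "peakDbfs": _dbfs(peak),
        "rmsDbfs": _dbfs(rms),
    }


def metrics_report(blobs: dict) -> str:
    """Build one JSON document for a batch; invalid items do not fail the batch."""
    return json.dumps({name: compute_metrics(b) for name, b in blobs.items()}, indent=2)
```

Calling `metrics_report({"a.wav": wav_bytes, "notes.txt": b"not audio"})` would yield metrics for the first key and a `parseError` entry for the second, which mirrors the tolerant behavior described in the Limitations column.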