This documentation is provided with the HEAT environment and is relevant for this HEAT instance only.

generate-transcript

The generate-transcript transform runs speech-to-text over audio files listed in an upstream audio-metrics JSON document, using PCM bytes from convert-to-pcm outputs keyed by MD5.
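
The pairing of metrics entries with PCM bytes can be pictured with the minimal sketch below. It is not the processor's implementation: the function name, the "file" and "md5" field names, the list-shaped metrics document, and the choice to raise when a PCM blob is absent are all assumptions made for illustration.

```python
import json


def pair_metrics_with_pcm(metrics_json: str, pcm_by_md5: dict[str, bytes]) -> list[tuple[dict, bytes]]:
    """Pair each audio-metrics entry with the PCM bytes stored under its MD5 key."""
    # Missing or empty metrics are treated as a hard failure, mirroring the
    # documented behavior that missing metrics fail the node.
    entries = json.loads(metrics_json) if metrics_json else None
    if not entries:
        raise ValueError("audio-metrics JSON is missing or empty")

    pairs = []
    for entry in entries:
        key = entry["md5"]  # hypothetical field name; the real schema may differ
        if key not in pcm_by_md5:
            # Assumption: a metrics entry without a matching PCM blob is an error.
            raise KeyError(f"no PCM blob found for MD5 {key}")
        pairs.append((entry, pcm_by_md5[key]))
    return pairs
```

Each returned pair would then be handed to the selected ASR backend; the MD5 is treated here as an opaque lookup key.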

When to use it

Use after convert-to-pcm and audio-metrics in an audio analytics pipeline when you need searchable transcripts (plain text or time-segmented JSON).

Limitations

  • Requires audio-metrics JSON and matching PCM blobs in upstream inputs; the node fails if the metrics are missing.
  • Supported model values: whisper, faster-whisper, whisperx, wav2vec (default whisperx). Models are loaded at runtime, so plan CPU/GPU capacity and image size accordingly, especially for air-gapped clusters.
  • outputFormat: plain or segments (default segments). Invalid values fail the node.
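
A pre-flight check along the following lines can catch bad values before a pipeline run. This is a sketch, not the node's own validation: the function and constant names are hypothetical, and rejecting an unsupported model with an error is an assumption (the documentation only states that invalid outputFormat values fail the node).

```python
# Allowed values copied from the limitations listed above.
SUPPORTED_MODELS = {"whisper", "faster-whisper", "whisperx", "wav2vec"}
OUTPUT_FORMATS = {"plain", "segments"}


def validate_options(model: str = "whisperx", output_format: str = "segments") -> None:
    """Raise ValueError if either option is outside the documented set."""
    if model not in SUPPORTED_MODELS:
        # Assumption: an unsupported model is rejected up front.
        raise ValueError(f"unsupported model {model!r}; expected one of {sorted(SUPPORTED_MODELS)}")
    if output_format not in OUTPUT_FORMATS:
        raise ValueError(f"invalid outputFormat {output_format!r}; expected 'plain' or 'segments'")


# Usage: the documented defaults pass, a misspelled format does not.
validate_options()
try:
    validate_options(output_format="segment")
except ValueError as err:
    print(err)
```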

Configuration (summary)

Property           Default           Description
outputFormat       segments          plain or segments
includeTimestamps  true              Include timing in segment output
model              whisperx          ASR backend (see supported list above)
modelSize          large-v3          Model size passed to the backend
device             cpu               Inference device
language           en                Transcription language code
dataSourceName     (runner default)  Object store for outputs

The manifest schema lists outputFormat and includeTimestamps; at runtime the processor also accepts model, modelSize, device, and language.
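
How the documented defaults combine with user-supplied properties can be sketched as follows. The DEFAULTS mapping and resolve_config are hypothetical names used only for illustration, and the simple "overrides win" merge is an assumption about how the processor resolves its settings. dataSourceName is left out because its default comes from the runner.

```python
# Defaults taken from the configuration summary above.
DEFAULTS = {
    "outputFormat": "segments",
    "includeTimestamps": True,
    "model": "whisperx",
    "modelSize": "large-v3",
    "device": "cpu",
    "language": "en",
}


def resolve_config(overrides: dict) -> dict:
    """Return the effective settings: documented defaults, with overrides applied on top."""
    return {**DEFAULTS, **overrides}


# Usage: request plain-text German transcripts and keep every other default.
config = resolve_config({"outputFormat": "plain", "language": "de"})
print(config["outputFormat"], config["model"], config["language"])
```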