generate-transcript
The generate-transcript transform runs speech-to-text over audio files listed in an upstream audio-metrics JSON document, using PCM bytes from convert-to-pcm outputs keyed by MD5.
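As a rough illustration of the MD5 keying described above, the following Python sketch pairs each file listed in a metrics document with its PCM bytes. The `files`, `name`, and `md5` field names and the `pair_metrics_with_pcm` helper are assumptions made for this example; the actual audio-metrics schema is not shown in this document.

```python
import json


def pair_metrics_with_pcm(
    metrics_json: str, pcm_by_md5: dict[str, bytes]
) -> list[tuple[str, bytes]]:
    """Pair each audio file listed in the metrics document with its PCM bytes.

    Assumed (hypothetical) metrics layout:
        {"files": [{"name": "...", "md5": "..."}]}
    """
    metrics = json.loads(metrics_json)
    pairs = []
    for entry in metrics["files"]:
        md5 = entry["md5"]
        if md5 not in pcm_by_md5:
            # This sketch raises on a missing blob; how the real transform
            # handles that case is not documented here.
            raise KeyError(f"no PCM blob for {entry['name']} (md5 {md5})")
        pairs.append((entry["name"], pcm_by_md5[md5]))
    return pairs
```

Keying by content hash rather than file name means the lookup still works if an upstream step renames a file.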
When to use it
Use after convert-to-pcm and audio-metrics in an audio analytics pipeline when you need searchable transcripts (plain text or time-segmented JSON).
Limitations
- Requires audio-metrics JSON and matching PCM blobs in upstream inputs; missing metrics fails the node.
- Supported model values: whisper, faster-whisper, whisperx, wav2vec (default whisperx). Models are loaded at runtime: plan CPU/GPU and image size for air-gapped clusters.
- outputFormat: plain or segments (default segments). Invalid values fail the node.
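The value constraints above could be checked before a run with something like the following Python sketch. The `validate_config` helper is hypothetical and not part of the transform. Rejecting an unsupported model is an assumption here, since the text only states that invalid outputFormat values fail the node.

```python
SUPPORTED_MODELS = {"whisper", "faster-whisper", "whisperx", "wav2vec"}
SUPPORTED_OUTPUT_FORMATS = {"plain", "segments"}


def validate_config(config: dict) -> None:
    """Raise ValueError if model or outputFormat is outside the documented lists."""
    # Fall back to the documented defaults when a key is absent.
    model = config.get("model", "whisperx")
    output_format = config.get("outputFormat", "segments")
    if model not in SUPPORTED_MODELS:
        # Assumption: the documentation does not say how the node reacts here.
        raise ValueError(f"unsupported model: {model!r}")
    if output_format not in SUPPORTED_OUTPUT_FORMATS:
        raise ValueError(f"unsupported outputFormat: {output_format!r}")
```

Running a check like this in a pre-flight step surfaces a typo before any model is loaded.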
Configuration (summary)
| Property | Default | Description |
|---|---|---|
| outputFormat | segments | plain or segments |
| includeTimestamps | true | Include timing in segment output |
| model | whisperx | ASR backend (see supported list above) |
| modelSize | large-v3 | Model size passed to the backend |
| device | cpu | Inference device |
| language | en | Transcription language code |
| dataSourceName | (runner default) | Object store for outputs |
The manifest schema lists only outputFormat and includeTimestamps; the runtime also accepts model, modelSize, device, and language, as used by the processor.
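To make the defaults and the manifest/runtime split concrete, here is a minimal Python sketch that merges user-supplied properties over the documented defaults and reports which supplied keys are accepted at runtime but not listed in the manifest schema. `resolve_config` and `runtime_only_keys` are hypothetical helpers, not part of the transform. dataSourceName is left out of the defaults because its default comes from the runner.

```python
# Defaults copied from the configuration table above.
DEFAULTS = {
    "outputFormat": "segments",
    "includeTimestamps": True,
    "model": "whisperx",
    "modelSize": "large-v3",
    "device": "cpu",
    "language": "en",
}

# Properties the runtime accepts but the manifest schema does not list.
RUNTIME_ONLY_KEYS = {"model", "modelSize", "device", "language"}


def resolve_config(user_config: dict) -> dict:
    """Return the effective configuration: user values override defaults."""
    return {**DEFAULTS, **user_config}


def runtime_only_keys(user_config: dict) -> list[str]:
    """List supplied keys that a manifest-based validator may not recognise."""
    return sorted(set(user_config) & RUNTIME_ONLY_KEYS)
```

For example, `resolve_config({"model": "whisper"})` keeps every other default, and `runtime_only_keys({"model": "whisper", "outputFormat": "plain"})` flags only `model`.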