generate-transcript

The generate-transcript transform runs speech-to-text over each audio file listed in an upstream audio-metrics JSON document. The audio is read as PCM bytes from the convert-to-pcm outputs, which are keyed by MD5.
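The pairing of metrics entries with PCM blobs can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the transform's implementation. The metrics layout (a `files` list whose entries carry `path` and `md5`) and the in-memory `pcm_by_md5` mapping are hypothetical stand-ins, because the real document and blob formats are not described here. Raising on a missing PCM blob is also an assumption; only the missing-metrics failure is documented.

```python
import json

# Hypothetical metrics layout (not the documented schema):
# {"files": [{"path": "...", "md5": "..."}, ...]}


def pair_audio_with_pcm(metrics_json: str, pcm_by_md5: dict) -> list:
    """Return (path, pcm_bytes) pairs for every file listed in the metrics document."""
    if not metrics_json:
        # Documented behaviour: missing audio-metrics input fails the node.
        raise ValueError("audio-metrics JSON is missing")
    metrics = json.loads(metrics_json)
    pairs = []
    for entry in metrics.get("files", []):
        md5 = entry["md5"]
        pcm = pcm_by_md5.get(md5)
        if pcm is None:
            # Assumption: treat a missing convert-to-pcm blob as a hard error.
            raise KeyError(f"no convert-to-pcm output for MD5 {md5}")
        pairs.append((entry["path"], pcm))
    return pairs
```

Each returned pair is what the speech-to-text backend would consume; the lookup is a plain dictionary access because the upstream outputs are already keyed by MD5.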

When to use it

Use after convert-to-pcm and audio-metrics in an audio analytics pipeline when you need searchable transcripts (plain text or time-segmented JSON).

Limitations

Requires the audio-metrics JSON document and matching PCM blobs in upstream inputs; missing audio-metrics input fails the node.
Supported `model` values: `whisper`, `faster-whisper`, `whisperx`, `wav2vec` (default `whisperx`). Models are loaded at runtime, so for air-gapped clusters plan CPU/GPU capacity and image size accordingly.
`outputFormat`: `plain` or `segments` (default `segments`). Any other value fails the node.
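The difference between the two output formats, and the effect of `includeTimestamps` from the configuration table, can be illustrated with a small Python sketch. The segment shape (`start` and `end` in seconds, plus `text`) and the function `render_transcript` are hypothetical; the backends' actual segment fields and the node's exact serialization are not documented here.

```python
import json

# Hypothetical segment shape: {"start": float, "end": float, "text": str}.


def render_transcript(segments: list, output_format: str = "segments",
                      include_timestamps: bool = True) -> str:
    """Render ASR segments as plain text or as a JSON list of segments."""
    if output_format == "plain":
        # Plain output: segment texts joined into one string, no timing.
        return " ".join(seg["text"].strip() for seg in segments)
    if output_format == "segments":
        if include_timestamps:
            items = [{"start": s["start"], "end": s["end"], "text": s["text"].strip()}
                     for s in segments]
        else:
            # Timing dropped when includeTimestamps is false.
            items = [{"text": s["text"].strip()} for s in segments]
        return json.dumps(items)
    # Any other outputFormat value is rejected, as documented.
    raise ValueError(f"invalid outputFormat: {output_format!r}")
```

Plain text suits full-text search; the segments form keeps timing so a search hit can be mapped back to a position in the audio.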

Configuration (summary)

| Property | Default | Description |
| --- | --- | --- |
| `outputFormat` | `segments` | `plain` or `segments` |
| `includeTimestamps` | `true` | Include timing in `segments` output |
| `model` | `whisperx` | ASR backend (see supported list above) |
| `modelSize` | `large-v3` | Model size passed to the backend |
| `device` | `cpu` | Inference device |
| `language` | `en` | Transcription language code |
| `dataSourceName` | (runner default) | Object store where outputs are written |

The manifest schema declares only `outputFormat` and `includeTimestamps`; the processor also reads `model`, `modelSize`, `device`, and `language` at runtime, so those keys are accepted even though the schema does not list them.
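How the documented defaults combine with user settings can be sketched as a simple overlay in Python. This is an illustration, not the processor's code: `resolve_config` is a hypothetical helper, and rejecting an unsupported `model` is an assumption, since only invalid `outputFormat` values are documented to fail the node.

```python
SUPPORTED_MODELS = ("whisper", "faster-whisper", "whisperx", "wav2vec")
SUPPORTED_FORMATS = ("plain", "segments")

DEFAULTS = {
    # Declared in the manifest schema.
    "outputFormat": "segments",
    "includeTimestamps": True,
    # Not in the schema, but read by the processor at runtime.
    "model": "whisperx",
    "modelSize": "large-v3",
    "device": "cpu",
    "language": "en",
}


def resolve_config(user: dict) -> dict:
    """Overlay user settings on the documented defaults and check enum values."""
    # dataSourceName has no fixed default here; when absent it is left to the runner.
    config = {**DEFAULTS, **user}
    if config["outputFormat"] not in SUPPORTED_FORMATS:
        raise ValueError(f"invalid outputFormat: {config['outputFormat']!r}")
    if config["model"] not in SUPPORTED_MODELS:
        # Assumption: unsupported models are rejected up front.
        raise ValueError(f"unsupported model: {config['model']!r}")
    return config
```

For example, `resolve_config({"outputFormat": "plain", "language": "de"})` keeps the `whisperx` / `large-v3` / `cpu` defaults and overrides only the two keys supplied.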
