Function reference

Transcription

Transcription pipelines and functions.

pipelines.pipeline	Run the full transcription pipeline (VAD -> Transcribe -> Emissions -> Align).
asr.ct2.transcribe	Transcribe audio files using a CTranslate2 Whisper model.
asr.hf.transcribe	Transcribe audio files using a Hugging Face Whisper model.

Text processing utilities. See also SpanMapNormalizer from easyaligner.

Default text normalization function.

Dataset classes for creating PyTorch DataLoaders.

StreamingAudioSliceDataset	Dataset that lazily loads audio chunks on demand using ffmpeg seek.
StreamingAudioFileDataset	Streaming version of AudioFileDataset that reads audio chunks on demand.
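
The idea of loading a chunk lazily with an ffmpeg seek can be illustrated with a minimal sketch. This is not the library's implementation: `build_ffmpeg_slice_command` and `LazySliceDataset` are hypothetical names, and the output format (16 kHz mono 16-bit PCM) is an assumption. Only standard ffmpeg options are used (`-ss`, `-t`, `-i`, `-f`, `-ac`, `-ar`), and the class follows the map-style dataset protocol (`__len__` and `__getitem__`) without depending on PyTorch.

```python
import subprocess


def build_ffmpeg_slice_command(path, start_s, duration_s, sample_rate=16000):
    """Build an ffmpeg command that decodes one slice to raw PCM on stdout.

    Hypothetical helper for illustration; the output format is an assumption.
    """
    return [
        "ffmpeg",
        "-nostdin",
        "-loglevel", "error",
        "-ss", f"{start_s:.3f}",    # placed before -i, so ffmpeg seeks in the input
        "-t", f"{duration_s:.3f}",  # read only this many seconds of input
        "-i", str(path),
        "-f", "s16le",              # raw signed 16-bit little-endian samples
        "-ac", "1",                 # downmix to mono
        "-ar", str(sample_rate),    # resample to the target rate
        "pipe:1",                   # write to stdout
    ]


class LazySliceDataset:
    """Map-style dataset sketch: nothing is decoded until an item is requested."""

    def __init__(self, path, slices, sample_rate=16000):
        self.path = path
        self.slices = list(slices)  # (start_s, duration_s) pairs
        self.sample_rate = sample_rate

    def __len__(self):
        return len(self.slices)

    def __getitem__(self, index):
        start_s, duration_s = self.slices[index]
        cmd = build_ffmpeg_slice_command(
            self.path, start_s, duration_s, self.sample_rate
        )
        # ffmpeg runs only here, once per requested slice.
        result = subprocess.run(cmd, capture_output=True, check=True)
        return result.stdout  # raw PCM bytes for this slice
```

Because decoding happens inside `__getitem__`, memory use depends on the size of the requested slice rather than on the length of the whole file, which is the usual motivation for a streaming dataset of this kind.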

Data models for storing transcribed text and metadata.

AudioMetadata	Data model for the metadata of an audio file.
SpeechSegment	A slice of the audio file that contains speech of interest to be aligned.
WordSegment	Word-level alignment data.
AlignmentSegment	A segment of aligned audio and text.
AudioChunk	Segment of audio, usually created by Voice Activity Detection (VAD).

Utility functions for various tasks.

Convert a Hugging Face Transformers model to CTranslate2 format.