Function reference

Transcription

Transcription pipelines and functions.

pipelines.pipeline Run the full transcription pipeline (VAD -> Transcribe -> Emissions -> Align).
asr.ct2.transcribe Transcribe audio files using a CTranslate2 Whisper model.
asr.hf.transcribe Transcribe audio files using a Hugging Face Whisper model.
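
The stage order listed for pipelines.pipeline (VAD -> Transcribe -> Emissions -> Align) can be pictured as a chain of functions, each one consuming the previous stage's output. The sketch below is illustrative only: make_stage, run_pipeline and the state dictionary are hypothetical names, and nothing here reflects the real signature or arguments of pipelines.pipeline.

```python
from typing import Callable, Dict, List

# Hypothetical stage type: a stage takes the running state and returns an updated copy.
Stage = Callable[[Dict[str, object]], Dict[str, object]]


def make_stage(name: str) -> Stage:
    """Build a placeholder stage that only records that it ran."""
    def stage(state: Dict[str, object]) -> Dict[str, object]:
        history = list(state.get("history", []))
        history.append(name)
        return {**state, "history": history}
    return stage


def run_pipeline(audio_path: str, stages: List[Stage]) -> Dict[str, object]:
    """Apply the stages in order, feeding each one the previous stage's output."""
    state: Dict[str, object] = {"audio_path": audio_path}
    for stage in stages:
        state = stage(state)
    return state


# Same order as in the description: VAD -> Transcribe -> Emissions -> Align.
STAGES = [make_stage(n) for n in ("vad", "transcribe", "emissions", "align")]
result = run_pipeline("talk.wav", STAGES)
print(result["history"])  # ['vad', 'transcribe', 'emissions', 'align']
```

In a real run each placeholder would be replaced by actual work (speech detection, decoding, emission extraction, forced alignment); the point of the sketch is only the ordering and the hand-off between stages.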

Text Processing

Text processing utilities. See also SpanMapNormalizer from easyaligner.

text_normalizer Default text normalization function.

Datasets

Dataset classes for creating PyTorch DataLoaders.

StreamingAudioSliceDataset Dataset that lazily loads audio chunks on demand using ffmpeg seek.
StreamingAudioFileDataset Streaming version of AudioFileDataset that reads audio chunks on demand.
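
To make "lazily loads audio chunks on demand using ffmpeg seek" concrete, here is a minimal sketch of the general technique, not the library's implementation. LazySliceDataset, build_ffmpeg_cmd and read_pcm are hypothetical names. The sketch assumes an ffmpeg binary on PATH and returns raw 16-bit mono PCM bytes rather than tensors. It implements only __len__ and __getitem__, which is the interface a map-style PyTorch dataset exposes.

```python
import subprocess
from typing import Callable, List, Sequence, Tuple


def build_ffmpeg_cmd(path: str, start: float, duration: float,
                     sample_rate: int = 16000) -> List[str]:
    """Build an ffmpeg command that decodes only one slice of the file."""
    return [
        "ffmpeg", "-nostdin", "-loglevel", "error",
        "-ss", f"{start:.3f}",      # placed before -i: seek in the input
        "-t", f"{duration:.3f}",    # placed before -i: limit how much input is read
        "-i", path,
        "-f", "s16le", "-ac", "1", "-ar", str(sample_rate),
        "-",                        # write raw PCM to stdout
    ]


def read_pcm(cmd: List[str]) -> bytes:
    """Run ffmpeg and return the decoded bytes (requires ffmpeg on PATH)."""
    return subprocess.run(cmd, stdout=subprocess.PIPE, check=True).stdout


class LazySliceDataset:
    """Hypothetical dataset: stores only (start, duration) pairs, decodes on access."""

    def __init__(self, path: str, slices: Sequence[Tuple[float, float]],
                 reader: Callable[[List[str]], bytes] = read_pcm) -> None:
        self.path = path
        self.slices = list(slices)
        self.reader = reader  # injectable so the class can be exercised without ffmpeg

    def __len__(self) -> int:
        return len(self.slices)

    def __getitem__(self, idx: int) -> bytes:
        start, duration = self.slices[idx]
        # No audio is read until this point.
        return self.reader(build_ffmpeg_cmd(self.path, start, duration))
```

Constructing the dataset touches no audio. Each index access spawns one ffmpeg process for that slice, so memory use stays proportional to a single chunk rather than to the whole file.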

Data Models

Data models for storing transcribed text and metadata.

AudioMetadata Data model for the metadata of an audio file.
SpeechSegment A slice of the audio file that contains speech of interest to be aligned.
WordSegment Word-level alignment data.
AlignmentSegment A segment of aligned audio and text.
AudioChunk Segment of audio, usually created by Voice Activity Detection (VAD).
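
As a rough picture of how word-level and segment-level alignment data might relate, the sketch below nests word records inside a segment record. It is an assumption-laden illustration: Word and AlignedSpan are hypothetical stand-ins loosely mirroring WordSegment and AlignmentSegment, and their field names (text, start, end, words) are guesses rather than the documented schema.

```python
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class Word:
    """Hypothetical word-level record: token text plus start/end times in seconds."""
    text: str
    start: float
    end: float


@dataclass
class AlignedSpan:
    """Hypothetical segment-level record that owns its word-level records."""
    start: float
    end: float
    words: List[Word] = field(default_factory=list)

    @property
    def text(self) -> str:
        # Segment text is derived from the words, so the two cannot drift apart.
        return " ".join(w.text for w in self.words)


span = AlignedSpan(0.0, 1.2, [Word("hello", 0.0, 0.5), Word("world", 0.6, 1.2)])
print(span.text)            # hello world
print(asdict(span)["words"][0])
```

Nested records like these serialize to plain dictionaries with asdict, which is convenient when alignment output is written to JSON.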

Utilities

Utility functions for various tasks.

utils.hf_to_ct2_converter Convert a Hugging Face Transformers model to CTranslate2 format.
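
For orientation, this kind of conversion can also be done directly with CTranslate2's own TransformersConverter class. The sketch below uses that class, not utils.hf_to_ct2_converter, whose signature is not shown here. convert_whisper_to_ct2 is a hypothetical wrapper name, and the default quantization is an arbitrary choice for illustration. Running it requires the ctranslate2 and transformers packages and access to the model weights.

```python
def convert_whisper_to_ct2(model_id: str, output_dir: str,
                           quantization: str = "float16") -> str:
    """Convert a Hugging Face Transformers model to CTranslate2 format.

    Hypothetical helper built on ctranslate2's TransformersConverter.
    Returns the path of the output directory.
    """
    # Imported lazily so this module can be loaded without ctranslate2 installed.
    from ctranslate2.converters import TransformersConverter

    converter = TransformersConverter(model_id)
    # force=True overwrites output_dir if it already exists.
    return converter.convert(output_dir, quantization=quantization, force=True)


# Example call (downloads the model, so it is left commented out):
# convert_whisper_to_ct2("openai/whisper-tiny", "whisper-tiny-ct2")
```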