Function reference
Transcription
Transcription pipelines and functions.
| pipelines.pipeline | Run the full transcription pipeline (VAD -> Transcribe -> Emissions -> Align). |
| asr.ct2.transcribe | Transcribe audio files using a CTranslate2 Whisper model. |
| asr.hf.transcribe | Transcribe audio files using a Hugging Face Whisper model. |
Text Processing
Text processing utilities. See also SpanMapNormalizer from easyaligner.
| text_normalizer | Default text normalization function. |
Datasets
Dataset classes for creating PyTorch DataLoaders.
| StreamingAudioSliceDataset | Dataset that lazily loads audio chunks on demand using ffmpeg seek. |
| StreamingAudioFileDataset | Streaming version of AudioFileDataset that reads audio chunks on demand. |
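
To illustrate what "lazily loads audio chunks on demand using ffmpeg seek" can look like in practice, here is a minimal sketch. It is not the implementation behind `StreamingAudioSliceDataset`; the function names `build_ffmpeg_seek_command` and `read_audio_slice`, the 16 kHz mono output, and the 16-bit PCM format are all assumptions chosen for the example. The idea is that placing `-ss` and `-t` before `-i` asks ffmpeg to seek in the input and decode only the requested window, so the whole file never has to be held in memory.

```python
import subprocess
import sys
from array import array


def build_ffmpeg_seek_command(path, start_s, duration_s, sample_rate=16000):
    """Build an ffmpeg command that decodes one slice of an audio file.

    Hypothetical helper for illustration only. `-ss` and `-t` appear
    before `-i`, so they act as input options: ffmpeg seeks to `start_s`
    and reads at most `duration_s` seconds of input.
    """
    return [
        "ffmpeg", "-nostdin", "-loglevel", "error",
        "-ss", f"{start_s:.3f}",      # seek position in seconds
        "-t", f"{duration_s:.3f}",    # length of the slice in seconds
        "-i", path,
        "-f", "s16le",                # raw signed 16-bit little-endian PCM
        "-ac", "1",                   # downmix to mono (assumed)
        "-ar", str(sample_rate),      # resample (assumed 16 kHz)
        "-",                          # write the decoded audio to stdout
    ]


def read_audio_slice(path, start_s, duration_s, sample_rate=16000):
    """Decode a single slice on demand and return it as 16-bit samples.

    Requires the `ffmpeg` executable to be available on PATH.
    """
    cmd = build_ffmpeg_seek_command(path, start_s, duration_s, sample_rate)
    raw = subprocess.run(cmd, capture_output=True, check=True).stdout
    raw = raw[: len(raw) - len(raw) % 2]  # keep whole 16-bit samples only
    samples = array("h")
    samples.frombytes(raw)
    if sys.byteorder == "big":
        samples.byteswap()  # the stream is little-endian
    return samples
```

A dataset built on this pattern would typically store only `(path, start_s, duration_s)` tuples and call something like `read_audio_slice` inside `__getitem__`, so decoding happens per item rather than up front.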
Data Models
Data models for storing transcribed text and metadata.
| AudioMetadata | Data model for the metadata of an audio file. |
| SpeechSegment | A slice of the audio file that contains speech of interest to be aligned. |
| WordSegment | Word-level alignment data. |
| AlignmentSegment | A segment of aligned audio and text. |
| AudioChunk | Segment of audio, usually created by Voice Activity Detection (VAD). |
Utilities
Utility functions for various tasks.
| utils.hf_to_ct2_converter | Convert a Hugging Face Transformers model to CTranslate2 format. |