asr.ct2.transcribe

asr.ct2.transcribe(
    model,
    processor,
    file_dataloader,
    language=None,
    task='transcribe',
    batch_size=8,
    beam_size=5,
    patience=1.0,
    length_penalty=1.0,
    repetition_penalty=1.0,
    no_repeat_ngram_size=0,
    max_length=448,
    suppress_blank=True,
    num_workers=2,
    prefetch_factor=2,
    output_dir='output/transcriptions',
)

Transcribe audio files using a CTranslate2 Whisper model.

This function processes audio files through a dataloader structure similar to the one in the HuggingFace implementation, but runs inference with CTranslate2.
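The two-level structure described above (an outer loader over files, an inner loader over batches of features per file) can be illustrated with a minimal pure-Python sketch. This is not the library's internal code: the helper names `batched` and `iterate_files` are hypothetical, and plain lists stand in for the datasets that a real `torch.utils.data.DataLoader` would yield.

```python
from typing import Iterable, Iterator, List, Sequence, Tuple


def batched(items: Sequence, batch_size: int) -> Iterator[List]:
    """Yield consecutive slices of at most `batch_size` items."""
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


def iterate_files(
    file_datasets: Iterable[Sequence], batch_size: int = 8
) -> Iterator[Tuple[int, List]]:
    """Outer loop over per-file datasets, inner loop over feature batches."""
    for file_index, chunks in enumerate(file_datasets):
        for batch in batched(chunks, batch_size):
            yield file_index, batch


# Two hypothetical files with 10 and 3 audio chunks (strings stand in for features).
files = [[f"a{i}" for i in range(10)], [f"b{i}" for i in range(3)]]
batch_sizes = [(idx, len(batch)) for idx, batch in iterate_files(files, batch_size=8)]
print(batch_sizes)
```

In this sketch, `batch_size` only controls how many chunks of a single file are grouped together; batches never mix chunks from different files.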

Parameters

| Name | Type | Description | Default |
|---|---|---|---|
| model | ctranslate2.models.Whisper | CTranslate2 Whisper model. | required |
| processor | transformers.WhisperProcessor | WhisperProcessor used for tokenization and decoding. | required |
| file_dataloader | torch.utils.data.DataLoader | DataLoader yielding audio file datasets. | required |
| language | str | Language code (e.g., 'sv', 'en'). If None, the language is detected automatically. | None |
| batch_size | int | Batch size for feature processing. | 8 |
| task | str | Task type: 'transcribe' or 'translate'. | 'transcribe' |
| beam_size | int | Beam size for beam search. | 5 |
| patience | float | Beam search patience factor. | 1.0 |
| length_penalty | float | Length penalty for beam search. | 1.0 |
| repetition_penalty | float | Repetition penalty. | 1.0 |
| no_repeat_ngram_size | int | N-gram size that may not be repeated. | 0 |
| max_length | int | Maximum output length. | 448 |
| suppress_blank | bool | Whether to suppress blank tokens. | True |
| num_workers | int | Number of workers for the feature dataloader (the file dataloader is created outside of this function). | 2 |
| prefetch_factor | int | Prefetch factor for the feature dataloader (the file dataloader is created outside of this function). | 2 |
| output_dir | str | Directory in which to save the transcription JSON files. | 'output/transcriptions' |
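
As a usage sketch, the optional parameters above can be collected and sanity-checked in one place before the call. The helper `build_transcribe_kwargs` below is hypothetical and not part of the library; it only assumes the parameter names and defaults documented above. The final call is left as a comment because it needs a loaded `model`, `processor`, and `file_dataloader`, which are not constructed here.

```python
from typing import Any, Dict, Optional

VALID_TASKS = ("transcribe", "translate")


def build_transcribe_kwargs(
    language: Optional[str] = None,
    task: str = "transcribe",
    batch_size: int = 8,
    beam_size: int = 5,
    output_dir: str = "output/transcriptions",
    **decoding_options: Any,
) -> Dict[str, Any]:
    """Hypothetical helper: validate and collect keyword arguments."""
    if task not in VALID_TASKS:
        raise ValueError(f"task must be one of {VALID_TASKS}, got {task!r}")
    if batch_size < 1 or beam_size < 1:
        raise ValueError("batch_size and beam_size must be positive integers")
    kwargs: Dict[str, Any] = {
        "language": language,
        "task": task,
        "batch_size": batch_size,
        "beam_size": beam_size,
        "output_dir": output_dir,
    }
    # Remaining options (e.g. patience, repetition_penalty) pass through unchanged.
    kwargs.update(decoding_options)
    return kwargs


# Swedish transcription with greedy-style decoding and a mild repetition penalty.
kwargs = build_transcribe_kwargs(language="sv", beam_size=1, repetition_penalty=1.1)

# With a loaded model, processor, and file_dataloader (not shown):
# asr.ct2.transcribe(model, processor, file_dataloader, **kwargs)
```

Passing `language` explicitly skips auto-detection, which is useful when all files are known to share one language.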