asr.ct2.transcribe

asr.ct2.transcribe(
    model,
    processor,
    file_dataloader,
    language=None,
    task='transcribe',
    batch_size=8,
    beam_size=5,
    patience=1.0,
    length_penalty=1.0,
    repetition_penalty=1.0,
    no_repeat_ngram_size=0,
    max_length=448,
    suppress_blank=True,
    num_workers=2,
    prefetch_factor=2,
    output_dir='output/transcriptions',
)

Transcribe audio files using a CTranslate2 Whisper model.

This function processes audio files through a dataloader structure similar to that of the Hugging Face implementation, but uses CTranslate2 for inference.

Parameters

Name	Type	Description	Default
model	ctranslate2.models.Whisper	CTranslate2 Whisper model.	required
processor	transformers.WhisperProcessor	WhisperProcessor for tokenization and decoding.	required
file_dataloader	torch.utils.data.DataLoader	DataLoader yielding audio file datasets.	required
language	str	Language code (e.g., `'sv'`, `'en'`). If `None`, the language is auto-detected.	`None`
task	str	Task type: `'transcribe'` or `'translate'`.	`'transcribe'`
batch_size	int	Batch size for feature processing.	`8`
beam_size	int	Beam size for beam search.	`5`
patience	float	Beam search patience factor.	`1.0`
length_penalty	float	Length penalty for beam search.	`1.0`
repetition_penalty	float	Repetition penalty applied during decoding.	`1.0`
no_repeat_ngram_size	int	Size of n-grams that may not repeat in the output; `0` disables the constraint.	`0`
max_length	int	Maximum output length in tokens.	`448`
suppress_blank	bool	Whether to suppress blank tokens.	`True`
num_workers	int	Number of workers for feature dataloader (file dataloader is created outside of this function).	`2`
prefetch_factor	int	Prefetch factor for feature dataloader (file dataloader is created outside of this function).	`2`
output_dir	str	Directory to save transcription JSON files.	`'output/transcriptions'`
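
The decoding parameters above share names and defaults with the keyword arguments of `ctranslate2.models.Whisper.generate`, and `language` and `task` correspond to the special tokens in a Whisper decoder prompt. The sketch below shows one way these values could be assembled before inference. It is an illustration under assumptions, not the internals of `asr.ct2.transcribe`: the helpers `build_prompt` and `decoding_options` are hypothetical names, and whether the function suppresses timestamps is not stated in this reference.

```python
from typing import Dict, List, Union

VALID_TASKS = ("transcribe", "translate")


def build_prompt(language: str, task: str = "transcribe",
                 timestamps: bool = False) -> List[str]:
    """Build Whisper-style decoder prompt tokens (hypothetical helper).

    `language` must already be resolved to a code such as 'sv' or 'en';
    auto-detection (language=None) would have to happen before this step.
    """
    if task not in VALID_TASKS:
        raise ValueError(f"task must be one of {VALID_TASKS}, got {task!r}")
    prompt = ["<|startoftranscript|>", f"<|{language}|>", f"<|{task}|>"]
    if not timestamps:
        # Assumption: plain-text output without timestamp tokens.
        prompt.append("<|notimestamps|>")
    return prompt


def decoding_options(beam_size: int = 5, patience: float = 1.0,
                     length_penalty: float = 1.0,
                     repetition_penalty: float = 1.0,
                     no_repeat_ngram_size: int = 0, max_length: int = 448,
                     suppress_blank: bool = True,
                     ) -> Dict[str, Union[int, float, bool]]:
    """Collect decoding options into a kwargs dict (hypothetical helper).

    The keys match the keyword arguments documented in the table above.
    """
    if beam_size < 1:
        raise ValueError("beam_size must be at least 1")
    return {
        "beam_size": beam_size,
        "patience": patience,
        "length_penalty": length_penalty,
        "repetition_penalty": repetition_penalty,
        "no_repeat_ngram_size": no_repeat_ngram_size,
        "max_length": max_length,
        "suppress_blank": suppress_blank,
    }


# Swedish transcription with greedy decoding (beam_size=1).
prompt = build_prompt("sv", task="transcribe")
options = decoding_options(beam_size=1)
print(prompt)
print(options)
```

With a loaded model, a dict like `options` could be unpacked into a call such as `model.generate(features, [prompt_ids], **options)`, where `prompt_ids` holds the token IDs for `prompt` obtained from the processor's tokenizer. That call is shown only to indicate how the pieces might fit together.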