asr.ct2.transcribe
asr.ct2.transcribe(
model,
processor,
file_dataloader,
language=None,
task='transcribe',
batch_size=8,
beam_size=5,
patience=1.0,
length_penalty=1.0,
repetition_penalty=1.0,
no_repeat_ngram_size=0,
max_length=448,
suppress_blank=True,
num_workers=2,
prefetch_factor=2,
output_dir='output/transcriptions',
)

Transcribe audio files using a CTranslate2 Whisper model.
This function processes audio files through a dataloader structure similar to the HuggingFace implementation, but uses CTranslate2 for inference.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| model | ctranslate2.models.Whisper | CTranslate2 Whisper model. | required |
| processor | transformers.WhisperProcessor | WhisperProcessor for tokenization and decoding. | required |
| file_dataloader | torch.utils.data.DataLoader | DataLoader yielding audio file datasets. | required |
| language | str | Language code (e.g., `'sv'`, `'en'`). If None, the language is auto-detected. | None |
| task | str | Task type: `'transcribe'` or `'translate'`. | 'transcribe' |
| batch_size | int | Batch size for feature processing. | 8 |
| beam_size | int | Beam size for search. Default is 5. | 5 |
| patience | float | Beam search patience factor. Default is 1.0. | 1.0 |
| length_penalty | float | Length penalty for beam search. Default is 1.0. | 1.0 |
| repetition_penalty | float | Repetition penalty. Default is 1.0. | 1.0 |
| no_repeat_ngram_size | int | Size of n-grams that must not be repeated in the output. Default is 0. | 0 |
| max_length | int | Maximum output length. Default is 448. | 448 |
| suppress_blank | bool | Whether to suppress blank tokens. Default is True. | True |
| num_workers | int | Number of workers for feature dataloader (file dataloader is created outside of this function). | 2 |
| prefetch_factor | int | Prefetch factor for feature dataloader (file dataloader is created outside of this function). | 2 |
| output_dir | str | Directory to save transcription JSON files. Default is output/transcriptions. | 'output/transcriptions' |
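
The keyword arguments in the table can be collected and sanity-checked before the call. The sketch below is illustrative only: `TranscribeOptions` and `as_kwargs` are hypothetical names that are not part of this package, and the range checks on `batch_size` and `beam_size` are assumptions rather than documented behaviour. The defaults are copied from the signature above.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class TranscribeOptions:
    """Hypothetical container mirroring the documented keyword defaults."""

    language: Optional[str] = None  # None means auto-detect
    task: str = "transcribe"
    batch_size: int = 8
    beam_size: int = 5
    patience: float = 1.0
    length_penalty: float = 1.0
    repetition_penalty: float = 1.0
    no_repeat_ngram_size: int = 0
    max_length: int = 448
    suppress_blank: bool = True
    num_workers: int = 2
    prefetch_factor: int = 2
    output_dir: str = "output/transcriptions"

    def __post_init__(self) -> None:
        # The documentation lists exactly these two task values.
        if self.task not in ("transcribe", "translate"):
            raise ValueError(f"unsupported task: {self.task!r}")
        # Assumed sanity checks; the function itself may validate differently.
        if self.batch_size < 1 or self.beam_size < 1:
            raise ValueError("batch_size and beam_size must be at least 1")

    def as_kwargs(self) -> dict:
        """Return the options as a plain dict suitable for ** unpacking."""
        return asdict(self)


# Swedish transcription with greedy decoding (beam_size=1).
opts = TranscribeOptions(language="sv", beam_size=1)
kwargs = opts.as_kwargs()

# With a loaded model, processor, and file dataloader, the call would be:
# asr.ct2.transcribe(model, processor, file_dataloader, **kwargs)
```

Keeping the options in one frozen object makes it easy to log the exact decoding settings alongside the JSON files written to `output_dir`.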