asr.hf.transcribe

asr.hf.transcribe(
    model,
    processor,
    file_dataloader,
    language=None,
    task='transcribe',
    batch_size=4,
    beam_size=3,
    length_penalty=1.0,
    repetition_penalty=1.0,
    max_length=250,
    num_workers=2,
    prefetch_factor=2,
    output_dir='output/transcriptions',
    device='cuda',
)

Transcribe audio files using a HuggingFace Whisper model.
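Whisper decoding in `transformers` is configured through keyword arguments to `model.generate()`. This page does not say how `transcribe` forwards its decoding arguments, so the sketch below assumes they map one-to-one onto the standard `generate()` options (`beam_size` becoming `num_beams`). `build_generate_kwargs` is a hypothetical helper for illustration, not part of this library.

```python
from typing import Any, Optional


def build_generate_kwargs(
    language: Optional[str] = None,
    task: str = "transcribe",
    beam_size: int = 3,
    length_penalty: float = 1.0,
    repetition_penalty: float = 1.0,
    max_length: int = 250,
) -> dict[str, Any]:
    """Hypothetical helper: map transcribe()-style arguments to
    Hugging Face generate() keyword names (assumed mapping)."""
    if task not in ("transcribe", "translate"):
        raise ValueError(f"task must be 'transcribe' or 'translate', got {task!r}")
    kwargs: dict[str, Any] = {
        "task": task,
        "num_beams": beam_size,  # beam search width
        "length_penalty": length_penalty,
        "repetition_penalty": repetition_penalty,
        "max_length": max_length,
    }
    # Omitting `language` lets Whisper detect the spoken language itself.
    if language is not None:
        kwargs["language"] = language
    return kwargs


kwargs = build_generate_kwargs(language="sv", beam_size=5)
print(kwargs)

# With a loaded model and processor, one might then decode a batch like this:
# predicted_ids = model.generate(input_features, **kwargs)
# texts = processor.batch_decode(predicted_ids, skip_special_tokens=True)
```

Leaving `language` out of the dictionary, rather than passing `language=None`, keeps the auto-detect behaviour explicit.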

Parameters

| Name | Type | Description | Default |
|------|------|-------------|---------|
| model | transformers.WhisperForConditionalGeneration | HuggingFace Whisper model. | required |
| processor | transformers.WhisperProcessor | HuggingFace Whisper processor. | required |
| file_dataloader | torch.utils.data.DataLoader | DataLoader yielding audio file datasets. | required |
| language | str | Language code (e.g., 'sv', 'en'). If None, the language is auto-detected. | None |
| task | str | Whisper task ('transcribe' or 'translate'). | 'transcribe' |
| batch_size | int | Batch size for inference. | 4 |
| beam_size | int | Number of beams for beam search. | 3 |
| length_penalty | float | Length penalty for beam search. | 1.0 |
| repetition_penalty | float | Repetition penalty; 1.0 means no penalty. | 1.0 |
| max_length | int | Maximum length of the generated text. | 250 |
| num_workers | int | Number of workers for the feature DataLoader. | 2 |
| prefetch_factor | int | Prefetch factor for the feature DataLoader. | 2 |
| output_dir | str | Directory in which the transcription JSON files are saved. | 'output/transcriptions' |
| device | str | Device to run inference on. | 'cuda' |
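The effect of `length_penalty` and `repetition_penalty` is easiest to see in isolation. The sketch below re-implements, in plain Python, how `transformers` documents these two options: a finished beam hypothesis is scored as its summed log-probability divided by `length ** length_penalty`, and the repetition penalty multiplies negative logits (or divides positive ones) of tokens already generated. It assumes `transcribe` passes both values to `generate()` unchanged, which this page does not confirm; the helper names and the log-probability values are made up for illustration.

```python
def beam_score(sum_logprobs: float, length: int, length_penalty: float = 1.0) -> float:
    """Length-normalised beam score: summed log-probs / length**length_penalty."""
    return sum_logprobs / (length ** length_penalty)


def apply_repetition_penalty(
    logits: list[float], previous_tokens: list[int], penalty: float = 1.0
) -> list[float]:
    """Make tokens that already appear in the sequence less likely.

    Negative logits are multiplied by `penalty` and positive ones divided by it,
    so with penalty > 1 a repeated token's logit always decreases.
    """
    out = list(logits)
    for tok in set(previous_tokens):
        out[tok] = out[tok] * penalty if out[tok] < 0 else out[tok] / penalty
    return out


# Two finished hypotheses with made-up scores: (sum of log-probs, length in tokens).
short = (-3.0, 5)
long = (-4.5, 10)

for lp in (0.0, 1.0, 2.0):
    s, l = beam_score(*short, lp), beam_score(*long, lp)
    winner = "short" if s > l else "long"
    print(f"length_penalty={lp}: short={s:.3f}, long={l:.3f} -> {winner} wins")

# Tokens 0 and 1 were already generated; token 2 was not.
print(apply_repetition_penalty([2.0, -1.0, 0.5], previous_tokens=[0, 1], penalty=2.0))
```

Because log-probabilities are negative, a larger `length_penalty` shrinks the penalty on long hypotheses more than on short ones: at 0.0 the short hypothesis wins, while at 1.0 and above the longer one does. The default of 1.0 for `repetition_penalty` leaves logits unchanged.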