asr.hf.transcribe
asr.hf.transcribe(
model,
processor,
file_dataloader,
language=None,
task='transcribe',
batch_size=4,
beam_size=3,
length_penalty=1.0,
repetition_penalty=1.0,
max_length=250,
num_workers=2,
prefetch_factor=2,
output_dir='output/transcriptions',
device='cuda',
)

Transcribe audio files using a HuggingFace Whisper model.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| model | transformers.WhisperForConditionalGeneration | HuggingFace Whisper model. | required |
| processor | transformers.WhisperProcessor | HuggingFace Whisper processor. | required |
| file_dataloader | torch.utils.data.DataLoader | DataLoader yielding audio file datasets. | required |
| language | str | Language code (e.g., ‘sv’, ‘en’). Default is None (auto-detect). | None |
| batch_size | int | Batch size for inference. | 4 |
| beam_size | int | Number of beams for beam search. Default is 3. | 3 |
| length_penalty | float | Length penalty. Default is 1.0. | 1.0 |
| repetition_penalty | float | Repetition penalty. Default is 1.0. | 1.0 |
| max_length | int | Maximum length of generated text. Default is 250. | 250 |
| num_workers | int | Number of workers for feature dataloader. | 2 |
| prefetch_factor | int | Prefetch factor for feature dataloader. | 2 |
| output_dir | str | Directory to save transcription JSON files. Default is output/transcriptions. | 'output/transcriptions' |
| device | str | Device to run inference on. Default is cuda. | 'cuda' |
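
The decoding parameters in the table (`language`, `task`, `beam_size`, `length_penalty`, `repetition_penalty`, `max_length`) resemble keyword arguments accepted by the HuggingFace Whisper `generate()` method, where the beam count is named `num_beams`. The sketch below is only an illustration of how such a mapping could look. `build_generate_kwargs` is a hypothetical helper, not part of this API, and whether `transcribe` forwards its arguments this way is an assumption not confirmed by this reference.

```python
from typing import Any, Dict, Optional


def build_generate_kwargs(
    language: Optional[str] = None,
    task: str = "transcribe",
    beam_size: int = 3,
    length_penalty: float = 1.0,
    repetition_penalty: float = 1.0,
    max_length: int = 250,
) -> Dict[str, Any]:
    """Hypothetical helper: collect the documented decoding parameters
    under the keyword names used by HuggingFace generate()."""
    kwargs: Dict[str, Any] = {
        "task": task,
        "num_beams": beam_size,  # generate() names the beam count `num_beams`
        "length_penalty": length_penalty,
        "repetition_penalty": repetition_penalty,
        "max_length": max_length,
    }
    # Pin the language only when one is given; omitting it leaves
    # language detection to the model (assumed meaning of language=None).
    if language is not None:
        kwargs["language"] = language
    return kwargs


# Defaults mirror the table above; override only what differs.
swedish = build_generate_kwargs(language="sv", beam_size=5)
auto = build_generate_kwargs()
print(swedish)
print(auto)
```

Leaving `language` out of the dictionary when it is `None`, rather than passing `None` explicitly, keeps the auto-detect case distinct from a pinned language in this sketch.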