Directory containing the audio files or subdirectories (used when audio_paths are relative).
required
speeches
list[list[SpeechSegment]] or None
Optional list of SpeechSegment objects that restricts VAD to specific segments of the audio. Alignment generally improves when VAD and alignment are performed only on the segments of the audio that overlap with text transcripts.
None
chunk_size
int
The maximum length, in seconds, of the chunks that VAD will create.
30
sample_rate
int
The sample rate to resample the audio to before running VAD.
16000
metadata
list[dict] or None
Optional list of additional file-level metadata to include.
None
batch_size
int
The batch size for the DataLoader.
1
num_workers
int
The number of workers for the DataLoader.
1
prefetch_factor
int
The prefetch factor for the DataLoader.
2
save_json
bool
Whether to save the VAD output as JSON files.
True
save_msgpack
bool
Whether to save the VAD output as Msgpack files.
False
return_vad
bool
Whether to yield the VAD output.
False
output_dir
str
Directory to save the VAD output files.
"output/vad"
Yields
Name
Type
Description
AudioMetadata
If return_vad is True, yields AudioMetadata objects for each audio file.
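The interaction between save_json, return_vad, and output_dir can be illustrated with a minimal sketch. It assumes the documented function is an ordinary Python generator, as the Yields section suggests. The name vad_sketch and the dict it produces are hypothetical stand-ins: no real VAD is run, and the actual function yields AudioMetadata objects rather than dicts.

```python
import json
import tempfile
from pathlib import Path


def vad_sketch(audio_paths, save_json=True, return_vad=False, output_dir="output/vad"):
    """Hypothetical stand-in for the documented function; no real VAD is run."""
    out = Path(output_dir)
    for audio_path in audio_paths:
        # Placeholder result; the real function produces AudioMetadata objects.
        result = {"audio_path": audio_path, "speeches": []}
        if save_json:
            # Write one JSON file per audio file into output_dir.
            out.mkdir(parents=True, exist_ok=True)
            target = out / (Path(audio_path).stem + ".json")
            target.write_text(json.dumps(result))
        if return_vad:
            # Results are only handed back to the caller when return_vad is True.
            yield result


with tempfile.TemporaryDirectory() as tmp:
    # return_vad=False: nothing is yielded, but the generator still has to be
    # consumed (here with list()) before any file is written.
    yielded = list(vad_sketch(["a.wav", "b.wav"], output_dir=tmp))
    written = sorted(p.name for p in Path(tmp).iterdir())

    # return_vad=True with save_json=False: results are yielded, nothing is saved.
    returned = list(vad_sketch(["a.wav"], save_json=False, return_vad=True, output_dir=tmp))
```

If the real function behaves like this sketch, calling it without iterating over the result does no work, so with return_vad left at False the call would still need to be wrapped in a loop or list() for the output files to appear.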