vad_pipeline

pipelines.vad_pipeline(
    model,
    audio_paths,
    audio_dir=None,
    speeches=None,
    chunk_size=30,
    sample_rate=16000,
    metadata=None,
    batch_size=1,
    num_workers=1,
    prefetch_factor=2,
    save_json=True,
    save_msgpack=False,
    return_vad=False,
    output_dir='output/vad',
)

Run voice activity detection (VAD) on a list of audio files.

Parameters

| Name | Type | Description | Default |
|------|------|-------------|---------|
| model | object | The loaded VAD model. | required |
| audio_paths | list | List of paths to audio files. | required |
| audio_dir | str or None | Directory containing the audio files/dirs (used when audio_paths are relative). | None |
| speeches | list[list[SpeechSegment]] or None | Optional list of SpeechSegment objects; when given, VAD and alignment run only on those segments of the audio. Alignment can generally be improved when VAD/alignment is restricted to the segments of the audio that overlap with text transcripts. | None |
| chunk_size | int | The maximum length of the chunks VAD will create, in seconds. | 30 |
| sample_rate | int | The sample rate to resample the audio to before running VAD. | 16000 |
| metadata | list[dict] or None | Optional list of additional file-level metadata to include. | None |
| batch_size | int | The batch size for the DataLoader. | 1 |
| num_workers | int | The number of workers for the DataLoader. | 1 |
| prefetch_factor | int | The prefetch factor for the DataLoader. | 2 |
| save_json | bool | Whether to save the VAD output as JSON files. | True |
| save_msgpack | bool | Whether to save the VAD output as Msgpack files. | False |
| return_vad | bool | Whether to return the VAD output. | False |
| output_dir | str | Directory in which to save the JSON/Msgpack files when save_json/save_msgpack is True. | 'output/vad' |

Returns

| Type | Description |
|------|-------------|
| list[AudioMetadata] or None | If return_vad is True, a list of AudioMetadata objects, one per audio file. Otherwise, None. |
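The `chunk_size` parameter caps the length of the chunks VAD produces. As an illustrative sketch only (not the library's actual implementation), splitting a detected speech region into chunks of at most `chunk_size` seconds could look like this:

```python
def split_into_chunks(start, end, chunk_size=30.0):
    """Split a speech region [start, end), in seconds, into
    consecutive chunks no longer than chunk_size seconds.

    Hypothetical helper for illustration; the pipeline's real
    chunking logic may differ (e.g. it may prefer to cut at
    silences rather than at fixed offsets).
    """
    chunks = []
    t = start
    while t < end:
        chunks.append((t, min(t + chunk_size, end)))
        t += chunk_size
    return chunks

# A 75-second speech region becomes three chunks: 30 s, 30 s, and 15 s.
print(split_into_chunks(0.0, 75.0))
# [(0.0, 30.0), (30.0, 60.0), (60.0, 75.0)]
```

Smaller `chunk_size` values trade longer per-file processing for lower peak memory per batch, since each chunk is what gets batched through the model.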