emissions_pipeline_generator

pipelines.emissions_pipeline_generator(
    model,
    processor,
    metadata,
    audio_dir,
    sample_rate=16000,
    chunk_size=30,
    alignment_strategy='speech',
    batch_size_files=1,
    num_workers_files=1,
    prefetch_factor_files=2,
    batch_size_features=8,
    num_workers_features=4,
    streaming=True,
    save_json=True,
    save_msgpack=False,
    save_emissions=True,
    return_emissions=False,
    output_dir='output/emissions',
    device='cuda',
)

Run the emissions extraction pipeline on the given audio files and save the results to files in output_dir.

If return_emissions is True, the function acts as a generator that yields a (metadata, emissions) tuple for each audio file.

Parameters

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| model | object | The loaded ASR model. | required |
| processor | Wav2Vec2Processor | The processor to use for audio. | required |
| metadata | JSONMetadataDataset or list[AudioMetadata] or AudioMetadata | List of AudioMetadata objects or paths to JSON files. | required |
| audio_dir | str | Directory with audio files. | required |
| sample_rate | int | Sample rate to resample audio to. | 16000 |
| chunk_size | int | When VAD is not used, SpeechSegments are naively split into chunks of size chunk_size for feature extraction. | 30 |
| alignment_strategy | str | Strategy for aligning features to text. One of 'speech' or 'chunk'. With 'speech', audio is split into chunks of size chunk_size based on SpeechSegments. With 'chunk', audio is taken from existing VAD chunks. | "speech" |
| batch_size_files | int | Batch size for the file DataLoader. | 1 |
| num_workers_files | int | Number of workers for the file DataLoader. | 1 |
| prefetch_factor_files | int | Prefetch factor for the file DataLoader. | 2 |
| batch_size_features | int | Batch size for the feature DataLoader. | 8 |
| num_workers_features | int | Number of workers for the feature DataLoader. | 4 |
| streaming | bool | Whether to use streaming audio files. | False |
| save_json | bool | Whether to save the emissions output as JSON files. | True |
| save_msgpack | bool | Whether to save the emissions output as Msgpack files. | False |
| save_emissions | bool | Whether to save the raw emissions as .npy files. | True |
| return_emissions | bool | Whether to yield a (metadata, emissions) tuple for each audio file. | False |
| output_dir | str | Directory to save the output files if saving is enabled. | "output/emissions" |
| device | str | Device to run the model on (e.g. "cuda" or "cpu"). | "cuda" |
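
The chunk_size description mentions a naive split of SpeechSegments. A minimal sketch of what such a split could look like follows. It assumes chunk_size is measured in seconds and that the final chunk may be shorter than the rest; neither is confirmed by this page. The helper naive_chunks is hypothetical and is not part of the library.

```python
from typing import List, Tuple


def naive_chunks(start: float, end: float, chunk_size: float = 30.0) -> List[Tuple[float, float]]:
    """Hypothetical helper: split [start, end) into consecutive windows.

    Assumes chunk_size is in seconds; the last window is truncated at `end`.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    chunks = []
    t = start
    while t < end:
        # Each window starts where the previous one ended.
        chunks.append((t, min(t + chunk_size, end)))
        t += chunk_size
    return chunks


# A 75-second segment with chunk_size=30 gives two full windows and one shorter one.
chunks = naive_chunks(0.0, 75.0, chunk_size=30.0)
print(chunks)
```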

Yields

| Name | Type | Description |
| --- | --- | --- |
| | tuple(AudioMetadata, np.ndarray) | If return_emissions is True, yields tuples of (metadata, emissions) for each audio file. |
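
Because the function is documented as a generator, it may help to see how such a function is typically consumed. The sketch below uses a hypothetical stand-in, stub_pipeline, rather than the real function, so it runs without a model or audio files. It assumes the real function follows ordinary Python generator semantics: no work happens until the generator is iterated, so it has to be exhausted even when only the saved files are wanted. Plain lists stand in for np.ndarray emissions and strings for AudioMetadata.

```python
from collections import deque
from typing import Iterator, List, Optional, Tuple


def stub_pipeline(
    files: List[str],
    return_emissions: bool = False,
    saved: Optional[List[str]] = None,
) -> Iterator[Tuple[str, List[float]]]:
    """Hypothetical stand-in for emissions_pipeline_generator (not library code)."""
    for name in files:
        emissions = [0.0] * 4        # placeholder for an np.ndarray of emissions
        if saved is not None:
            saved.append(name)       # placeholder for writing files to output_dir
        if return_emissions:
            yield name, emissions    # (metadata, emissions) pair


# Case 1: consume (metadata, emissions) pairs as they are produced.
saved_a: List[str] = []
results = [
    (meta, len(em))
    for meta, em in stub_pipeline(["a.wav", "b.wav"], return_emissions=True, saved=saved_a)
]

# Case 2: only the saved output is wanted. Creating the generator runs nothing,
# so it is drained explicitly without storing any items.
saved_b: List[str] = []
gen = stub_pipeline(["a.wav", "b.wav"], return_emissions=False, saved=saved_b)
not_started = list(saved_b)          # snapshot taken before iteration begins
deque(gen, maxlen=0)                 # exhaust the generator, discarding items
```

If the real function behaves the same way, a call made only for its saved output would also need to be iterated, for example with a bare for loop.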