emissions_pipeline

pipelines.emissions_pipeline(
    model,
    processor,
    metadata,
    audio_dir,
    sample_rate=16000,
    chunk_size=30,
    alignment_strategy='speech',
    num_workers_files=1,
    prefetch_factor_files=2,
    batch_size_features=8,
    num_workers_features=4,
    streaming=True,
    save_json=True,
    save_msgpack=False,
    save_emissions=True,
    return_emissions=False,
    output_dir='output/emissions',
    device='cuda',
)

Run the emissions extraction pipeline on the given audio files, optionally saving the results to `output_dir` and returning the emissions.
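
A minimal sketch of how a call might be assembled, using the defaults shown in the signature above. The helper `build_pipeline_options` is hypothetical and not part of the library. It only collects keyword arguments and rejects an unknown `alignment_strategy` before any model work starts. Preparing `model`, `processor`, `metadata`, and `audio_dir` is outside the scope of this page, so the actual call is left as a comment.

```python
# Hypothetical helper, not part of the library: collects keyword arguments
# for pipelines.emissions_pipeline and checks them before the call.
VALID_STRATEGIES = {"speech", "chunk"}  # the two values documented on this page


def build_pipeline_options(
    alignment_strategy="speech",
    chunk_size=30,
    return_emissions=False,
    output_dir="output/emissions",
    device="cuda",
):
    """Return a dict of keyword arguments for emissions_pipeline."""
    if alignment_strategy not in VALID_STRATEGIES:
        raise ValueError(
            f"alignment_strategy must be one of {sorted(VALID_STRATEGIES)}, "
            f"got {alignment_strategy!r}"
        )
    # Sanity check added by this sketch; the library may validate differently.
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return {
        "alignment_strategy": alignment_strategy,
        "chunk_size": chunk_size,
        "return_emissions": return_emissions,
        "output_dir": output_dir,
        "device": device,
    }


options = build_pipeline_options(alignment_strategy="chunk", device="cpu")

# With model, processor, metadata and audio_dir already prepared (not shown):
# results = pipelines.emissions_pipeline(
#     model, processor, metadata, audio_dir, **options
# )
```

Failing early on a misspelled strategy avoids discovering the problem only after the DataLoader workers have started.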

Parameters

Name	Type	Description	Default
model	object	The loaded ASR model.	required
processor	Wav2Vec2Processor	The processor to use for audio.	required
metadata	JSONMetadataDataset or list[AudioMetadata] or AudioMetadata	List of AudioMetadata objects or paths to JSON files.	required
audio_dir	str	Directory with audio files.	required
sample_rate	int	Sample rate to resample audio to.	`16000`
chunk_size	int	When `alignment_strategy` is set to `speech`, SpeechSegments are split into `chunk_size`-sized chunks for feature extraction.	`30`
alignment_strategy	str	Strategy for aligning features to text. One of `speech` or `chunk`. If `speech`, audio is split into `chunk_size`-sized chunks based on SpeechSegments. If `chunk`, audio is taken from existing VAD chunks.	`"speech"`
num_workers_files	int	Number of workers for the file DataLoader.	`1`
prefetch_factor_files	int	Prefetch factor for the file DataLoader.	`2`
batch_size_features	int	Batch size for the feature DataLoader.	`8`
num_workers_features	int	Number of workers for the feature DataLoader.	`4`
streaming	bool	Whether to use streaming audio files.	`True`
save_json	bool	Whether to save the emissions output as JSON files.	`True`
save_msgpack	bool	Whether to save the emissions output as Msgpack files.	`False`
save_emissions	bool	Whether to save the raw emissions as .npy files.	`True`
return_emissions	bool	Whether to return the emissions as a list of numpy arrays.	`False`
output_dir	str	Directory to save the output files if saving is enabled.	`"output/emissions"`
device	str	Device to run the model on (e.g. `"cuda"` or `"cpu"`).	`"cuda"`
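
To illustrate what splitting into `chunk_size`-sized chunks could look like under the `speech` strategy, here is a conceptual sketch only. It assumes that `chunk_size` is measured in seconds and that a segment is described by a start and end time. Neither assumption is stated on this page, and the function `split_segment` is hypothetical rather than the library's implementation.

```python
def split_segment(start, end, chunk_size=30):
    """Split the interval [start, end) into consecutive chunks of at most chunk_size.

    Hypothetical illustration: units are assumed to be seconds.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    chunks = []
    chunk_start = start
    while chunk_start < end:
        # The last chunk is shorter when the segment length is not a multiple
        # of chunk_size.
        chunk_end = min(chunk_start + chunk_size, end)
        chunks.append((chunk_start, chunk_end))
        chunk_start = chunk_end
    return chunks


chunks = split_segment(0, 75, chunk_size=30)
# chunks == [(0, 30), (30, 60), (60, 75)]
```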

Returns

Name	Type	Description
	list[tuple[AudioMetadata, np.ndarray]] or None	If `return_emissions` is `True`, returns a list of (metadata, emissions) tuples, one per audio file. Otherwise, returns `None`.
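
Because the function returns either a list of tuples or `None`, calling code needs to handle both cases. The sketch below shows one way to do that. The helper `describe_results` and the `FakeArray` stand-in are hypothetical and exist only so the example runs without a model; real results would hold AudioMetadata objects and numpy arrays.

```python
def describe_results(results):
    """Return (metadata, shape) pairs, or an empty list when nothing was returned."""
    if results is None:
        # return_emissions=False: any output exists only on disk,
        # and only if one of the save options is enabled.
        return []
    return [(meta, getattr(emissions, "shape", None)) for meta, emissions in results]


class FakeArray:
    """Stand-in for a numpy array, for demonstration only."""

    shape = (1500, 32)


demo_results = [("file_a", FakeArray())]
summary = describe_results(demo_results)
# summary == [("file_a", (1500, 32))]
```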