emissions_pipeline_generator
pipelines.emissions_pipeline_generator(
model,
processor,
metadata,
audio_dir,
sample_rate=16000,
chunk_size=30,
alignment_strategy='speech',
batch_size_files=1,
num_workers_files=1,
prefetch_factor_files=2,
batch_size_features=8,
num_workers_features=4,
streaming=True,
save_json=True,
save_msgpack=False,
save_emissions=True,
return_emissions=False,
output_dir='output/emissions',
device='cuda',
)
Run the emissions extraction pipeline on the given audio files and save the results to disk.
If return_emissions is True, the function yields a (metadata, emissions) tuple for each audio file.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| model | object | The loaded ASR model. | required |
| processor | Wav2Vec2Processor | The processor to use for audio. | required |
| metadata | JSONMetadataDataset or list[AudioMetadata] or AudioMetadata | List of AudioMetadata objects or paths to JSON files. | required |
| audio_dir | str | Directory with audio files. | required |
| sample_rate | int | Sample rate to resample audio to. | 16000 |
| chunk_size | int | When VAD is not used, SpeechSegments are naively split into chunks of size chunk_size for feature extraction. | 30 |
| alignment_strategy | str | Strategy for aligning features to text. One of 'speech' or 'chunk'. If 'speech', audio is split into chunks of size chunk_size based on SpeechSegments. If 'chunk', audio is taken from existing VAD chunks. | "speech" |
| batch_size_files | int | Batch size for the file DataLoader. | 1 |
| num_workers_files | int | Number of workers for the file DataLoader. | 1 |
| prefetch_factor_files | int | Prefetch factor for the file DataLoader. | 2 |
| batch_size_features | int | Batch size for the feature DataLoader. | 8 |
| num_workers_features | int | Number of workers for the feature DataLoader. | 4 |
| streaming | bool | Whether to use streaming audio files. | True |
| save_json | bool | Whether to save the emissions output as JSON files. | True |
| save_msgpack | bool | Whether to save the emissions output as Msgpack files. | False |
| save_emissions | bool | Whether to save the raw emissions as .npy files. | True |
| return_emissions | bool | Whether to yield (metadata, emissions) tuples, with the emissions as numpy arrays, for each audio file. | False |
| output_dir | str | Directory to save the output files if saving is enabled. | "output/emissions" |
| device | str | Device to run the model on (e.g. “cuda” or “cpu”). | "cuda" |
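The naive splitting described for chunk_size can be pictured with a small sketch. This is not the library's implementation: the helper name `split_segment` is hypothetical, and it assumes chunk_size is measured in seconds and that a segment is given by its start and end times, neither of which is stated above.

```python
def split_segment(start: float, end: float, chunk_size: float = 30.0) -> list[tuple[float, float]]:
    """Split the interval [start, end) into consecutive chunks.

    Hypothetical helper: every chunk is chunk_size long except possibly
    the last one, which is cut off at the segment end.
    Assumes start, end and chunk_size are all in seconds.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    chunks = []
    chunk_start = start
    while chunk_start < end:
        # Clamp the final chunk so it never runs past the segment.
        chunk_end = min(chunk_start + chunk_size, end)
        chunks.append((chunk_start, chunk_end))
        chunk_start = chunk_end
    return chunks


# A 70-second segment with the default chunk_size of 30.
print(split_segment(0.0, 70.0))  # [(0.0, 30.0), (30.0, 60.0), (60.0, 70.0)]
```

Under these assumptions, only the last chunk of a segment can be shorter than chunk_size.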
Yields
| Name | Type | Description |
|---|---|---|
| | tuple(AudioMetadata, np.ndarray) | If return_emissions is True, yields tuples of (metadata, emissions) for each audio file. |
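Because the results arrive as (metadata, emissions) tuples, they are consumed by iterating. The sketch below shows that pattern with a hypothetical stand-in generator, `fake_pipeline`, so that it runs without a model or audio files. The dict metadata and nested-list emissions are placeholders for AudioMetadata and np.ndarray, and the assumption that the first dimension of the emissions indexes frames is not confirmed above.

```python
from typing import Any, Iterable, Iterator


def fake_pipeline() -> Iterator[tuple[Any, Any]]:
    """Hypothetical stand-in for emissions_pipeline_generator(..., return_emissions=True).

    Yields one (metadata, emissions) tuple per audio file. Plain dicts and
    nested lists replace AudioMetadata and np.ndarray for illustration.
    """
    yield {"file": "a.wav"}, [[0.1, 0.9], [0.8, 0.2]]
    yield {"file": "b.wav"}, [[0.5, 0.5]]


def frame_counts(results: Iterable[tuple[Any, Any]]) -> list[int]:
    """Return the length of the first dimension of each emissions item."""
    counts = []
    for _metadata, emissions in results:
        # len() also works on a numpy array and gives its first dimension.
        counts.append(len(emissions))
    return counts


print(frame_counts(fake_pipeline()))  # [2, 1]
```

With the real function, `results` would be the object returned by `pipelines.emissions_pipeline_generator(...)` called with `return_emissions=True`. Processing one file at a time in the loop avoids holding all emissions in memory at once.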