Run the emissions extraction pipeline on the given audio files and, if saving is enabled, write the results to output_dir.
Parameters

| Name | Type | Description | Default |
|---|---|---|---|
| `model` | `object` | The loaded ASR model. | required |
| `processor` | `Wav2Vec2Processor` | The processor to use for audio. | required |
| `metadata` | `JSONMetadataDataset` or `list[AudioMetadata]` or `AudioMetadata` | List of AudioMetadata objects or paths to JSON files. | required |
| `audio_dir` | `str` | Directory with audio files. | required |
| `sample_rate` | `int` | Sample rate to resample audio to. | `16000` |
| `chunk_size` | `int` | When `alignment_strategy` is set to `"speech"`, SpeechSegments are split into chunks of size `chunk_size` for feature extraction. | `30` |
| `alignment_strategy` | `str` | Strategy for aligning features to text. One of `"speech"` or `"chunk"`. If `"speech"`, audio is split into chunks of size `chunk_size` based on SpeechSegments. If `"chunk"`, audio is taken from existing VAD chunks. | `"speech"` |
| `batch_size_files` | `int` | Batch size for the file DataLoader. | `1` |
| `num_workers_files` | `int` | Number of workers for the file DataLoader. | `1` |
| `prefetch_factor_files` | `int` | Prefetch factor for the file DataLoader. | `2` |
| `batch_size_features` | `int` | Batch size for the feature DataLoader. | `8` |
| `num_workers_features` | `int` | Number of workers for the feature DataLoader. | `4` |
| `streaming` | `bool` | Whether to use streaming audio files. | `False` |
| `save_json` | `bool` | Whether to save the emissions output as JSON files. | `True` |
| `save_msgpack` | `bool` | Whether to save the emissions output as Msgpack files. | `False` |
| `save_emissions` | `bool` | Whether to save the raw emissions as .npy files. | `True` |
| `return_emissions` | `bool` | Whether to return the emissions as a list of numpy arrays. | `False` |
| `output_dir` | `str` | Directory to save the output files if saving is enabled. | `"output/emissions"` |
| `device` | `str` | Device to run the model on (e.g. `"cuda"` or `"cpu"`). | `"cuda"` |
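
The optional parameters and their documented defaults can be gathered in one place before calling the pipeline. The following is a minimal sketch, not part of the library: `EmissionsConfig`, `VALID_STRATEGIES`, and `as_kwargs` are hypothetical names introduced only for illustration, and the one validation shown reflects the two `alignment_strategy` values listed above.

```python
from dataclasses import asdict, dataclass

# The two strategies named in the parameter description above.
VALID_STRATEGIES = {"speech", "chunk"}


@dataclass
class EmissionsConfig:
    """Hypothetical container mirroring the documented defaults (not a library class)."""

    sample_rate: int = 16000
    chunk_size: int = 30
    alignment_strategy: str = "speech"
    batch_size_files: int = 1
    num_workers_files: int = 1
    prefetch_factor_files: int = 2
    batch_size_features: int = 8
    num_workers_features: int = 4
    streaming: bool = False
    save_json: bool = True
    save_msgpack: bool = False
    save_emissions: bool = True
    return_emissions: bool = False
    output_dir: str = "output/emissions"
    device: str = "cuda"

    def __post_init__(self) -> None:
        # Fail early on a value the documentation does not list.
        if self.alignment_strategy not in VALID_STRATEGIES:
            raise ValueError(
                f"alignment_strategy must be one of {sorted(VALID_STRATEGIES)}, "
                f"got {self.alignment_strategy!r}"
            )

    def as_kwargs(self) -> dict:
        """Return the settings as a plain dict suitable for ** unpacking."""
        return asdict(self)


# Override only what differs from the defaults, e.g. a CPU-only run.
config = EmissionsConfig(alignment_strategy="chunk", device="cpu", return_emissions=True)
kwargs = config.as_kwargs()
```

The resulting dictionary could then be unpacked alongside the four required arguments (`model`, `processor`, `metadata`, `audio_dir`) when calling the pipeline function.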
Returns

| Type | Description |
|---|---|
| `list[tuple[AudioMetadata, np.ndarray]]` or `None` | If `return_emissions` is `True`, returns a list of tuples `(metadata, emissions)`, one for each audio file. Otherwise, returns `None`. |
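
Because the return value is `None` unless `return_emissions` is `True`, calling code needs to handle both cases. A minimal sketch of one way to do that follows; `iter_results` is a hypothetical helper, not a library function, and the example data are stand-ins rather than real `AudioMetadata` objects or numpy arrays.

```python
from typing import Any, Iterator, List, Optional, Tuple


def iter_results(
    results: Optional[List[Tuple[Any, Any]]],
) -> Iterator[Tuple[Any, Any]]:
    """Yield (metadata, emissions) pairs; yield nothing when results is None."""
    if results is None:
        # return_emissions was False, so there is nothing in memory to iterate.
        return
    for metadata, emissions in results:
        yield metadata, emissions


# Stand-in values for illustration only.
example = [("file_a", [[0.1, 0.9]]), ("file_b", [[0.7, 0.3]])]
names = [metadata for metadata, _ in iter_results(example)]
nothing = list(iter_results(None))
```

With this pattern the same loop works whether or not emissions were returned, so downstream code does not need a separate `None` check.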