Loads audio files and corresponding metadata files. Splits the audio into chunks according to metadata, and creates wav2vec2 features for each chunk. Returns an AudioSliceDataset object containing the features for each chunk, along with the metadata.
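The splitting step can be illustrated with a minimal sketch of naive fixed-length chunking, the fallback that the chunk_size parameter describes for when VAD is not used. This is not the library's implementation: the helper name `naive_chunks` is hypothetical, and the sketch assumes that chunk_size is measured in seconds.

```python
def naive_chunks(start, end, chunk_size=30.0, sample_rate=16000):
    """Split the span [start, end) into consecutive fixed-length chunks.

    Hypothetical helper, not part of the documented API.
    Assumes start, end and chunk_size are given in seconds.
    Returns a list of (start_sample, end_sample) index pairs.
    """
    first = round(start * sample_rate)
    last = round(end * sample_rate)
    step = round(chunk_size * sample_rate)
    if step <= 0:
        raise ValueError("chunk_size must cover at least one sample")
    # Every chunk is `step` samples long except possibly the last one,
    # which is clipped to the end of the segment.
    return [(s, min(s + step, last)) for s in range(first, last, step)]


if __name__ == "__main__":
    # A 70-second segment yields two full 30-second chunks and one shorter chunk.
    for chunk in naive_chunks(0.0, 70.0):
        print(chunk)
```

Each index pair can be used to slice the resampled waveform before the slice is passed to the processor for feature extraction.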
Parameters
Name
Type
Description
Default
metadata
JSONMetadataDataset, list of AudioMetadata, or AudioMetadata
List of AudioMetadata objects or paths to JSON files.
required
processor
Wav2Vec2Processor or WhisperProcessor
The Wav2Vec2Processor or WhisperProcessor to use for feature extraction.
required
audio_dir
str
Directory containing the audio files.
"data"
sample_rate
int
Sample rate to resample audio to.
16000
chunk_size
int
When VAD is not used, SpeechSegments are naively split into chunks of size chunk_size for feature extraction.
30
alignment_strategy
str
Either 'speech' or 'chunk'. Determines how chunks are defined.