get_speech_features

data.dataset.AudioFileDataset.get_speech_features(
    audio_path,
    metadata,
    sr=16000,
)

Extract features for each speech segment in the metadata.

When alignment_strategy is "speech", each speech segment is split into chunks of size chunk_size for wav2vec2 inference.
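
The chunking step described above can be illustrated with a minimal sketch. It is not the library's implementation: the helper name split_segment is hypothetical, and it assumes that segment boundaries and chunk_size are given in seconds and that the final chunk may be shorter than the rest.

```python
from typing import List, Tuple


def split_segment(
    start_s: float,
    end_s: float,
    chunk_size_s: float,
    sr: int = 16000,
) -> List[Tuple[int, int]]:
    """Split one speech segment into consecutive (start, end) sample ranges.

    Hypothetical helper: assumes times are in seconds and that each chunk
    covers at most chunk_size_s seconds; the last chunk may be shorter.
    """
    if chunk_size_s <= 0:
        raise ValueError("chunk_size_s must be positive")

    # Convert times in seconds to sample indices at the given sample rate.
    start = int(round(start_s * sr))
    end = int(round(end_s * sr))
    step = int(round(chunk_size_s * sr))
    if step <= 0:
        raise ValueError("chunk_size_s is too small for this sample rate")

    # Walk the segment in fixed steps, clipping the final chunk to the segment end.
    return [(lo, min(lo + step, end)) for lo in range(start, end, step)]


# A 65-second segment with 30-second chunks yields two full chunks and one shorter one.
chunks = split_segment(0.0, 65.0, 30.0, sr=16000)
```

Each returned range could then be used to slice the loaded waveform before passing it to the model; how the library pads or batches the chunks is not specified here.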

Parameters

Name        Type           Description              Default
audio_path  str            Path to the audio file.  required
metadata    AudioMetadata  Metadata object.         required
sr          int            Sample rate.             16000

Returns

Name  Type          Description
      list of dict  List of dictionaries containing extracted features and metadata for each chunk.