get_speech_features
data.dataset.AudioFileDataset.get_speech_features(
audio_path,
metadata,
sr=16000,
)

Extract features for each speech segment in the metadata.

When `alignment_strategy` is `speech`, the speech segments are split into chunks of size `chunk_size` for wav2vec2 inference.
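
The chunking step can be pictured with the minimal sketch below. It is not this library's implementation. The helper `split_segments_into_chunks`, the `(start, end)` segment format in seconds, the output dictionary keys, and the assumption that `chunk_size` is measured in seconds are all hypothetical and chosen only for illustration.

```python
import numpy as np


def split_segments_into_chunks(audio, segments, sr=16000, chunk_size=30.0):
    """Split speech segments into chunks of at most `chunk_size` seconds.

    Hypothetical helper: `segments` is assumed to be a list of (start, end)
    times in seconds, and `chunk_size` is assumed to be in seconds.
    """
    max_len = int(chunk_size * sr)  # maximum chunk length in samples
    chunks = []
    for seg_idx, (start, end) in enumerate(segments):
        seg_start = int(round(start * sr))
        seg_end = int(round(end * sr))
        # Step through the segment in fixed-size windows; the last one may be shorter.
        for chunk_start in range(seg_start, seg_end, max_len):
            chunk_end = min(chunk_start + max_len, seg_end)
            chunks.append({
                "segment_index": seg_idx,       # which speech segment this chunk came from
                "start": chunk_start / sr,      # chunk start time in seconds
                "end": chunk_end / sr,          # chunk end time in seconds
                "audio": audio[chunk_start:chunk_end],
            })
    return chunks


audio = np.zeros(16000 * 70, dtype=np.float32)  # 70 s of silence at 16 kHz
chunks = split_segments_into_chunks(audio, [(0.0, 65.0), (66.0, 70.0)], chunk_size=30.0)
print([(c["start"], c["end"]) for c in chunks])
```

With a 16 kHz signal and `chunk_size=30.0`, the 65-second segment yields chunks of 30, 30, and 5 seconds, while the 4-second segment stays a single chunk. Chunks never cross segment boundaries, so each one can be traced back to the segment it came from.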
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| audio_path | str | Path to the audio file. | required |
| metadata | AudioMetadata | Metadata object. | required |
| sr | int | Sample rate of the audio in Hz. | 16000 |
Returns
| Type | Description |
|---|---|
| list of dict | List of dictionaries, one per chunk, containing the extracted features and metadata for that chunk. |