StreamingAudioSliceDataset

data.dataset.StreamingAudioSliceDataset(
    audio_path,
    chunk_specs,
    processor,
    sample_rate=16000,
    metadata=None,
)

Dataset that loads audio chunks lazily, on demand, using ffmpeg seek.

Unlike AudioSliceDataset, which holds all features in memory, this dataset stores only the chunk metadata and loads audio when __getitem__ is called.
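
The lazy-loading pattern described above can be sketched roughly as follows. This is a minimal illustration, not the library's implementation: `build_ffmpeg_seek_cmd`, `load_chunk_ffmpeg`, and `LazyChunkDataset` are hypothetical names, the specific ffmpeg flags are an assumption about how the seek might be issued, and the processor (feature extraction) step is left out.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_seek_cmd(audio_path, start_sec, end_sec, sample_rate=16000):
    """Build an ffmpeg command that decodes one chunk to raw mono float32 PCM on stdout."""
    return [
        "ffmpeg", "-v", "error",
        "-ss", f"{start_sec:.3f}",           # input option: seek to the chunk start
        "-t", f"{end_sec - start_sec:.3f}",  # input option: read only the chunk duration
        "-i", str(audio_path),
        "-f", "f32le", "-ac", "1", "-ar", str(sample_rate),
        "pipe:1",
    ]


def load_chunk_ffmpeg(audio_path, start_sec, end_sec, sample_rate):
    """Decode a single chunk; requires the ffmpeg binary on PATH. Returns raw PCM bytes."""
    cmd = build_ffmpeg_seek_cmd(audio_path, start_sec, end_sec, sample_rate)
    return subprocess.run(cmd, capture_output=True, check=True).stdout


class LazyChunkDataset:
    """Hypothetical stand-in: keeps only chunk specs and decodes audio per item."""

    def __init__(self, audio_path, chunk_specs, load_chunk=load_chunk_ffmpeg,
                 sample_rate=16000):
        self.audio_path = Path(audio_path)
        self.chunk_specs = list(chunk_specs)  # metadata only, no audio held in memory
        self.load_chunk = load_chunk
        self.sample_rate = sample_rate

    def __len__(self):
        return len(self.chunk_specs)

    def __getitem__(self, idx):
        spec = self.chunk_specs[idx]
        # Audio is decoded here, at access time, rather than in __init__.
        audio = self.load_chunk(
            self.audio_path, spec["start_sec"], spec["end_sec"], self.sample_rate
        )
        return {"audio": audio, "speech_id": spec["speech_id"]}
```

Passing the loader as an argument keeps the sketch testable without ffmpeg installed; memory use scales with the number of chunk specs rather than with the audio length, at the cost of one decode per item access.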

Parameters

| Name | Type | Description | Default |
|------|------|-------------|---------|
| audio_path | str or Path | Path to the audio file. | required |
| chunk_specs | list of dict | List of dicts with 'start_sec', 'end_sec', 'speech_id' keys. | required |
| processor | transformers.Wav2Vec2Processor or transformers.WhisperProcessor | Processor for feature extraction. | required |
| sample_rate | int | Target sample rate. | 16000 |
| metadata | AudioMetadata | AudioMetadata object to pass through. | None |