StreamingAudioSliceDataset

data.dataset.StreamingAudioSliceDataset(
    audio_path,
    chunk_specs,
    processor,
    sample_rate=16000,
    metadata=None,
    return_raw_audio=False,
)

Dataset that lazily loads audio chunks on demand using ffmpeg seek.

Unlike AudioSliceDataset, which holds all features in memory, this dataset stores only the chunk metadata and loads the audio for a chunk when `__getitem__` is called.
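
The lazy-loading pattern described above can be sketched roughly as follows. This is an illustration only, not the library's implementation: `ffmpeg_seek_command`, `load_chunk`, and `LazyChunkDataset` are hypothetical names, the output format (mono 16-bit PCM on stdout) is an assumption, and the processor step is omitted. The ffmpeg flags used (`-ss`, `-t`, `-i`, `-f s16le`, `-ac`, `-ar`) are standard options.

```python
import array
import subprocess
import sys
from pathlib import Path


def ffmpeg_seek_command(audio_path, start_sec, end_sec, sample_rate=16000):
    """Build an ffmpeg command that decodes only [start_sec, end_sec) (hypothetical helper)."""
    return [
        "ffmpeg", "-nostdin", "-loglevel", "error",
        "-ss", f"{start_sec:.3f}",           # placed before -i, so ffmpeg seeks in the input
        "-t", f"{end_sec - start_sec:.3f}",  # read only this many seconds
        "-i", str(audio_path),
        "-f", "s16le", "-ac", "1", "-ar", str(sample_rate),  # assumed: raw mono 16-bit PCM
        "pipe:1",                            # write decoded samples to stdout
    ]


def load_chunk(audio_path, start_sec, end_sec, sample_rate=16000):
    """Run ffmpeg and return the chunk as an array of int16 samples (hypothetical helper)."""
    cmd = ffmpeg_seek_command(audio_path, start_sec, end_sec, sample_rate)
    raw = subprocess.run(cmd, capture_output=True, check=True).stdout
    samples = array.array("h")
    samples.frombytes(raw)
    if sys.byteorder == "big":  # s16le is little-endian; array uses native byte order
        samples.byteswap()
    return samples


class LazyChunkDataset:
    """Keeps only chunk specs in memory; audio is decoded inside __getitem__."""

    def __init__(self, audio_path, chunk_specs, loader=load_chunk, sample_rate=16000):
        self.audio_path = Path(audio_path)
        self.chunk_specs = list(chunk_specs)  # dicts with start_sec, end_sec, speech_id
        self.loader = loader
        self.sample_rate = sample_rate

    def __len__(self):
        return len(self.chunk_specs)

    def __getitem__(self, idx):
        spec = self.chunk_specs[idx]
        audio = self.loader(
            self.audio_path, spec["start_sec"], spec["end_sec"], self.sample_rate
        )
        return {"audio": audio, "speech_id": spec["speech_id"]}
```

In this sketch, constructing the dataset costs only the size of `chunk_specs`, and each indexing operation launches one ffmpeg process for the requested time range. The loader is passed in as an argument so it can be swapped for a stub in tests or on machines without ffmpeg.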

Parameters

Name	Type	Description	Default
audio_path	str or Path	Path to the audio file.	required
chunk_specs	list of dict	One dict per chunk, each with ‘start_sec’, ‘end_sec’, and ‘speech_id’ keys.	required
processor	transformers.Wav2Vec2Processor or transformers.WhisperProcessor	Processor for feature extraction.	required
sample_rate	int	Target sample rate.	`16000`
metadata	AudioMetadata	AudioMetadata object to pass through.	`None`