data.dataset.StreamingAudioFileDataset

data.dataset.StreamingAudioFileDataset(
    metadata,
    processor,
    audio_dir='data',
    sample_rate=16000,
    chunk_size=30,
    alignment_strategy='speech',
)

Streaming version of AudioFileDataset that reads audio chunks on demand.

Instead of loading entire audio files and chunking in memory, this dataset returns a StreamingAudioSliceDataset that lazily loads each chunk via ffmpeg.
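The lazy loading described above can be illustrated with a minimal sketch. It is not the library's actual implementation: the helper names `build_ffmpeg_slice_cmd` and `load_slice` are hypothetical, and the choice of mono 32-bit float PCM output is an assumption. The idea is that each chunk is decoded only when requested, by asking ffmpeg to seek to the chunk's start and read a bounded duration.

```python
import struct
import subprocess


def build_ffmpeg_slice_cmd(path, start_s, duration_s, sample_rate=16000):
    """Build an ffmpeg command that decodes one slice of an audio file.

    Hypothetical helper, not part of the documented API.
    """
    return [
        "ffmpeg", "-nostdin", "-loglevel", "error",
        "-ss", f"{start_s:.3f}",    # seek to the slice start (input option)
        "-t", f"{duration_s:.3f}",  # read at most this many seconds
        "-i", path,
        "-f", "f32le",              # raw little-endian 32-bit float samples
        "-ac", "1",                 # downmix to mono (assumption)
        "-ar", str(sample_rate),    # resample to the target rate
        "pipe:1",                   # write decoded samples to stdout
    ]


def load_slice(path, start_s, duration_s, sample_rate=16000):
    """Decode a single slice on demand; requires ffmpeg on PATH."""
    cmd = build_ffmpeg_slice_cmd(path, start_s, duration_s, sample_rate)
    raw = subprocess.run(cmd, stdout=subprocess.PIPE, check=True).stdout
    n = len(raw) // 4  # 4 bytes per float32 sample
    return list(struct.unpack(f"<{n}f", raw[: n * 4]))
```

Because only the requested span is decoded, memory use in this sketch scales with the chunk length rather than with the length of the whole file.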

Parameters

| Name | Type | Description | Default |
|------|------|-------------|---------|
| metadata | JSONMetadataDataset or list of AudioMetadata or AudioMetadata | List of AudioMetadata objects, JSONMetadataDataset, or single AudioMetadata. | required |
| processor | Wav2Vec2Processor or WhisperProcessor | For feature extraction. | required |
| audio_dir | str | Base directory for audio files. | "data" |
| sample_rate | int | Target sample rate for resampling. | 16000 |
| chunk_size | int | Maximum chunk size in seconds (for speech-based chunking). | 30 |
| alignment_strategy | str | 'speech' or 'chunk' - determines how chunks are defined. | "speech" |
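To make the interaction between `chunk_size` and `alignment_strategy` more concrete, the following sketch shows one plausible reading of the two strategies. The semantics here are assumptions, not confirmed by the reference above: `'chunk'` is taken to mean fixed-length windows over the whole file, and `'speech'` is taken to mean pre-annotated speech segments that are split whenever they exceed `chunk_size`. The function names `fixed_chunks` and `speech_chunks` are hypothetical.

```python
def fixed_chunks(duration_s, chunk_size=30):
    """Assumed 'chunk' strategy: tile the file with fixed-length windows."""
    spans = []
    start = 0.0
    while start < duration_s:
        end = min(start + chunk_size, duration_s)
        spans.append((start, end))
        start = end
    return spans


def speech_chunks(speech_segments, chunk_size=30):
    """Assumed 'speech' strategy: follow speech segments, capped at chunk_size.

    speech_segments is a list of (start_s, end_s) pairs; a segment longer
    than chunk_size seconds is split into consecutive pieces.
    """
    spans = []
    for seg_start, seg_end in speech_segments:
        start = seg_start
        while start < seg_end:
            end = min(start + chunk_size, seg_end)
            spans.append((start, end))
            start = end
    return spans


# A 70-second file yields three windows under the assumed 'chunk' strategy.
print(fixed_chunks(70.0, chunk_size=30))
# A 40-second speech segment is split in two under the assumed 'speech' strategy.
print(speech_chunks([(1.0, 5.0), (10.0, 50.0)], chunk_size=30))
```

Each resulting `(start, end)` pair is the kind of span a streaming dataset could hand to a lazy loader, so that no audio is decoded until the corresponding item is accessed.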