Function reference

Pipelines

Pipelines and functions for forced alignment, emission extraction, and voice activity detection (VAD).

pipeline Complete pipeline to run VAD, extract emissions, and perform alignment.
vad_pipeline Run VAD on a list of audio files.
emissions_pipeline Run emissions extraction pipeline on the given audio files and save results to file.
alignment_pipeline Perform alignment on speech segments or VAD chunks using emissions.
vad_pipeline_generator Run VAD on a list of audio files.
emissions_pipeline_generator Run emissions extraction pipeline on the given audio files and save results to file.
alignment_pipeline_generator Perform alignment on speech segments or VAD chunks using emissions.

Text Processing

Text processing utilities. Import with: from easyaligner.text.normalization import function_name

normalization.SpanMapNormalizer Apply regex text transformations while keeping track of the character spans
normalization.text_normalizer Default text normalization function.
tokenizer.load_tokenizer Loads a PunktTokenizer for the specified language, for splitting text into sentences.
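
As a rough illustration of the span-tracking idea behind SpanMapNormalizer, the sketch below applies regex substitutions while recording which range of the original text each output character came from. It is not easyaligner's implementation or API: sub_with_spans and normalize_with_spans are hypothetical helper names, the two rules are assumptions chosen for the example, and only Python's standard re module is used.

```python
import re
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) offsets into the ORIGINAL text


def sub_with_spans(
    pattern: str, repl: str, text: str, spans: List[Span]
) -> Tuple[str, List[Span]]:
    """Hypothetical helper: apply one regex substitution and carry spans along.

    spans[i] is the range of the original text that produced text[i],
    so len(spans) must equal len(text) on entry and on exit.
    """
    out_chars: List[str] = []
    out_spans: List[Span] = []
    pos = 0
    for m in re.finditer(pattern, text):
        if m.start() == m.end():
            continue  # ignore empty matches; they consume no characters
        # Copy the untouched characters before the match, with their spans.
        out_chars.extend(text[pos:m.start()])
        out_spans.extend(spans[pos:m.start()])
        # Each replacement character maps to the whole original range
        # covered by the matched characters.
        covered = (spans[m.start()][0], spans[m.end() - 1][1])
        replacement = m.expand(repl)
        out_chars.extend(replacement)
        out_spans.extend([covered] * len(replacement))
        pos = m.end()
    # Copy whatever follows the last match.
    out_chars.extend(text[pos:])
    out_spans.extend(spans[pos:])
    return "".join(out_chars), out_spans


def normalize_with_spans(text: str) -> Tuple[str, List[Span]]:
    """Hypothetical normalizer: strip punctuation, then collapse whitespace."""
    rules = [(r"[^\w\s]", ""), (r"\s+", " ")]  # assumed rules, for illustration
    spans: List[Span] = [(i, i + 1) for i in range(len(text))]
    current = text
    for pattern, repl in rules:
        current, spans = sub_with_spans(pattern, repl, current, spans)
    return current, spans


if __name__ == "__main__":
    original = "Hello,   world!"
    normalized, spans = normalize_with_spans(original)
    for char, (start, end) in zip(normalized, spans):
        # Show each normalized character next to the original slice it came from.
        print(repr(char), "<-", repr(original[start:end]))
```

Because every normalized character keeps a span into the original string, timestamps obtained for normalized text can be mapped back to the unnormalized transcript.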

Datasets and I/O

Dataset classes for creating PyTorch DataLoaders and for reading JSON/Msgpack metadata. Import with: from easyaligner.data.dataset import ClassName

data.dataset.StreamingAudioFileDataset Streaming version of AudioFileDataset that reads audio chunks on-demand.
data.dataset.AudioFileDataset Loads audio files and corresponding metadata files. Splits the audio into chunks
data.dataset.JSONMetadataDataset Dataset for reading AudioMetadata JSON files.
data.dataset.MsgpackMetadataDataset Dataset for reading AudioMetadata Msgpack files.
data.utils.read_json Convenience function to read a JSON file and parse it into an AudioMetadata object.

Data Models

Data models for storing transcribed text and metadata. Import with: from easyaligner.data.datamodel import ClassName

AudioMetadata Data model for the metadata of an audio file.
SpeechSegment A slice of the audio file that contains speech of interest to be aligned.
WordSegment Word-level alignment data.
AlignmentSegment A segment of aligned audio and text.
AudioChunk Segment of audio, usually created by Voice Activity Detection (VAD).