Function reference

Pipelines

Pipelines and functions for forced alignment, emission extraction, and voice activity detection (VAD).

pipeline	Complete pipeline to run VAD, extract emissions, and perform alignment.
vad_pipeline	Run VAD on a list of audio files.
emissions_pipeline	Run the emissions extraction pipeline on the given audio files and save the results to file.
alignment_pipeline	Perform alignment on speech segments or VAD chunks using emissions.
vad_pipeline_generator	Run VAD on a list of audio files.
emissions_pipeline_generator	Run the emissions extraction pipeline on the given audio files and save the results to file.
alignment_pipeline_generator	Perform alignment on speech segments or VAD chunks using emissions.

Text processing utilities. Import with `from easyaligner.text.normalization import function_name`.

normalization.SpanMapNormalizer	Apply regex text transformations while keeping track of the character spans
normalization.text_normalizer	Default text normalization function.
match.fuzzy_match	Fuzzy match between a needle (ground-truth text) and a haystack (ASR text).
tokenizer.load_tokenizer	Load a PunktTokenizer for the specified language, for splitting text into sentences.
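
The idea behind span-tracking normalization can be illustrated with a minimal, standalone sketch. It is not the `SpanMapNormalizer` implementation or its API: the function name `sub_with_spans` and its return format are assumptions made for illustration only. It applies a single regex substitution and records, for every character of the output, which range of the original text produced it.

```python
import re


def sub_with_spans(text, pattern, repl):
    """Hypothetical helper: regex substitution that tracks character spans.

    Returns (new_text, spans), where spans[i] is the (start, end) range in
    the original text that produced new_text[i].
    """
    out = []
    spans = []
    pos = 0
    for m in re.finditer(pattern, text):
        # Characters before the match are copied unchanged, one-to-one.
        for i in range(pos, m.start()):
            out.append(text[i])
            spans.append((i, i + 1))
        # Every character of the replacement maps back to the whole match.
        for ch in m.expand(repl):
            out.append(ch)
            spans.append((m.start(), m.end()))
        pos = m.end()
    # Copy the remainder after the last match.
    for i in range(pos, len(text)):
        out.append(text[i])
        spans.append((i, i + 1))
    return "".join(out), spans


original = "Hello,   World!"
normalized, spans = sub_with_spans(original, r"\s+", " ")
# The single space at index 6 of `normalized` maps back to the run of
# three spaces at original[6:9].
start, end = spans[6]
print(repr(normalized), repr(original[start:end]))
```

This sketch handles one substitution pass. Chaining several transformations would require composing the span maps from each pass so that the final text still points back to the original input.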

Dataset classes for creating PyTorch DataLoaders and for reading JSON/Msgpack metadata. Import with `from easyaligner.data.dataset import ClassName`.

data.dataset.StreamingAudioFileDataset	Streaming version of AudioFileDataset that reads audio chunks on-demand.
data.dataset.AudioFileDataset	Loads audio files and corresponding metadata files. Splits the audio into chunks
data.dataset.JSONMetadataDataset	Dataset for reading AudioMetadata JSON files.
data.dataset.MsgpackMetadataDataset	Dataset for reading AudioMetadata Msgpack files.
data.utils.read_json	Convenience function to read a JSON file and parse it into an `AudioMetadata` object.

Data models for storing transcribed text and metadata. Import with `from easyaligner.data.datamodel import ClassName`.

AudioMetadata	Data model for the metadata of an audio file.
SpeechSegment	A slice of the audio file that contains speech of interest to be aligned.
WordSegment	Word-level alignment data.
AlignmentSegment	A segment of aligned audio and text.
AudioChunk	Segment of audio, usually created by Voice Activity Detection (VAD).
FuzzyMatch	Result of a fuzzy text match.
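
As a rough illustration of what a fuzzy match between a needle and a haystack involves, the sketch below slides a word window over the haystack and scores each window with the standard library's `difflib.SequenceMatcher`. It is an assumption-laden example, not easyaligner's `fuzzy_match` or `FuzzyMatch`: the `MatchResult` class, its fields, and the `best_window` function are hypothetical names introduced here.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class MatchResult:
    """Hypothetical result type (not easyaligner's FuzzyMatch)."""
    start: int    # index of the first haystack word in the best window
    end: int      # index one past the last haystack word in the window
    score: float  # similarity ratio in [0, 1]


def best_window(needle: str, haystack: str) -> MatchResult:
    """Find the haystack word window most similar to the needle."""
    needle_words = needle.lower().split()
    hay_words = haystack.lower().split()
    n = len(needle_words)
    target = " ".join(needle_words)
    best = MatchResult(0, 0, 0.0)
    # Try every window of n consecutive haystack words.
    for start in range(max(1, len(hay_words) - n + 1)):
        window_words = hay_words[start:start + n]
        score = SequenceMatcher(None, target, " ".join(window_words)).ratio()
        if score > best.score:
            best = MatchResult(start, start + len(window_words), score)
    return best


result = best_window(
    "the quick brown fox",
    "yesterday the quik brown fox jumped over",
)
print(result)
```

A fixed-size window keeps the example short. It assumes the transcript has roughly the same number of words as the ground-truth text, which does not hold when words are inserted or dropped.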