Function reference
Pipelines
Pipelines and functions for forced alignment, emission extraction, and VAD.
| pipeline | Complete pipeline to run VAD, extract emissions, and perform alignment. |
| vad_pipeline | Run VAD on a list of audio files. |
| emissions_pipeline | Run emissions extraction pipeline on the given audio files and save results to file. |
| alignment_pipeline | Perform alignment on speech segments or VAD chunks using emissions. |
| vad_pipeline_generator | Run VAD on a list of audio files. |
| emissions_pipeline_generator | Run emissions extraction pipeline on the given audio files and save results to file. |
| alignment_pipeline_generator | Perform alignment on speech segments or VAD chunks using emissions. |
Text Processing
Text processing utilities. Import with: from easyaligner.text.normalization import function_name
| normalization.SpanMapNormalizer | Apply regex text transformations while keeping track of the character spans |
| normalization.text_normalizer | Default text normalization function. |
| tokenizer.load_tokenizer | Loads a PunktTokenizer for the specified language, which can be used to split text into sentences. |
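The idea behind span-tracking normalization (as in the SpanMapNormalizer description above) can be illustrated with a small standalone sketch. This is not easyaligner's implementation and does not use its API: the helper name `sub_with_spans` and its return format are assumptions made up for illustration, built only on Python's standard `re` module. It applies one regex substitution and records, for each output character, which range of the original text it came from.

```python
import re


def sub_with_spans(text, pattern, repl):
    """Hypothetical helper: regex-substitute and track original spans.

    Returns (new_text, spans), where spans[i] is the (start, end) range
    in the original text that produced character i of new_text.
    """
    out = []
    spans = []
    pos = 0
    for m in re.finditer(pattern, text):
        # Characters before the match are copied one-to-one.
        for i in range(pos, m.start()):
            out.append(text[i])
            spans.append((i, i + 1))
        # Each replacement character maps back to the whole match.
        for ch in m.expand(repl):
            out.append(ch)
            spans.append((m.start(), m.end()))
        pos = m.end()
    # Copy whatever follows the last match.
    for i in range(pos, len(text)):
        out.append(text[i])
        spans.append((i, i + 1))
    return "".join(out), spans


# Collapse runs of whitespace while remembering where they came from.
original = "Hello,   World!"
normalized, spans = sub_with_spans(original, r"\s+", " ")
start, end = spans[6]  # the single space in the normalized text
print(normalized)
print(repr(original[start:end]))
```

Keeping a per-character map like this is one simple way to project positions in normalized text back onto the original text; the library's own class may represent and compose spans differently.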
Datasets and I/O
Dataset classes for creating PyTorch DataLoaders and reading JSON/Msgpack metadata. Import with: from easyaligner.data.dataset import ClassName
| data.dataset.StreamingAudioFileDataset | Streaming version of AudioFileDataset that reads audio chunks on-demand. |
| data.dataset.AudioFileDataset | Loads audio files and corresponding metadata files. Splits the audio into chunks |
| data.dataset.JSONMetadataDataset | Dataset for reading AudioMetadata JSON files. |
| data.dataset.MsgpackMetadataDataset | Dataset for reading AudioMetadata Msgpack files. |
| data.utils.read_json | Convenience function to read a JSON file and parse it into an AudioMetadata object. |
Data Models
Data models for storing transcribed text and metadata. Import with: from easyaligner.data.datamodel import ClassName
| AudioMetadata | Data model for the metadata of an audio file. |
| SpeechSegment | A slice of the audio file that contains speech of interest to be aligned. |
| WordSegment | Word-level alignment data. |
| AlignmentSegment | A segment of aligned audio and text. |
| AudioChunk | Segment of audio, usually created by Voice Activity Detection (VAD). |