Apply regex text transformations while keeping track of the character spans in the original text.
Parameters
| Name | Type | Description | Default |
| --- | --- | --- | --- |
| text | str | The input text to be normalized. | required |
Example
from easyaligner.text.normalization import SpanMapNormalizer

text = '''Book 1. Chapter 1, The Period. It was the best of times. It was the worst of times.
It was the age of wisdom. It was the age of foolishness. It was the epoch of belief.
It was the epoch of incredulity. It was the season of light.
It was the season of darkness. It was the spring of hope.'''

normalizer = SpanMapNormalizer(text)
normalizer.transform(r"[^\w\s]", "")  # Remove punctuation and special characters
normalizer.transform(r"\S+", lambda m: m.group().lower())  # Lowercase
normalizer.transform(r"\s+", " ")  # Normalize whitespace to a single space
normalizer.transform(r"^\s+|\s+$", "")  # Strip leading and trailing whitespace

print(normalizer.current_text)
print(normalizer.get_token_map())
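To make the idea of "keeping track of the character spans" concrete, here is a minimal standalone sketch of one way such bookkeeping could work. It is not the easyaligner implementation: the class `SimpleSpanTracker` and its method `token_spans` are hypothetical names invented for illustration, and the sketch assumes that every character of the current text carries a `(start, end)` range pointing back into the original text.

```python
import re


class SimpleSpanTracker:
    """Hypothetical illustration of span tracking; not the easyaligner code."""

    def __init__(self, text):
        self.original = text
        self.current = text
        # spans[i] is the (start, end) range in the original text behind current[i].
        self.spans = [(i, i + 1) for i in range(len(text))]

    def transform(self, pattern, repl):
        """Apply a regex substitution and carry the original spans along."""
        out_chars, out_spans, pos = [], [], 0
        for m in re.finditer(pattern, self.current):
            if m.start() == m.end():
                continue  # this sketch ignores empty matches
            # Copy the untouched text before the match, spans unchanged.
            out_chars.append(self.current[pos:m.start()])
            out_spans.extend(self.spans[pos:m.start()])
            new = repl(m) if callable(repl) else m.expand(repl)
            if len(new) == len(m.group()):
                # Same length (e.g. lowercasing): keep the per-character spans.
                out_spans.extend(self.spans[m.start():m.end()])
            else:
                # Different length: every new character covers the whole match.
                merged = (self.spans[m.start()][0], self.spans[m.end() - 1][1])
                out_spans.extend([merged] * len(new))
            out_chars.append(new)
            pos = m.end()
        # Copy whatever follows the last match.
        out_chars.append(self.current[pos:])
        out_spans.extend(self.spans[pos:])
        self.current = "".join(out_chars)
        self.spans = out_spans

    def token_spans(self):
        """Return (token, start, end) with offsets into the original text."""
        result = []
        for m in re.finditer(r"\S+", self.current):
            start = self.spans[m.start()][0]
            end = self.spans[m.end() - 1][1]
            result.append((m.group(), start, end))
        return result


tracker = SimpleSpanTracker("Hello,  World!")
tracker.transform(r"[^\w\s]", "")                       # remove punctuation
tracker.transform(r"\S+", lambda m: m.group().lower())  # lowercase
tracker.transform(r"\s+", " ")                          # collapse whitespace
tokens = tracker.token_spans()
print(tracker.current)  # hello world
print(tokens)           # [('hello', 0, 5), ('world', 8, 13)]
```

Slicing the original string with the reported offsets, for example `tracker.original[8:13]`, recovers the unnormalized token `World`. That is the property that lets normalized tokens be aligned back to the source text.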