normalization.text_normalizer

text.normalization.text_normalizer(text)

Default text normalization function.

Applies:

- Unicode normalization (NFKC)
- Lowercasing
- Normalization of whitespace
- Removal of parentheses and special characters
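The four steps listed above can be sketched with the standard library alone. This is an illustrative approximation, not the function's actual implementation: the helper name `normalize_sketch` is hypothetical, "special characters" is assumed to mean anything that is neither a word character nor whitespace, and whitespace is collapsed last so that gaps left by removed characters are cleaned up too.

```python
import re
import unicodedata


def normalize_sketch(text: str) -> str:
    """Hypothetical sketch of the listed normalization steps."""
    # Unicode normalization (NFKC): folds compatibility forms such as
    # full-width letters and ligatures into their canonical equivalents.
    text = unicodedata.normalize("NFKC", text)
    # Lowercasing.
    text = text.lower()
    # Assumption: parentheses and "special characters" are everything that is
    # neither a word character nor whitespace; each is replaced by a space.
    text = re.sub(r"[^\w\s]", " ", text)
    # Whitespace normalization: collapse runs of whitespace, trim the ends.
    text = re.sub(r"\s+", " ", text).strip()
    return text


print(normalize_sketch("Ｈｅｌｌｏ  (World)!"))
```

The real function returns tokens and span mappings rather than a single string, so treat this only as a picture of what each step does to the text.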

Parameters

| Name | Type | Description | Default |
|------|------|-------------|----------|
| text | str  | Input text. | required |

Returns

| Name | Type         | Description                                      |
|------|--------------|--------------------------------------------------|
|      | list of str  | List of normalized tokens.                       |
|      | list of dict | Mapping between tokens and original text spans.  |
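The exact layout of the returned mapping is not specified here. As a rough illustration of what a token list paired with original-text spans might look like, the sketch below tokenizes on word characters and records character offsets into the input. The helper name `tokens_with_spans` and the dict keys `token`, `start`, and `end` are assumptions made for this example, not the documented return format, and edge cases (for instance, characters that NFKC expands into punctuation) are not handled.

```python
import re
import unicodedata


def tokens_with_spans(text: str):
    """Hypothetical sketch: normalized tokens plus their spans in `text`."""
    tokens = []
    spans = []
    # Match runs of word characters in the ORIGINAL text so that the offsets
    # refer to the input as given, before any normalization.
    for match in re.finditer(r"\w+", text):
        token = unicodedata.normalize("NFKC", match.group()).lower()
        tokens.append(token)
        # Assumed keys; the real function may use a different structure.
        spans.append({"token": token, "start": match.start(), "end": match.end()})
    return tokens, spans


source = "Hello (World)"
tokens, spans = tokens_with_spans(source)
for span in spans:
    # Slicing the original text with the span recovers the unnormalized form.
    print(span["token"], "<-", source[span["start"]:span["end"]])
```

Keeping offsets relative to the original string is what lets a caller highlight or extract the source text behind each normalized token.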