Tokenizer 1.26.0
New features
- Add `lang` tokenization option to apply language-specific case mappings
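
To illustrate why case mappings can depend on the language, here is a minimal standalone sketch. It does not use the Tokenizer API: `lowercase` and its `lang` parameter are hypothetical names chosen for this example, and only the Turkish/Azerbaijani dotted and dotless "i" rule is handled, by hand.

```python
# Hypothetical helper, NOT part of the Tokenizer API. It only illustrates
# the kind of language-specific case mapping that a "lang" option targets.
def lowercase(text: str, lang: str = "") -> str:
    if lang in ("tr", "az"):
        # Turkish and Azerbaijani distinguish dotted and dotless "i":
        #   U+0130 (capital I with dot above) lowercases to "i"
        #   "I" (capital I without dot) lowercases to U+0131 (dotless i)
        text = text.replace("\u0130", "i").replace("I", "\u0131")
    # Remaining characters use the default, language-independent mapping.
    return text.lower()


print(lowercase("ISPARTA"))        # default mapping
print(lowercase("ISPARTA", "tr"))  # Turkish mapping
```

With the default mapping the first letter becomes a dotted "i", while the Turkish mapping yields the dotless "ı". A real implementation would typically delegate to a library with locale-aware case mappings rather than hard-code rules like this.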
Fixes and improvements
- Use ICU to convert strings to Unicode values instead of a custom implementation