Releases · bitextor/monotextor

Added

Apply Monofixer to document titles.
Detect sensitive data in paragraphs.
Compressed preverticals support.
New paragraph id format (prevertical2text).
Remove tabs, line feeds, and carriage returns that generate additional lines or fields when normalization is disabled (Monofixer).
Detect Serbo-Croatian script (FastSpell).
Automatic installation of Hunspell dictionaries (FastSpell).

Fixed

Python 3.10 compatibility.
Check that Monocleaner model exists.
Snakemake re-running every rule even when no files have changed.
Fix encoding errors in sentence splitting that caused unexpected offsets in document metadata.
Fix warning format when a paragraph id exceeds the total number of paragraphs.
Monotextor imports in bitextor_split.
Correct names in stat files.