Monotextor v1.1: Title in the Sky
Released 31 May, 14:32.
Apply Monofixer to document titles.
Detect sensitive data in paragraphs.
Support for compressed preverticals.
New paragraph id format (prevertical2text).
Remove tabs, newlines and carriage returns that create extra lines or fields when normalization is disabled (Monofixer).
Detect Serbo-Croatian script (FastSpell).
Automatic installation of Hunspell dictionaries (FastSpell).
Python 3.10 compatibility.
Check that Monocleaner model exists.
Fix Snakemake re-running every rule even when no files have changed.
Fix encoding errors in sentence splitting that caused unexpected offsets in document metadata.
Fix the warning format when a paragraph id exceeds the total number of paragraphs.
Fix Monotextor imports in bitextor_split.
Correct names in stat files.
Group Serbo-Croatian under hbs.
Better langid coverage for Icelandic (FastSpell).
Filter sentences by Monocleaner score and language id.
Remove hardcoded Monocleaner threshold.
Use pigz in rules that are parallelized.
Update installation instructions.
Update Snakemake.
Update lxml.
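For readers wondering why stray tabs and line breaks matter when normalization is disabled: in a tab-separated output, a tab inside a text field adds a column and a newline adds a row. The following is a minimal sketch of that kind of cleanup, not Monofixer's actual code. The helper names `sanitize_field` and `to_tsv_row` are hypothetical, and replacing each run of control whitespace with a single space (rather than deleting it) is an assumption made here so that adjacent words are not glued together.

```python
import re

# Any run of tabs, newlines or carriage returns inside a single field.
_CONTROL_WS = re.compile(r"[\t\r\n]+")


def sanitize_field(text: str) -> str:
    """Hypothetical helper: collapse control whitespace into one space so
    the field cannot create extra TSV columns or lines."""
    return _CONTROL_WS.sub(" ", text)


def to_tsv_row(fields: list[str]) -> str:
    """Hypothetical helper: join sanitized fields into exactly one TSV line."""
    return "\t".join(sanitize_field(f) for f in fields)


if __name__ == "__main__":
    # Without sanitizing, this row would span two lines and three columns.
    print(to_tsv_row(["Title\twith tab", "Body\r\nwith break"]))
```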
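The sentence-filtering and threshold entries can be illustrated together: a sentence is kept only if its identified language matches the expected one and its quality score reaches a threshold that the caller supplies instead of one hardcoded in the pipeline. This sketch is an illustration under stated assumptions, not Monotextor's implementation. The `Sentence` record and `filter_sentences` function are hypothetical, and the assumptions that scores lie in [0, 1] and that a score equal to the threshold passes are this example's own.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class Sentence:
    text: str
    lang: str     # language code predicted by a language identifier
    score: float  # quality score, assumed here to lie in [0, 1]


def filter_sentences(
    sentences: Iterable[Sentence], expected_lang: str, threshold: float
) -> Iterator[Sentence]:
    """Hypothetical filter: keep sentences in the expected language whose
    score is at least the caller-supplied threshold."""
    for s in sentences:
        if s.lang == expected_lang and s.score >= threshold:
            yield s


if __name__ == "__main__":
    sents = [
        Sentence("Hola", "es", 0.9),
        Sentence("Hello", "en", 0.95),
        Sentence("Hola mundo", "es", 0.3),
    ]
    # The threshold is a parameter, so each run can choose its own cut-off.
    for s in filter_sentences(sents, "es", 0.5):
        print(s.text)
```

Passing the threshold as an argument, instead of fixing it inside the filter, is what allows different corpora to use different cut-offs without code changes.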