Releases · LanguageMachines/ticcltools

01 Mar 10:46

kosloot

v0.10

be0b3b3

v0.10 Latest


[Ko van der Sloot]

LDcalc:
- No longer filter out n-grams with common parts; that filter was too aggressive
- Removed some more commented-out old code
chainclean: added a --caseless option (default: true)
Removed the Roaring versions of the code, which had lacked maintenance for years
Internally shifting towards UnicodeString in general
A lot of C++ cleanup, with some refactoring and splitting up of long blobs of code


14 Sep 12:02

proycon

v0.9

f7d44e1

v0.9

Ko van der Sloot:

LDcalc: removed code to filter out ngrams with common parts (experimental)

Maarten van Gompel:

Added Dockerfile: containerization support
Changed repository status to unsupported!


15 Dec 14:44

kosloot

v0.8

82c8b29

v0.8

using more recent functions from ticcutils
use more code from ticcl_common
attempt to solve #42
some small code refactoring


15 Sep 11:11

proycon

v0.7.1

9b25e32

v0.7.1

[Ko van der Sloot]

changed ICU requirement to at least 5.6
some refactoring
started implementing a solution for #42
added error message when the index file is empty.


15 Apr 13:02

kosloot

v0.7

04b1148

v0.7

[Martin Reynaert]

updated man pages
updated README.md

[Ko van der Sloot]
Numerous bug fixes and additions. Added a shared library (.so) for common functions.

The bitType is now uint64_t (the widest fixed-width unsigned integer type), which
required some code adaptations, since values < 0 are no longer possible.
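
The kind of adaptation an unsigned type forces can be sketched as follows. This is only an illustration: the `bitType` alias here mirrors the name in the note above, while `abs_diff` and `sum_below` are hypothetical helpers, not functions from ticcltools.

```cpp
#include <cstdint>

// Hypothetical alias mirroring the release note; the real definition
// lives in the ticcltools sources and may be declared differently.
using bitType = std::uint64_t;

// Absolute difference without forming a negative intermediate.
// With an unsigned type, `a - b` wraps around to a huge value when b > a,
// so the operands are ordered before subtracting.
bitType abs_diff(bitType a, bitType b) {
  return a > b ? a - b : b - a;
}

// Sum of all values below n, counting downwards.
// The familiar `for (i = n - 1; i >= 0; --i)` never terminates for an
// unsigned i, because `i >= 0` is always true; test before decrementing.
bitType sum_below(bitType n) {
  bitType total = 0;
  for (bitType i = n; i-- > 0;) {
    total += i;
  }
  return total;
}
```

Code that previously used a negative value as an "invalid" marker would likewise need a different sentinel, since no such value exists in uint64_t.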

TICCL-unk:
- some changes in UNK detection
- added a --hemp option
- create a .fore.clean file when a background corpus is merged in
TICCL-stats:
- added a -n option to use a newline as delimiter
TICCL-indexer(NT):
- better and faster implementation
- added --confstats option
TICCL-LDcalc:
- added a --follow option for debugging purposes
- fix for #30
- added --low and --high parameters
TICCL-rank:
- added a --follow option for debugging purposes
- added --subtractartifrqfeature1 and --subtractartifrqfeature2 options
- replaced pairs_combined ranking by median ranking
- added an n-gram filter
TICCL-chain:
- added --nounk option
- fix for #38
- fix for #37
- use the alphabet file too with --alph
TICCL-chainclean: new module to clean chain ranked files
TICCL-anahash:
- accept lexicons without frequencies too (i.e. simple word lists)
- added a -o option


05 Jun 10:49

kosloot

v0.6

1d5c4a4

v0.6

Intermediate release, with a lot of new code to handle n-grams.
A lot of refactoring was also done, for clearer and more maintainable code.
This is still work in progress.

TICCL-unk:
- more extensive acronym detection
- fixed artifreq problems in 'clean' punctuated words
- added filters for 'unwanted' characters
- added a ligature filter to convert evil ligatures
- normalize all hyphens to a 'normal' one (-)
- use a better definition of punctuation (unicode character class is not
  good enough to decide)
TICCL-lexstat:
- the 'separator' symbol should get freq=0, so it isn't counted
- the clip value is added to the output filename
TICCL-indexer:
- indexer and indexerNT now produce the same output, using different
  strategies when a --foci file is used.
TICCL-LDcalc: major overhaul for n-grams
- added an n-gram points column to the output (so NOT backward compatible!)
- produce a '.short' list for short word corrections
- produce a '.ambi' file with a list of n-grams related to short words
- prune a lot of n-grams from the output
TICCL-rank:
- output is sorted now
- honor the n-gram points from the new LDcalc (so NOT backward compatible!)
TICCL-chain: new module to chain ranked files
TICCL-lexclean:
- added a -x option for 'inverse' alphabet
TICCL-anahash:
- added a --list option to produce a list of words and anagram values
added metadata file: codemeta.json
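
The hyphen normalization listed under TICCL-unk above can be pictured with a minimal sketch. It assumes the text is already decoded to UTF-32 and uses an illustrative set of hyphen-like code points; these notes do not say which characters TICCL-unk actually maps, and `is_hyphen_like` and `normalize_hyphens` are hypothetical names, not the ticcltools API.

```cpp
#include <string>

// Illustrative selection of hyphen-like code points (an assumption;
// the set used by TICCL-unk is not given in these release notes).
bool is_hyphen_like(char32_t c) {
  switch (c) {
    case 0x2010:  // HYPHEN
    case 0x2011:  // NON-BREAKING HYPHEN
    case 0x2012:  // FIGURE DASH
    case 0x2013:  // EN DASH
    case 0x2212:  // MINUS SIGN
      return true;
    default:
      return false;
  }
}

// Replace every hyphen-like character by the plain ASCII hyphen-minus.
std::u32string normalize_hyphens(const std::u32string& in) {
  std::u32string out = in;
  for (char32_t& c : out) {
    if (is_hyphen_like(c)) {
      c = U'-';
    }
  }
  return out;
}
```

Mapping all variants to one character before counting means that forms differing only in the kind of hyphen are treated as the same word.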


19 Feb 14:55

kosloot

v0.5

63c3121

v0.5 Pre-release


updated configuration, also for Mac OSX
use more ticcutils functionality: diacriticsfilter
added a TICCL-mergelex program
fixed: the OMP_THREAD_LIMIT environment variable was sometimes ignored
TICCL-unk:
- fixed a problem in artifreq handling
- changed acronym detection (work in progress)
- added -o option
TICCL-lexstat:
- added TTR output
- added -o option
TICCL-indexer:
- now also handles a --foci file, with some speed-up
- added a -t option
TICCL-LDcalc:
- be less picky about a few wrong lines in the data
added some tests
when libroaring is installed, roaring versions of some modules are built (experimental)
updated man pages
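
Assuming the TTR output added to TICCL-lexstat refers to the type-token ratio, the measure itself can be sketched as below. The function name and the pre-tokenized input are assumptions for illustration; how TICCL-lexstat tokenizes and reports the value is not described in these notes.

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Type-token ratio: the number of distinct word forms (types) divided by
// the total number of words (tokens). Returns 0.0 for empty input to
// avoid dividing by zero.
double type_token_ratio(const std::vector<std::string>& tokens) {
  if (tokens.empty()) {
    return 0.0;
  }
  std::unordered_set<std::string> types(tokens.begin(), tokens.end());
  return static_cast<double>(types.size()) /
         static_cast<double>(tokens.size());
}
```

For the tokens `a b a c` there are three types among four tokens, so the ratio is 0.75.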


04 Apr 10:38

kosloot

v0.4

2cf7774

v0.4 Pre-release


first official release.
- added functions to test on Word2Vec datafiles
- refactoring and modernizing stuff all around

