
daac-tools

Pinned

  1. daachorse Public

    🐎 A fast implementation of the Aho-Corasick algorithm using the compact double-array data structure in Rust.

    Rust, 202 stars, 14 forks

  2. vaporetto Public

    🛥 Vaporetto: Very accelerated pointwise prediction based tokenizer

    Rust, 230 stars, 9 forks

  3. crawdad Public

    🦞 Rust library of natural language dictionaries using character-wise double-array tries.

    Rust, 28 stars, 2 forks

  4. vibrato Public

    🎤 vibrato: Viterbi-based accelerated tokenizer

    Rust, 336 stars, 14 forks

  5. rucrf Public

    Conditional Random Fields implemented in pure Rust

    Rust, 8 stars, 2 forks

  6. trie-match Public

    Fast match expression optimized for string comparison

    Rust, 38 stars

Repositories

Showing 10 of 13 repositories
  • vaporetto Public

    🛥 Vaporetto: Very accelerated pointwise prediction based tokenizer

    Rust, 230 stars, Apache-2.0, 9 forks, 1 open issue, 1 open pull request, updated Nov 10, 2024
  • daachorse Public

    🐎 A fast implementation of the Aho-Corasick algorithm using the compact double-array data structure in Rust.

    Rust, 202 stars, Apache-2.0, 14 forks, 1 open issue, 2 open pull requests, updated Oct 12, 2024
  • vibrato Public

    🎤 vibrato: Viterbi-based accelerated tokenizer

    Rust, 336 stars, Apache-2.0, 14 forks, 6 open issues, 0 open pull requests, updated Sep 23, 2024
  • python-vaporetto Public

    🛥 Vaporetto is a fast and lightweight pointwise prediction based tokenizer. This is a Python wrapper for Vaporetto.

    Rust, 20 stars, Apache-2.0, 1 fork, 0 open issues, 0 open pull requests, updated Sep 4, 2024
  • python-vibrato Public

    Viterbi-based accelerated tokenizer (Python wrapper)

    Rust, 40 stars, Apache-2.0, 1 fork, 0 open issues, 0 open pull requests, updated Sep 4, 2024
  • trie-match Public

    Fast match expression optimized for string comparison

    Rust, 38 stars, Apache-2.0, 0 forks, 0 open issues, 0 open pull requests, updated Jan 29, 2024
  • vaporetto-models Public

    Tokenization models and training scripts for Vaporetto fast tokenizer

    Rust, 1 star, Apache-2.0, 0 forks, 0 open issues, 0 open pull requests, updated May 30, 2023
  • crawdad Public

    🦞 Rust library of natural language dictionaries using character-wise double-array tries.

    Rust, 28 stars, Apache-2.0, 2 forks, 0 open issues, 0 open pull requests, updated Feb 20, 2023
  • include-bytes-zstd Public

    Includes a file with zstd compression in Rust

    Rust, 12 stars, Apache-2.0, 0 forks, 0 open issues, 0 open pull requests, updated Feb 17, 2023
  • guidelines Public

    Guidelines for daac-tools community

    0 stars, 0 forks, 0 open issues, 0 open pull requests, updated Feb 16, 2023
