Request for tokenizer #4

ruoyxue · 2024-12-21T15:36:04Z

Thanks for the perfect work!
I am working on a new English lip-reading dataset. Could you please publish the tokenizer model and token list used in this work?

ahaliassos · 2024-12-21T15:46:09Z

Hi, you can have a look at https://github.com/ahaliassos/usr/blob/main/utils/labels/unigram1000_units.txt for the vocabulary and at https://github.com/ahaliassos/usr/blob/main/utils/utils.py#L6 for how the final token list is obtained (i.e., with the blank and eos tokens).

Hope this helps!
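For readers who want to rebuild the token list locally, here is a minimal sketch of the step described above. It is not the repository's code. It assumes each line of `unigram1000_units.txt` starts with the token, optionally followed by whitespace and an index, and that the blank token goes first and the eos token last. The names `build_token_list`, `load_token_list`, `<blank>`, and `<eos>` are illustrative assumptions, so please check `utils/utils.py` for the actual ordering and symbols.

```python
from pathlib import Path


def build_token_list(unit_lines, blank="<blank>", eos="<eos>"):
    """Build a token list from vocabulary lines (hypothetical helper).

    Assumption: the first whitespace-separated field of each line is the token;
    anything after it (e.g. an index) is ignored. Empty lines are skipped.
    """
    tokens = []
    for line in unit_lines:
        parts = line.split()
        if parts:
            tokens.append(parts[0])
    # Assumed ordering: blank at index 0, eos at the last index.
    return [blank] + tokens + [eos]


def load_token_list(path):
    """Read a units file from disk and build the token list from it."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return build_token_list(lines)


# Tiny made-up vocabulary, for illustration only (not the real file contents).
example_lines = ["<unk> 1", "▁the 2", "s 3"]
token_list = build_token_list(example_lines)
token_to_id = {token: index for index, token in enumerate(token_list)}
```

With the real file downloaded, `load_token_list("unigram1000_units.txt")` would give the full list under the same assumptions.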
