
Exception upon attempting to load a Tokenizer from file #566

Closed
joepalermo opened this issue Dec 16, 2020 · 31 comments

@joepalermo

Hi, I'm attempting to simply serialize and then deserialize a trained tokenizer. When I run the following code:

from tokenizers import Tokenizer
from tokenizers.models import BPE
from tokenizers.trainers import BpeTrainer

# Train a small BPE tokenizer, save it, then try to reload it
tokenizer = Tokenizer(BPE())
trainer = BpeTrainer(vocab_size=280)
tokenizer.train(trainer, ["preprocessing/corpus/corpus.txt"])
save_to_filepath = 'preprocessing/tokenizer.json'
tokenizer.save(save_to_filepath)
tokenizer = Tokenizer.from_file(save_to_filepath)

I get the following traceback:

Traceback (most recent call last):
...
    tokenizer = Tokenizer.from_file(save_to_filepath)
Exception: data did not match any variant of untagged enum ModelWrapper at line 1 column 5408
@n1t0
Member

n1t0 commented Jan 6, 2021

Hi @joepalermo, would you mind sharing the resulting tokenizer.json file? It would be very helpful for us to debug this.

@joepalermo
Author

joepalermo commented Jan 19, 2021

@n1t0 Thanks for your help.

GitHub isn't letting me attach a .json file to a comment, so I'll just paste the contents of it here:

{"version":"1.0","truncation":null,"padding":null,"added_tokens":[],"normalizer":null,"pre_tokenizer":null,"post_processor":null,"decoder":null,"model":{"dropout":null,"unk_token":null,"continuing_subword_prefix":null,"end_of_word_suffix":null,"fuse_unk":false,"vocab":{"\n":0," ":1,"(":2,")":3,"":4,"+":5,",":6,"-":7,".":8,"/":9,"0":10,"1":11,"2":12,"3":13,"4":14,"5":15,"6":16,"7":17,"8":18,"9":19,";":20,"=":21,"?":22,"C":23,"D":24,"F":25,"G":26,"I":27,"L":28,"S":29,"W":30,"a":31,"b":32,"c":33,"d":34,"e":35,"f":36,"g":37,"h":38,"i":39,"j":40,"k":41,"l":42,"m":43,"n":44,"o":45,"p":46,"q":47,"r":48,"s":49,"t":50,"u":51,"v":52,"w":53,"x":54,"y":55,"z":56," -":57,"e ":58,"t ":59," +":60," =":61," + ":62," - ":63,". ":64,";\n":65,"**":66,"Le":67,"Let ":68," = ":69,".;\n":70,"s ":71,"th":72," = -":73,"iv":74,"the ":75,"2":76,"r ":77,"of":78,". Let ":79,"d ":80,"?;\n":81,"at":82,"2":83,"of ":84,"3":85,"de":86,"or ":87,"4":88,"os":89,"pos":90,"(-":91,"5*":92,"Su":93,"ppos":94,"Suppos":95,"is ":96,"n ":97,"be ":98,"nd ":99,"co":100," a":101,"at ":102,"Wh":103,"What ":104,"ul":105," be ":106," - 1":107," + 1":108,"e -":109,"com":110,"3":111,"st ":112,") = ":113,"What is ":114,"ac":115,"act":116," f":117,"So":118,"lv":119,"Solv":120,"al":121,"ive ":122,") = -":123,"ate ":124,"mo":125,"commo":126,"common ":127,"in":128,"0":129,"Suppose ":130,"Cal":131,"cul":132,"Calcul":133,"Calculate ":134,"div":135,"divi":136," for ":137,"What is the ":138,"riv":139,"ative ":140,"deriv":141,"derivative ":142," and ":143,")/":144,"re":145,"or of ":146,"Is ":147,"). ":148,", ":149,"he":150,"im":151,"pr":152,"prim":153,"2 + ":154,"st common ":155,"fact":156,").;\n":157,"Suppose -":158,"Calculate the ":159," - 2":160,"6":161,"prime ":162," = 0":163," + 2":164,"Solve ":165,"2 - ":166,"or":167,", -":168,"derivative of ":169,"4":170,"10":171,"7":172,"ir":173,"y ":174,"r w":175,"d b":176,"ain":177,"main":178,"the prime ":179,"der w":180,"ded b":181,"is divi":182,"remain":183,"factor":184,"the prime factor":185,"der whe":186,"is divided b":187,"remainder whe":188,"the prime factors ":189,"12":190,"remainder when ":191,"the prime factors of ":192,"is divided by ":193,"min":194,"ti":195,"er":196," is divided by ":197,"Solve -":198,") be ":199,") be the ":200," w":201,"). Let ":202,"le ":203,"mul":204,"ple ":205," - 3":206,"tiple ":207,"multiple ":208,"rt ":209,"multiple of ":210,"8":211," + 3":212,"of -":213,"est common ":214,"11":215," a ":216," wrt ":217," - 2":218,"/2":219,". Suppose ":220," + 2":221,"(-2":222,". Is ":223,"9":224,". What is the ":225,"Fi":226,"Find ":227,"(-1":228,")?;\n":229," - 4":230,"/3":231,"derivative of -":232," + 4":233," - 3":234,"5":235,"eco":236,"seco":237,"second ":238," + 3":239,"0 = ":240,"0 = -":241,"Find the ":242," - -":243,"thir":244,"third ":245,"15":246,". Calculate the ":247,"13":248," + 4":249,"sor of ":250,"divisor of ":251," + -":252,"14":253," - 4*":254,"ghe":255,"hi":256,"ghest common ":257,"highest common ":258,". D":259,"no":260,"deno":261,"common deno":262,"minat":263,"common denominat":264,". Suppose -":265,"1*":266,"ar":267,"What ar":268,"What are ":269,"e?;\n":270,"16":271,"ber":272,"mber":273,"nu":274,"What are the prime factors of ":275,"mber?;\n":276,"number?;\n":277,"Li":278,"List ":279},"merges":[" -","e ","t "," +"," ="," + "," - ",". ","; \n","* ","L e","Le t "," = ",". ;\n","s ","t h"," = -","i v","th e ","2 ","r ","o f",". Let ","d ","? 
;\n","a t"," 2","of ","3 ","d e","o r ","4 ","o s","p os","( -","5 ","S u","p pos","Su ppos","i s ","n ","b e ","n d ","c o"," a","a t ","W h","Wh at ","u l"," be "," - 1"," + 1","e -","co m"," 3","s t ",") = ","What is ","a c","ac t"," f","S o","l v","So lv","a l","iv e ",") = -","at e ","m o","com mo","commo n ","i n","0 ","Suppos e ","C al","c ul","Cal cul","Calcul ate ","d iv","div i"," f or ","What is the ","r iv","at ive ","de riv","deriv ative "," a nd ",") /","r e","or of ","I s ",") . ",", ","h e","i m","p r","pr im","2 + ","st common ","f act",") .;\n","Suppos e -","Calculate the "," - 2","6 ","prim e "," = 0"," + 2","Solv e ","2 - ","o r",", -","derivative of "," 4","1 0","7 ","i r","y ","r w","d b","a in","m ain","the prime ","de r w","de d b","is divi","re main","fact or","the prime factor","der w he","is divi ded b","remain der whe","the prime factor s ","1 2","remainder whe n ","the prime factors of ","is divided b y ","m in","t i","e r"," is divided by ","Solv e -",") be ",") be the "," w",") . Let ","l e ","m ul","p le "," - 3","ti ple ","mul tiple ","r t ","multiple of ","8 "," + 3","of -","e st common ","1 1"," a "," w rt "," - 2","/ 2",". Suppose "," + 2","(- 2",". Is ","9 ",". What is the ","F i","Fi nd ","(- 1",") ?;\n"," - 4","/ 3","derivative of -"," + 4"," - 3"," 5","e co","s eco","seco nd "," + 3","0 = ","0 = -","Find the "," - -","th ir","thir d ","1 5",". Calculate the ","1 3"," + 4","s or of ","divi sor of "," + -","1 4"," - 4","g he","h i","ghe st common ","hi ghest common ",". D","n o","de no","common deno","min at","common deno minat",". Suppose -","1 *","a r","What ar","What ar e ","e ?;\n","1 6","b er","m ber","n u","What are the prime factors of ","mber ?;\n","nu mber?;\n","L i","Li st "]}}

@joepalermo
Author

joepalermo commented Jan 19, 2021

This is really confusing because I don't think I'm doing anything unusual.

Also note that I tried unpickling the tokenizer object, and it gives a similar error: Exception: Error while attempting to unpickle Tokenizer: data did not match any variant of untagged enum ModelWrapper at line 1 column 5304

@lukas-blecher

I've had the same issue. Try adding a pre_tokenizer:

from tokenizers import Tokenizer
from tokenizers.models import BPE
from tokenizers.trainers import BpeTrainer
from tokenizers.pre_tokenizers import Whitespace

tokenizer = Tokenizer(BPE())
tokenizer.pre_tokenizer = Whitespace()
trainer = BpeTrainer(vocab_size=280)
tokenizer.train(trainer, ["preprocessing/corpus/corpus.txt"])
save_to_filepath = 'preprocessing/tokenizer.json'
tokenizer.save(save_to_filepath)
tokenizer = Tokenizer.from_file(save_to_filepath)

@Hustcw

Hustcw commented Apr 15, 2021

Any update on this problem? I've had the same issue.

@n1t0
Member

n1t0 commented Apr 15, 2021

Have you tried the solution proposed by @lukas-blecher to use a pre-tokenizer?

I believe this issue is related to this one: #645

@Hustcw

Hustcw commented Apr 17, 2021

Have you tried the solution proposed by @lukas-blecher to use a pre-tokenizer?

I believe this issue is related to this one: #645

Yes, I've used a pre-tokenizer. I found that this problem is caused by merges containing more than one space, as mentioned in #645.
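
For reference, a quick way to scan a saved file for such merges (a minimal sketch assuming the 0.10-era format, where each merge is stored as a single "left right" string; the path is taken from the original report):

import json

with open("preprocessing/tokenizer.json", encoding="utf-8") as f:
    model = json.load(f)["model"]

# Merges should be "left right" with exactly one separating space; anything
# else is the kind of entry that breaks deserialization.
bad_merges = [m for m in model["merges"] if m.count(" ") != 1]
print(len(bad_merges), "suspicious merges, e.g.", bad_merges[:5])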

@ejohb

ejohb commented Nov 30, 2021

Having the same problem. I already have a pre-tokenizer added.

@ejohb

ejohb commented Dec 1, 2021

Having the same problem. I already have a pre-tokenizer added.

After some fiddling, the problem occurs only when I remove pre_tokenizers.Whitespace() and add pre_tokenizers.Split(pattern='\w+|[^\w\s]+', behavior='isolated') in its place.

@ruitedk6

In case this might be of help to others:
I was getting this error when using the SentenceTransformers library, and in my case upgrading tokenizers to version 0.10.3 fixed the issue:

pip install tokenizers==0.10.3

If anyone is getting this error, I recommend also taking a look at the dependency requirements (e.g., which version of the tokenizers library is required).

@duskybomb

Yes, @ejohb is right. The problem occurs when using pre_tokenizers.Split() :/

@Narsil
Collaborator

Narsil commented May 2, 2022

@duskybomb Does the problem still exist on the latest 0.12.1? I can't seem to reproduce.

@duskybomb

@Narsil yes, it is still there in 0.12.1. The error when I was trying to load: Exception: data did not match any variant of untagged enum ModelWrapper at line 59999 column 3.
This is the pre-tokenizer I was using: tokenizer.pre_tokenizer = Split(pattern="<BREAK>", behavior="removed")

Also, I am not sure if this is desired or not, but the vocab had <BREAK> merged with tokens despite using the removed behavior,
e.g. <BREAK>small<BREAK>, with small being the actual token.

@Narsil
Collaborator

Narsil commented May 2, 2022

Do you have a simple reproducible script?
Here is the script I tried to use to reproduce, but it seems to be working properly:

from tokenizers import trainers, models, Tokenizer, pre_tokenizers

tokenizer = Tokenizer(models.BPE())
trainer = trainers.BpeTrainer(
    special_tokens=["<unk>", "<pad>", "<sep>"],
    vocab_size=8000,
)
tokenizer.pre_tokenizer = pre_tokenizers.Split(pattern="\w+|[^\w\s]+", behavior="isolated")
tokenizer.add_special_tokens(["<sep>"])
tokenizer.add_tokens(["<sep>"])


def iterator_over_seqs():
    with open("data/big.txt", "r") as f:
        for line in f:
            yield "ABCEFGH"


tokenizer.train_from_iterator(iterator=iterator_over_seqs(), trainer=trainer)
tokenizer.save("tok.json", pretty=True)
encoded = tokenizer.encode("ABCD<sep>EFGH")
tok = Tokenizer.from_file("tok.json")  # This is what is supposed to fail no ? It doesn't here.
print(encoded.ids)

@yechong316

yechong316 commented Jul 29, 2022

I also encountered the same problem. The JSON file is attached below as a .txt (please rename it back to .json):
tokenizer-wiki.txt

@Narsil
Collaborator

Narsil commented Aug 1, 2022

Hi @yechong316,

It seems your file contains merges which are not acceptable in the current deployed version of tokenizers.

Those merges contain multiple spaces: "e s " for instance (line 9499).
Producing such merges should not be possible from within the library, hence the limitation, so this is expected if you created the merges yourself in some manner (see the sketch after the list below).

  • If it was done within the library, a reproducible script would be super helpful to reproduce and fix.
  • In general this is not a limitation of the underlying BPE model but really a self-imposed limitation within the library. We can definitely lift this limitation (see "Merges cannot handle tokens containing spaces" #909 if you want to try it out, though it will need the merges to be rewritten in a different way). It's not currently merged, as changing anything regarding serialization requires a great deal of care to make sure we're not breaking anything in a backward-incompatible way. But if there's enough attention for this feature, it can definitely be added!
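
To illustrate the limitation (my rough reading of the format, not the library's exact code): each merge is serialized as a single "left right" string and split on a space when the file is read back, so a token that itself contains a space can no longer be recovered as exactly two parts.

# A normal merge splits cleanly into its two sides...
print("Le t".split(" "))   # ['Le', 't']

# ...but the merge "e s " mentioned above (line 9499) does not: there is no
# unambiguous way to tell where the left token ends and the right one begins.
print("e s ".split(" "))   # ['e', 's', ''] -- not two parts, so loading fails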

@bashFish

bashFish commented Nov 9, 2022

Just to complement what @Narsil said:
there are several "whitespace characters" usable in the tokenizer file, e.g. "Ġ" (Unicode: ord("Ġ") = 288), which in turn can be used in the merges.

Also, in case you removed some of your vocab entries, be sure all merges are still possible; if some can't be resolved after altering the file, it will throw the same error.
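
A sketch of that second check (assuming the older format where each merge is a single "left right" string; the file name is hypothetical): verify that both sides of every merge, and their concatenation, are still present in the vocab.

import json

with open("tokenizer.json", encoding="utf-8") as f:
    model = json.load(f)["model"]

vocab = model["vocab"]
for merge in model["merges"]:
    left, sep, right = merge.partition(" ")
    # A merge is only resolvable if both parts and the merged token still exist.
    if not sep or left not in vocab or right not in vocab or (left + right) not in vocab:
        print("unresolvable merge:", repr(merge))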

@nihirv

nihirv commented Nov 22, 2022

Hi, I'm running into the same issue. However, I explicitly want to have multiple whitespaces in my merges. Could someone point me in the right direction on how I could do this?

@davidgilbertson

This is still an issue in 0.13.2.

To reproduce:

from tokenizers import Tokenizer, models, trainers

bpe_model = models.BPE(unk_token="[UNK]")
tokenizer = Tokenizer(model=bpe_model)
tokenizer.train_from_iterator(
    iterator=["test~ing lick~ing kick~ing"],
    trainer=trainers.BpeTrainer(),
)

path = "my_tokenizer.json"
tokenizer.save(path)

tok_loaded = Tokenizer.from_file(path)

In this particular case, tokenizer.pre_tokenizer = Whitespace() is a workaround.

@Narsil
Collaborator

Narsil commented Mar 7, 2023

Have you checked out the PR that fixes it?
#909

It's not going to be merged anytime soon, since it changes the on-disk format of the tokenizer, so we need a compelling reason to go through the pain of making this change.

If any model that requires it gets merged into transformers, for instance, that would be a very valid reason!

In the meantime, the PR should work.

@ashutoshsaboo

ashutoshsaboo commented Mar 7, 2023

Hi @Narsil: I think I have a very weird issue, which seems similar to the error stack trace above in this issue. Here are the steps:

  1. So I trained an instance of a custom XLMRobertaTokenizerFast tokenizer from scratch on my multilingual corpus. Note that I trained it with transformers-4.26.0 in a Python 3.7 conda environment on a different EC2 instance. After I had trained this tokenizer, I loaded it in a separate script using XLMRobertaTokenizerFast.from_pretrained() and it worked fine without any errors.
  2. Now, a few days later, I had to change my instance for certain reasons. I'm on a different instance that doesn't have Python 3.7, only Python 3.6, and the latest version supported for Python 3.6 is transformers-4.18.0, which is installed on this instance. The same saved tokenizer, which loaded perfectly with the 4.26.0 version as mentioned above, now fails when loaded with the same function, XLMRobertaTokenizerFast.from_pretrained(). I also tried it on transformers==4.2.1 just to double-check whether it was a bug in the 4.26.0 version or not. The error stack trace on both of the transformers versions tried on Python 3.6 is as below:
Exception: data did not match any variant of untagged enum PyPreTokenizerTypeWrapper at line 59 column 3

Is this expected? Are tokenizers supposed to be backwards incompatible across different transformers library versions? Installing Python 3.7 from scratch isn't trivial on this instance, so I'd appreciate any help or workaround that can be applied here. While training the tokenizer I didn't do anything extravagant: I initialised a SentencePieceBPETokenizer() and just trained it from scratch by invoking .train() on my corpus.

Strangely, the model trained on the Python 3.7 instance loads perfectly on the Python 3.6 instance, so the issue is only with the tokenizer.

@Narsil, I'd appreciate your help on this. I can't post the tokenizer here due to confidentiality reasons, but if you need any other info from me to help with this, please feel free to ask.

@Narsil
Collaborator

Narsil commented Mar 7, 2023

Can you check your tokenizers versions? I think they are (probably) not the same major version.

tokenizers is designed to be backwards compatible, but you're talking here about forward compatibility (some artefact created with a newer version working on an older version).

I can't tell you exactly what's going on, but the pre_tokenizer in the JSON file cannot be read by your older version. We did change the layout at some point, but again, in a backward compatible fashion (older JSON files are still read, but the newer layout is what gets written to disk).

It's probably not too hard to modify the file from the 3.7 environment to be loadable in your 3.6 environment. Just train a dummy model in the same fashion and look at how it's saved on disk in the old version. Can you do exactly the same thing? I'm not sure; it depends on the options you chose and whether they were only implemented later.
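
For example, one rough way to follow that suggestion is to train a dummy tokenizer with the old version and diff the top-level sections of the two JSON files (file names here are hypothetical):

import json

with open("dummy_old_version.json", encoding="utf-8") as f:
    old = json.load(f)
with open("tokenizer_new_version.json", encoding="utf-8") as f:
    new = json.load(f)

# Compare which fields each version writes for these sections.
for section in ("normalizer", "pre_tokenizer", "post_processor", "decoder"):
    old_keys = set((old.get(section) or {}).keys())
    new_keys = set((new.get(section) or {}).keys())
    if old_keys != new_keys:
        print(section, "| only in old:", old_keys - new_keys, "| only in new:", new_keys - old_keys)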

Have you tried using pyenv? It's usually pretty good at installing different Python versions on most systems (not sure it works in your case).

Does that make sense?

If you happen to modify a JSON file manually, please double-check the output of the tokenizer afterwards; it's easy to introduce subtle bugs without realizing.

@ashutoshsaboo

ashutoshsaboo commented Mar 7, 2023

Woohoo, editing the JSON worked! :D Many thanks! @Narsil, as a suggestion: should these forward-compatibility changes across tokenizers versions be documented more specifically somewhere, so they're easily accessible?

FYI: I just had to add "str_rep": "▁" in the decoder as well as the pre_tokenizer keys of the Python 3.7-trained tokenizer.json to get it to work on the 3.6 version.
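
For reference, a minimal sketch of that kind of edit (assuming the pre_tokenizer and decoder entries are single Metaspace objects rather than Sequences, and a hypothetical file path); as suggested above, verify the tokenizer's output afterwards:

import json

path = "tokenizer.json"
with open(path, encoding="utf-8") as f:
    tok = json.load(f)

# Add back the "str_rep" field that the older tokenizers version expects.
for key in ("pre_tokenizer", "decoder"):
    entry = tok.get(key)
    if isinstance(entry, dict) and "str_rep" not in entry:
        entry["str_rep"] = "▁"

with open(path, "w", encoding="utf-8") as f:
    json.dump(tok, f, ensure_ascii=False)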

@Narsil
Collaborator

Narsil commented Mar 7, 2023

should these forward-compatibility changes across tokenizers versions be documented more specifically somewhere, so they're easily accessible?

There's a changelog + releases: https://github.com/huggingface/tokenizers/releases?page=2. That should be enough (but not necessarily easily discoverable).

Please triple check the output ids before claiming victory :)

@ashutoshsaboo

ashutoshsaboo commented Mar 7, 2023 via email

@Narsil
Collaborator

Narsil commented Mar 8, 2023

I mean that the encodings are exactly the same on a large enough subset of text (tokenizer.encode(mystring)).
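
For example, a rough check along those lines (the file names and the sample corpus are hypothetical):

from tokenizers import Tokenizer

original = Tokenizer.from_file("tokenizer_original.json")
edited = Tokenizer.from_file("tokenizer_edited.json")

# Compare token ids line by line over a reasonably large sample of text.
with open("sample_corpus.txt", encoding="utf-8") as f:
    for lineno, line in enumerate(f, start=1):
        if original.encode(line).ids != edited.encode(line).ids:
            print(f"Mismatch on line {lineno}: {line!r}")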

@delgermurun

delgermurun commented Jul 16, 2023

I am having this problem. Here is the reproducible script:

from tokenizers.trainers import BpeTrainer
from tokenizers import Tokenizer
from tokenizers.models import BPE
from tokenizers.pre_tokenizers import Split

# https://raw.githubusercontent.com/karpathy/char-rnn/master/data/tinyshakespeare/input.txt
t = """First Citizen:
Before we proceed any further, hear me speak.

..."""

tokenizer = Tokenizer(BPE(unk_token="[UNK]"))
trainer = BpeTrainer(special_tokens=["[UNK]"], vocab_size=1000, min_frequency=2)
tokenizer.pre_tokenizer = Split("\w+|[^\w\s]+", behavior="isolated")

tokenizer.train_from_iterator(
    iterator=[t],
    trainer=trainer,
)

tokenizer.save("tokenizer.json")

It works fine if I use the trained tokenizer directly (not loading from the file):

print(tokenizer.encode("""especially       against Caius Marcius?

All:
Against""").tokens)

Output: ['es', 'p', 'ec', 'i', 'all', 'y ', ' ', ' ', ' ', ' ', ' ', ' a', 'gainst ', 'Caius Marc', 'i', 'us', '?\n\nAll:\n', 'A', 'gain', 'st']

But loading the tokenizer from the file fails.

tokenizer = Tokenizer.from_file("tokenizer.json")
---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
Cell In[88], line 1
----> 1 tokenizer = Tokenizer.from_file("tokenizer.json")

Exception: data did not match any variant of untagged enum ModelWrapper at line 382 column 3

Version: tokenizers==0.13.3

@Narsil
Collaborator

Narsil commented Jul 17, 2023

Can you open a new issue please?

It's not really good practice to resurrect old threads, as it pollutes searches with potentially irrelevant content and makes your issue, which is likely a new bug, less discoverable for others. (Of course it's good to search beforehand to prevent duplicates, but when the thread is super old or closed, you can most likely create a new thread and link the old one you found, just in case we want to merge.)

@Narsil
Collaborator

Narsil commented Jul 17, 2023

OK, I looked at this issue (I will copy it into a new issue once there's one).

The error is because of the current tokenizer format, which expects the tokens in the merges part of the file not to contain any spaces.
There's a very old draft PR #909 that I made that can unlock that use case.

This wasn't implemented at the time, because changing the format is a pretty risky change for backward compatibility, and there didn't seem to be any real world use case.

@mpjanus

mpjanus commented Sep 21, 2023

I had the same error when loading Llama 2 models. Upgrading to transformers==4.33.2 and tokenizers==0.13.3 solved it for me.


This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

@github-actions github-actions bot added the Stale label Apr 30, 2024
@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale May 5, 2024