
ValueError: Couldn't instantiate the backend tokenizer from one of: (1) a tokenizers library serialization file, (2) a slow tokenizer instance to convert or (3) an equivalent slow tokenizer class to instantiate and convert. You need to have sentencepiece installed to convert a slow tokenizer to a fast one. #40

Closed
jameeldark2012 opened this issue May 24, 2023 · 2 comments
Labels: bug (Something isn't working)

jameeldark2012 commented May 24, 2023

Hi, I was trying to launch the server ("api_inference_server.py") for the first time and got this error:

Detecting GPU...
Initilizing model....
Loading language model...
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ D:\Python\AIwaifu\./api_inference_server.py:35 in <module>                                       │
│                                                                                                  │
│    32                                                                                            │
│    33 print("Initilizing model....")                                                             │
│    34 print("Loading language model...")                                                         │
│ ❱  35 tokenizer = AutoTokenizer.from_pretrained("PygmalionAI/pygmalion-1.3b", use_fast=True)     │
│    36 config = AutoConfig.from_pretrained("PygmalionAI/pygmalion-1.3b", is_decoder=True)         │
│    37 model = AutoModelForCausalLM.from_pretrained("PygmalionAI/pygmalion-1.3b", config=config   │
│    38                                                                                            │
│                                                                                                  │
│ D:\Python\test\.venv\lib\site-packages\transformers\models\auto\tokenization_auto.py:679 in      │
│ from_pretrained                                                                                  │
│                                                                                                  │
│   676 │   │   │   │   raise ValueError(                                                          │
│   677 │   │   │   │   │   f"Tokenizer class {tokenizer_class_candidate} does not exist or is n   │
│   678 │   │   │   │   )                                                                          │
│ ❱ 679 │   │   │   return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *input   │
│   680 │   │                                                                                      │
│   681 │   │   # Otherwise we have to be creative.                                                │
│   682 │   │   # if model is an encoder decoder, the encoder tokenizer class is used by default   │
│                                                                                                  │
│ D:\Python\test\.venv\lib\site-packages\transformers\tokenization_utils_base.py:1804 in           │
│ from_pretrained                                                                                  │
│                                                                                                  │
│   1801 │   │   │   else:                                                                         │
│   1802 │   │   │   │   logger.info(f"loading file {file_path} from cache at {resolved_vocab_fil  │
│   1803 │   │                                                                                     │
│ ❱ 1804 │   │   return cls._from_pretrained(                                                      │
│   1805 │   │   │   resolved_vocab_files,                                                         │
│   1806 │   │   │   pretrained_model_name_or_path,                                                │
│   1807 │   │   │   init_configuration,                                                           │
│                                                                                                  │
│ D:\Python\test\.venv\lib\site-packages\transformers\tokenization_utils_base.py:1958 in           │
│ _from_pretrained                                                                                 │
│                                                                                                  │
│   1955 │   │                                                                                     │
│   1956 │   │   # Instantiate tokenizer.                                                          │
│   1957 │   │   try:                                                                              │
│ ❱ 1958 │   │   │   tokenizer = cls(*init_inputs, **init_kwargs)                                  │
│   1959 │   │   except OSError:                                                                   │
│   1960 │   │   │   raise OSError(                                                                │
│   1961 │   │   │   │   "Unable to load vocabulary from file. "                                   │
│                                                                                                  │
│ D:\Python\test\.venv\lib\site-packages\transformers\models\gpt_neox\tokenization_gpt_neox_fast.p │
│ y:111 in __init__                                                                                │
│                                                                                                  │
│   108 │   │   add_prefix_space=False,                                                            │
│   109 │   │   **kwargs,                                                                          │
│   110 │   ):                                                                                     │
│ ❱ 111 │   │   super().__init__(                                                                  │
│   112 │   │   │   vocab_file,                                                                    │
│   113 │   │   │   merges_file,                                                                   │
│   114 │   │   │   tokenizer_file=tokenizer_file,                                                 │
│                                                                                                  │
│ D:\Python\test\.venv\lib\site-packages\transformers\tokenization_utils_fast.py:120 in __init__   │
│                                                                                                  │
│   117 │   │   │   slow_tokenizer = self.slow_tokenizer_class(*args, **kwargs)                    │
│   118 │   │   │   fast_tokenizer = convert_slow_tokenizer(slow_tokenizer)                        │
│   119 │   │   else:                                                                              │
│ ❱ 120 │   │   │   raise ValueError(                                                              │
│   121 │   │   │   │   "Couldn't instantiate the backend tokenizer from one of: \n"               │
│   122 │   │   │   │   "(1) a `tokenizers` library serialization file, \n"                        │
│   123 │   │   │   │   "(2) a slow tokenizer instance to convert or \n"                           │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
ValueError: Couldn't instantiate the backend tokenizer from one of:
(1) a `tokenizers` library serialization file,
(2) a slow tokenizer instance to convert or
(3) an equivalent slow tokenizer class to instantiate and convert.
You need to have sentencepiece installed to convert a slow tokenizer to a fast one.
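(For context, the failing call in the traceback boils down to the following; a minimal sketch reusing the model id and arguments shown above.)

from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

# Raises the ValueError above when the fast-tokenizer backend cannot be built.
tokenizer = AutoTokenizer.from_pretrained("PygmalionAI/pygmalion-1.3b", use_fast=True)
config = AutoConfig.from_pretrained("PygmalionAI/pygmalion-1.3b", is_decoder=True)
model = AutoModelForCausalLM.from_pretrained("PygmalionAI/pygmalion-1.3b", config=config)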

HRNPH (Owner) commented May 24, 2023

ValueError: Couldn't instantiate the backend tokenizer from one of:
(1) a `tokenizers` library serialization file,
(2) a slow tokenizer instance to convert or
(3) an equivalent slow tokenizer class to instantiate and convert.
You need to have sentencepiece installed to convert a slow tokenizer to a fast one.

This is due to a misconfiguration on my part: sentencepiece was missing from requirements.txt.
"You need to have sentencepiece installed to convert a slow tokenizer to a fast one."
You can fix this quickly by running

pip install sentencepiece

in the terminal. If this still doesn't solve the issue, please let me know!
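For reference, a quick sanity check after installing (a sketch, reusing the model id from the traceback; run it in the same virtual environment as the server):

import sentencepiece
from transformers import AutoTokenizer

print("sentencepiece", sentencepiece.__version__)  # confirms the package is importable
tokenizer = AutoTokenizer.from_pretrained("PygmalionAI/pygmalion-1.3b", use_fast=True)
print(type(tokenizer).__name__)  # expect GPTNeoXTokenizerFast once loading succeeds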

jameeldark2012 (Author) commented

Server and client started successfully, thanks.

@HRNPH HRNPH added the bug Something isn't working label May 24, 2023
@HRNPH HRNPH self-assigned this May 24, 2023