4_embeddings: ValueError: "InstructorEmbeddings" object has no field "_model" #6

Open
jonmach opened this issue Dec 8, 2023 · 3 comments

jonmach commented Dec 8, 2023

I'm working through the llama_docs_bot files and have hit an issue with the InstructorEmbeddings class, which subclasses BaseEmbedding:

Running the following:

# set the batch size to 1 to avoid memory issues
# if you have a large GPU, you can increase this
instructor_embeddings = InstructorEmbeddings(embed_batch_size=1)

I get the following error:


---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/Users/jon/dev/LLM/LLaMaIndex/llama_docs_bot/4_embeddings/4_embeddings.ipynb Cell 10 line 3
      1 # set the batch size to 1 to avoid memory issues
      2 # if you have a large GPU, you can increase this
----> 3 instructor_embeddings = InstructorEmbeddings(embed_batch_size=1)

/Users/jon/dev/LLM/LLaMaIndex/llama_docs_bot/4_embeddings/4_embeddings.ipynb Cell 10 line 1
      6 def __init__(
      7     self,
      8     instructor_model_name: str = "hkunlp/instructor-large",
      9     instruction: str = "Represent the Computer Science text for retrieval:",
     10     **kwargs: Any,
     11 ) -> None:
---> 12     self._model = INSTRUCTOR(instructor_model_name)
     13     self._instruction = instruction
     14     super().__init__(**kwargs)

File /Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/pydantic/main.py:357, in pydantic.main.BaseModel.__setattr__()

ValueError: "InstructorEmbeddings" object has no field "_model"

This is a list of my installed modules with versions etc.


Package                     Version
--------------------------- ------------
aiohttp                     3.9.1
aiosignal                   1.3.1
aiostream                   0.5.2
alembic                     1.13.0
altair                      5.2.0
annotated-types             0.6.0
anyio                       3.7.1
appdirs                     1.4.4
appnope                     0.1.3
argon2-cffi                 23.1.0
argon2-cffi-bindings        21.2.0
arrow                       1.3.0
asttokens                   2.4.1
async-lru                   2.0.4
attrs                       23.1.0
Babel                       2.13.1
backoff                     2.2.1
beautifulsoup4              4.12.2
bleach                      6.1.0
blinker                     1.7.0
cachetools                  5.3.2
certifi                     2023.11.17
cffi                        1.16.0
charset-normalizer          3.3.2
click                       8.1.7
cohere                      4.37
comm                        0.2.0
contourpy                   1.2.0
cycler                      0.12.1
dataclasses-json            0.6.3
datasets                    2.15.0
debugpy                     1.8.0
decorator                   5.1.1
defusedxml                  0.7.1
Deprecated                  1.2.14
dill                        0.3.7
distro                      1.8.0
dnspython                   2.4.2
entrypoints                 0.4
executing                   2.0.1
Faker                       20.1.0
fastavro                    1.9.0
fastjsonschema              2.19.0
favicon                     0.7.0
filelock                    3.13.1
fonttools                   4.46.0
fqdn                        1.5.1
frozendict                  2.3.10
frozenlist                  1.4.0
fsspec                      2023.10.0
gitdb                       4.0.11
GitPython                   3.1.40
greenlet                    3.0.1
h11                         0.14.0
htbuilder                   0.6.2
html2text                   2020.1.16
httpcore                    1.0.2
httpx                       0.25.2
huggingface-hub             0.19.4
humanize                    4.9.0
idna                        3.6
importlib-metadata          6.11.0
InstructorEmbedding         1.0.0
ipykernel                   6.27.1
ipython                     8.18.1
ipywidgets                  8.1.1
isoduration                 20.11.0
jedi                        0.19.1
Jinja2                      3.1.2
joblib                      1.3.2
json5                       0.9.14
jsonpatch                   1.33
jsonpointer                 2.4
jsonschema                  4.20.0
jsonschema-specifications   2023.11.2
jupyter                     1.0.0
jupyter_client              8.6.0
jupyter-console             6.6.3
jupyter_core                5.5.0
jupyter-events              0.9.0
jupyter-lsp                 2.2.1
jupyter_server              2.11.2
jupyter_server_terminals    0.4.4
jupyterlab                  4.0.9
jupyterlab_pygments         0.3.0
jupyterlab_server           2.25.2
jupyterlab-widgets          3.0.9
kaggle                      1.5.16
kiwisolver                  1.4.5
langchain                   0.0.348
langchain-core              0.0.12
langsmith                   0.0.69
litellm                     1.11.1
llama-index                 0.9.13
loguru                      0.7.2
lxml                        4.9.3
Mako                        1.3.0
Markdown                    3.5.1
markdown-it-py              3.0.0
markdownlit                 0.0.7
MarkupSafe                  2.1.3
marshmallow                 3.20.1
matplotlib                  3.8.2
matplotlib-inline           0.1.6
mdurl                       0.1.2
merkle-json                 1.0.0
millify                     0.1.1
mistune                     3.0.2
more-itertools              10.1.0
mpmath                      1.3.0
multidict                   6.0.4
multiprocess                0.70.15
munch                       4.0.0
mypy-extensions             1.0.0
nbclient                    0.9.0
nbconvert                   7.12.0
nbformat                    5.9.2
nest-asyncio                1.5.8
networkx                    3.2.1
nltk                        3.8.1
notebook                    7.0.6
notebook_shim               0.2.3
numpy                       1.26.2
openai                      1.3.7
overrides                   7.4.0
packaging                   23.2
pandas                      2.1.3
pandocfilters               1.5.0
parso                       0.8.3
pexpect                     4.9.0
Pillow                      10.1.0
pinecone-client             2.2.4
pip                         23.3.1
platformdirs                4.1.0
prometheus-client           0.19.0
prompt-toolkit              3.0.41
protobuf                    4.25.1
psutil                      5.9.6
ptyprocess                  0.7.0
pure-eval                   0.2.2
pyarrow                     14.0.1
pyarrow-hotfix              0.6
pycparser                   2.21
pydantic                    1.10.13
pydantic_core               2.14.5
pydeck                      0.8.1b0
Pygments                    2.17.2
pymdown-extensions          10.5
pyparsing                   3.1.1
pypdf                       3.17.1
python-dateutil             2.8.2
python-decouple             3.8
python-dotenv               1.0.0
python-json-logger          2.0.7
python-slugify              8.0.1
pytz                        2023.3.post1
PyYAML                      6.0.1
pyzmq                       25.1.1
qtconsole                   5.5.1
QtPy                        2.4.1
referencing                 0.31.1
regex                       2023.10.3
requests                    2.31.0
rfc3339-validator           0.1.4
rfc3986-validator           0.1.1
rich                        13.7.0
rpds-py                     0.13.2
safetensors                 0.4.1
scikit-learn                1.3.2
scipy                       1.11.4
Send2Trash                  1.8.2
sentence-transformers       2.2.2
sentencepiece               0.1.99
setuptools                  65.5.0
six                         1.16.0
slack-bolt                  1.18.1
slack-sdk                   3.26.1
smmap                       5.0.1
sniffio                     1.3.0
soupsieve                   2.5
SQLAlchemy                  2.0.23
st-annotated-text           4.0.1
stack-data                  0.6.3
streamlit                   1.29.0
streamlit-aggrid            0.3.4.post3
streamlit-camera-input-live 0.2.0
streamlit-card              0.0.61
streamlit-embedcode         0.1.2
streamlit-extras            0.3.5
streamlit-faker             0.0.3
streamlit-image-coordinates 0.1.6
streamlit-javascript        0.1.5
streamlit-keyup             0.2.0
streamlit-toggle-switch     1.0.2
streamlit-vertical-slider   1.0.2
sympy                       1.12
tenacity                    8.2.3
terminado                   0.18.0
text-unidecode              1.3
threadpoolctl               3.2.0
tiktoken                    0.5.2
tinycss2                    1.2.1
tokenizers                  0.15.0
toml                        0.10.2
toolz                       0.12.0
torch                       2.1.1
torchvision                 0.16.1
tornado                     6.4
tqdm                        4.66.1
traitlets                   5.14.0
transformers                4.35.2
trulens-eval                0.18.2
types-python-dateutil       2.8.19.14
typing_extensions           4.5.0
typing-inspect              0.8.0
tzdata                      2023.3
tzlocal                     5.2
uri-template                1.3.0
urllib3                     1.26.18
validators                  0.22.0
wcwidth                     0.2.12
webcolors                   1.13
webencodings                0.5.1
websocket-client            1.7.0
widgetsnbextension          4.0.9
wrapt                       1.16.0
xxhash                      3.4.1
yarl                        1.9.3
you-get                     0.4.1650
zipp                        3.17.0

jonmach commented Dec 8, 2023

Problem resolved by adding:

    _model: INSTRUCTOR = PrivateAttr()
    _instruction: str = PrivateAttr()
    

to the InstructorEmbeddings class
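
For anyone wondering why declaring the attributes makes the error go away, here is a minimal stdlib-only sketch of the mechanism. It is not pydantic's code: `StrictBase`, `fields`, `private_attrs`, `Undeclared`, `Declared` and `try_build` are hypothetical names made up for illustration, and a plain string stands in for the INSTRUCTOR model. The assumption is that the base class's `__setattr__` rejects any name it was not told about, and that declaring a name up front (the role `PrivateAttr()` plays in the real class) is what lets the assignment through.

```python
from typing import Any


class StrictBase:
    """Toy stand-in for a validating base class (hypothetical, not pydantic)."""

    # Names of declared, validated fields.
    fields: frozenset = frozenset({"embed_batch_size"})
    # Names a subclass declares as private, non-validated attributes.
    private_attrs: frozenset = frozenset()

    def __init__(self, embed_batch_size: int = 10) -> None:
        self.embed_batch_size = embed_batch_size

    def __setattr__(self, name: str, value: Any) -> None:
        # Reject anything that is neither a declared field nor a declared private attribute.
        if name not in self.fields and name not in self.private_attrs:
            raise ValueError(f'"{type(self).__name__}" object has no field "{name}"')
        object.__setattr__(self, name, value)


class Undeclared(StrictBase):
    def __init__(self, **kwargs: Any) -> None:
        super().__init__(**kwargs)
        self._model = "stand-in model"  # never declared, so __setattr__ raises


class Declared(StrictBase):
    # Declaring the names up front is what makes the assignments legal.
    private_attrs = frozenset({"_model", "_instruction"})

    def __init__(
        self,
        instruction: str = "Represent the Computer Science text for retrieval:",
        **kwargs: Any,
    ) -> None:
        super().__init__(**kwargs)
        self._model = "stand-in model"
        self._instruction = instruction


def try_build(cls: type) -> str:
    """Return "ok" if the class can be constructed, else the error message."""
    try:
        cls(embed_batch_size=1)
        return "ok"
    except ValueError as exc:
        return str(exc)
```

With this toy, `try_build(Undeclared)` returns a message of the same shape as the one in the traceback, while `try_build(Declared)` returns "ok".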

Omegapy commented Dec 30, 2023

I also got the error:

ValueError: "InstructorEmbeddings" object has no field "_model"

My solution

This is my fix:

from llama_index.bridge.pydantic import PrivateAttr  # import path for llama-index 0.9.x

class InstructorEmbeddings(BaseEmbedding):

    # Declare the model as a private attribute so pydantic accepts the assignment in __init__
    _model: INSTRUCTOR = PrivateAttr()
    _instruction: str = "Represent the Computer Science text for retrieval:"

    def __init__(
        self,
        instructor_model_name: str = "hkunlp/instructor-large",
        **kwargs: Any,
    ) -> None:
        super().__init__(**kwargs)
        # Store the model on the instance; a bare local `_model` would be discarded
        self._model = INSTRUCTOR(instructor_model_name)

    def _get_query_embedding(self, query: str) -> List[float]:
        # Use the instance's model rather than a global `model` variable
        embeddings = self._model.encode([[self._instruction, query]])
        return embeddings[0].tolist()

    async def _aget_query_embedding(self, query: str) -> List[float]:
        return self._get_query_embedding(query)

    def _get_text_embedding(self, text: str) -> List[float]:
        embeddings = self._model.encode([[self._instruction, text]])
        return embeddings[0].tolist()

    def _get_text_embeddings(self, texts: List[str]) -> List[List[float]]:
        embeddings = self._model.encode([[self._instruction, text] for text in texts])
        return embeddings.tolist()

My results are the same as the video:

embed = instructor_embeddings.get_text_embedding("How do I create a vector index?")
print(len(embed))
print(embed[:10])
768
[0.003987060859799385, 0.012122981250286102, 0.002690523862838745, 0.01581709273159504, -0.005555964540690184, 0.03673827275633812, 0.010727009736001492, 0.00666137645021081, -0.0392913892865181, 0.013146855868399143]

mvitas commented Jul 23, 2024

Current working solution as of Jul 23rd 2024.

from typing import Any, List
from InstructorEmbedding import INSTRUCTOR
from llama_index.core.embeddings import BaseEmbedding
from pydantic import Extra

class InstructorEmbeddings(BaseEmbedding):
    
    class Config:
        extra = Extra.allow

    _instruction: str = "Represent the Computer Science text for retrieval:"

    def __init__(
        self, 
        instructor_model_name: str = "hkunlp/instructor-large",
        **kwargs: Any
    ) -> None:
        super().__init__(**kwargs)
        self._model: INSTRUCTOR = INSTRUCTOR(instructor_model_name)

    def _get_query_embedding(self, query: str) -> List[float]:
        embeddings = self._model.encode([[self._instruction, query]])
        return embeddings[0].tolist()
    
    async def _aget_query_embedding(self, query: str) -> List[float]:
        return self._get_query_embedding(query)

    def _get_text_embedding(self, text: str) -> List[float]:
        embeddings = self._model.encode([[self._instruction, text]])
        return embeddings[0].tolist() 
    
    def _get_text_embeddings(self, texts: List[str]) -> List[List[float]]:
        embeddings = self._model.encode([[self._instruction, text] for text in texts])
        return embeddings.tolist()

Adding an instance variable for the model with pydantic validation in place

The BaseEmbedding class uses pydantic validation, so out of the box no extra fields can be added to the InstructorEmbeddings child class.

Add the following code to the class to allow extra fields to be defined:

    class Config:
        extra = Extra.allow

Initialize the BaseEmbedding superclass before initializing the model:

    def __init__(
        self,
        instructor_model_name: str = "hkunlp/instructor-large",
        **kwargs: Any
    ) -> None:
        super().__init__(**kwargs) # placement of this line is important
        self._model: INSTRUCTOR = INSTRUCTOR(instructor_model_name)
