Vector Store: "Collection not found" when using pre_delete_collection=True #5

Closed
ckurze opened this issue Oct 17, 2023 · 7 comments
Labels
bug Something isn't working question Further information is requested


ckurze commented Oct 17, 2023

Problem

When using pre_delete_collection=True, the only feedback is an error stating "Collection not found"; the actual collection is not deleted or emptied.

Details

Example: vector_search.ipynb

COLLECTION_NAME = "state_of_the_union_test"

embeddings = OpenAIEmbeddings()

db = CrateDBVectorSearch.from_documents(
    embedding=embeddings,
    documents=docs,
    collection_name=COLLECTION_NAME,
    connection_string=CONNECTION_STRING,
    pre_delete_collection=True,
)

amotl commented Oct 27, 2023

Hi Christian,

thanks for reporting. I've added a self-contained example program at [1], but I haven't been able to reproduce the "Collection not found" problem. I tried it with a CrateDB instance already running, and I also tried once more with a recycled one, without any existing tables.

Can I ask you to try again? Maybe the situation has improved in the meantime, and the flaw was resolved by some other fix added recently?

On the other hand, maybe my example program is still incomplete, and you would be able to complete it, in order to reproduce the problem?

With kind regards,
Andreas.

Footnotes

  1. https://gist.github.com/amotl/75f27244951f201b89db0d8394f97a0e

@amotl amotl added bug Something isn't working question Further information is requested labels Oct 27, 2023

amotl commented Oct 29, 2023

Indeed, I am also observing problems in the "Overwriting a vector store" section of vector_search.ipynb [1].

____ notebook: nbregression(vector_search) ____
nbclient.exceptions.CellExecutionError: An error occurred while executing the following cell:
------------------
docs_with_score[0]
------------------

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
Cell In[20], line 1
----> 1 docs_with_score[0]

IndexError: list index out of range
### Overwriting a vector store

If you have an existing collection, you can overwrite it by using `from_documents`,
and setting `pre_delete_collection = True`.
#%%
db = CrateDBVectorSearch.from_documents(
    documents=docs,
    embedding=embeddings,
    collection_name=COLLECTION_NAME,
    connection_string=CONNECTION_STRING,
    pre_delete_collection=True,
)
#%%
docs_with_score = db.similarity_search_with_score("foo")
#%%
docs_with_score[0]
#%% md
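
The `IndexError` above suggests that `similarity_search_with_score` returned an empty list, so the failing cell looks like a symptom rather than the cause. As a minimal sketch, a defensive helper makes that condition explicit instead of raising. The name `first_hit` is hypothetical and not part of LangChain, and plain strings stand in for `Document` objects.

```python
from typing import Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")


def first_hit(results: Sequence[Tuple[T, float]]) -> Optional[Tuple[T, float]]:
    """Return the first (document, score) pair, or None when the search found nothing."""
    if not results:
        return None
    return results[0]


# Stand-in data: strings take the place of Document objects here.
print(first_hit([]))                 # None
print(first_hit([("doc-a", 0.12)]))  # ('doc-a', 0.12)
```

In a notebook test, asserting that the result list is non-empty before indexing would point more directly at an empty collection.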

Footnotes

  1. https://github.com/crate/cratedb-examples/blob/amo/framework-langchain/framework/langchain/vector_search.ipynb?short_path=2b2353f#L399

@amotl amotl changed the title Vector Store: pre_delete_collection=True Vector Store: "Collection not found" when using pre_delete_collection=True Oct 29, 2023

amotl commented Oct 29, 2023

We may have been able to reproduce the flaw while adding software tests for the accompanying Jupyter Notebooks.

pytest -k "notebook and vector"



amotl commented Oct 29, 2023

> When using pre_delete_collection=True, there is only an error stating "Collection not found".

Indeed, this is the only occurrence of logger.warning within pgvector. In that sense, it feels a bit like a stray log item, but C'est la vie.

$ ag "warning.*collection not found"

libs/langchain/langchain/vectorstores/pgembedding.py
219:                self.logger.warning("Collection not found")

libs/langchain/langchain/vectorstores/pgvector.py
189:                self.logger.warning("Collection not found")

> [...] the actual collection is not deleted / emptied.

Will have to be investigated. Can you check again?
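
For context, a warning like this typically comes from a guard that returns early when there is nothing to delete, which would make it a notice rather than a failure. The following is a simplified in-memory model of that pattern, not the LangChain implementation; `InMemoryStore` and its attributes are hypothetical.

```python
import logging
from typing import Dict, List

logger = logging.getLogger("vectorstore_sketch")


class InMemoryStore:
    """Hypothetical stand-in for a vector store; it only tracks collection names and texts."""

    def __init__(self) -> None:
        self.collections: Dict[str, List[str]] = {}

    def delete_collection(self, name: str) -> bool:
        """Drop a collection; warn and return False when there is nothing to drop."""
        if name not in self.collections:
            # Nothing to delete: emit a notice and carry on.
            logger.warning("Collection not found")
            return False
        del self.collections[name]
        return True


store = InMemoryStore()
# First run against an empty database: the warning is logged, nothing breaks.
first_attempt = store.delete_collection("state_of_the_union_test")
# Once the collection exists, deletion succeeds without a warning.
store.collections["state_of_the_union_test"] = ["some text"]
second_attempt = store.delete_collection("state_of_the_union_test")
```

Under this model, seeing the warning on a first run against an empty database would be expected, and would not by itself show that deletion is broken.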


amotl commented Oct 30, 2023

> [...] the actual collection is not deleted / emptied.
>
> Will have to be investigated.

By using the standalone example program cratedb-langchain-pre-delete-collection.py, you can verify that the reported Result count differs when the pre_delete_collection=True line is disabled.

You may need to invoke the program a few times with and without the line to see the difference. I guess this demonstrates it works well?
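
The expected difference can be illustrated with a small in-memory model. This is a sketch under assumptions, not the linked example program: the `load` function and the dictionaries standing in for the database are hypothetical.

```python
from typing import Dict, List


def load(
    database: Dict[str, List[str]],
    collection_name: str,
    documents: List[str],
    pre_delete_collection: bool = False,
) -> int:
    """Insert documents into a collection and return the resulting record count."""
    if pre_delete_collection:
        # Drop the existing collection first, if there is one.
        database.pop(collection_name, None)
    database.setdefault(collection_name, []).extend(documents)
    return len(database[collection_name])


docs = ["a", "b", "c"]

# Three consecutive runs without pre-deletion: records accumulate.
db_without: Dict[str, List[str]] = {}
counts_without = [load(db_without, "demo", docs) for _ in range(3)]

# Three consecutive runs with pre-deletion: the count stays constant.
db_with: Dict[str, List[str]] = {}
counts_with = [load(db_with, "demo", docs, pre_delete_collection=True) for _ in range(3)]

print(counts_without)  # [3, 6, 9]
print(counts_with)     # [3, 3, 3]
```

A growing count across invocations without the flag, and a stable count with it, is the signal that pre-deletion takes effect.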


amotl commented Nov 20, 2023

@andnig just reported GH-11, which may be related to this one?


amotl commented Nov 21, 2023

Hi again. Unless there are any objections, let's consider this fixed?

@amotl amotl closed this as completed Nov 22, 2023