I'm using qdrant-haystack 1.0.11 with farm-haystack==1.21.2 and Python 3.10.13 on Windows 10, with Qdrant running in Docker.
When updating the embeddings of a document store, document_store.update_embeddings seems to update all embeddings even when update_existing_embeddings is set to False.
I'm running this code:
    import timeit

    from haystack import Document
    from haystack.nodes import EmbeddingRetriever
    from qdrant_haystack.document_stores import QdrantDocumentStore


    def update_embeddings(existing):
        document_store.update_embeddings(retriever, update_existing_embeddings=existing)


    document_store = QdrantDocumentStore(url="localhost", index="test_update_embeddings",
                                         embedding_dim=512, similarity="cosine")
    retriever = EmbeddingRetriever(document_store=document_store,
                                   embedding_model="sentence-transformers/distiluse-base-multilingual-cased-v1",
                                   use_gpu=False)
    docs_to_index = [Document(content=str(i) + " random text" * 100) for i in range(0, 50)]
    document_store.write_documents(docs_to_index, duplicate_documents="skip")

    res_upd = timeit.timeit(stmt='update_embeddings(True)', globals=globals(), number=2)
    res_noupd = timeit.timeit(stmt='update_embeddings(False)', globals=globals(), number=2)
    print(f"Execution with update: {res_upd}, with no update: {res_noupd}")
After the execution, the Qdrant database contains 50 vectors, as expected.
I would also expect update_embeddings(False) to run significantly faster than update_embeddings(True): once the first call has embedded all 50 documents, there should be nothing left to embed. Instead, both statements take nearly the same time (total seconds for two runs each): Execution with update: 22.15771689999383, with no update: 20.913242900016485
To me this looks like update_embeddings(..., update_existing_embeddings=False) re-embeds the existing documents, too.
What am I missing?
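Wall-clock timings only hint at what happens. A more direct check is to count how many documents actually reach the retriever. The sketch below wraps the retriever's embed_documents method with a counter; count_embedded_documents is a hypothetical helper, and it assumes (not confirmed here) that the Qdrant store's update_embeddings calls retriever.embed_documents once per batch of documents it embeds.

```python
from typing import Any, Callable, List


def count_embedded_documents(retriever: Any) -> Callable[[], List[int]]:
    """Hypothetical helper: record the size of every embed_documents() batch.

    Assumption: update_embeddings() calls retriever.embed_documents(batch)
    for each batch of documents it embeds.
    """
    original = retriever.embed_documents
    batch_sizes: List[int] = []

    def counting_embed_documents(documents):
        batch_sizes.append(len(documents))  # remember how many docs were sent
        return original(documents)          # delegate to the real method

    # Shadow the method on this retriever instance only.
    retriever.embed_documents = counting_embed_documents
    # Return a function that reports a copy of the recorded batch sizes.
    return lambda: list(batch_sizes)
```

With the repro above, calling report = count_embedded_documents(retriever) after the first update_embeddings(True) and then running update_embeddings(False) would show through sum(report()) whether the 50 already-embedded documents are processed again (a count of 50) or skipped (0).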
Precondition: Qdrant contains x documents and their embeddings.
Actions:
1. Get n new documents.
2. Write the n documents to Qdrant.
3. Update only the embeddings of the n new documents using update_embeddings.

Using update_embeddings for this does not work.
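For reference, the behavior this use case relies on is that update_existing_embeddings=False selects only documents without an embedding. The toy sketch below models just that selection rule in plain Python; Doc and documents_to_embed are hypothetical stand-ins, not Haystack or qdrant-haystack code.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Doc:
    """Hypothetical stand-in for haystack.Document: content plus optional embedding."""
    content: str
    embedding: Optional[List[float]] = None


def documents_to_embed(docs: List[Doc], update_existing_embeddings: bool) -> List[Doc]:
    """Select the documents an update_embeddings() call is expected to process."""
    if update_existing_embeddings:
        return list(docs)  # re-embed everything
    # Only documents that have no embedding yet.
    return [d for d in docs if d.embedding is None]


# Precondition: x = 3 documents already embedded; then n = 2 new ones are written.
store = [Doc(f"old {i}", embedding=[0.1, 0.2]) for i in range(3)]
store += [Doc(f"new {i}") for i in range(2)]

print(len(documents_to_embed(store, update_existing_embeddings=True)))   # 5
print(len(documents_to_embed(store, update_existing_embeddings=False)))  # 2
```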
So a working use case would be:
Precondition: Qdrant contains x documents and their embeddings.
Actions:
1. Get n new documents.
2. Create the n embeddings for the new documents manually.
3. Write the n documents, with their embeddings, to Qdrant (as far as I understand, write_documents does not check the validity of the embeddings).
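The manual workaround could look roughly like the sketch below. It assumes Haystack 1.x objects: an EmbeddingRetriever whose embed_documents(docs) returns one vector per document in input order, and Documents with an .embedding attribute. write_with_embeddings is a hypothetical helper; it takes the store and retriever as arguments so it does not import Haystack itself.

```python
from typing import Any, List


def write_with_embeddings(document_store: Any, retriever: Any, docs: List[Any]) -> None:
    """Hypothetical helper: embed only the new documents, then write them.

    Assumes retriever.embed_documents(docs) returns one vector per document,
    in the same order, and that each document has an .embedding attribute.
    """
    embeddings = retriever.embed_documents(docs)
    if len(embeddings) != len(docs):
        raise ValueError("retriever returned a different number of embeddings")
    for doc, emb in zip(docs, embeddings):
        doc.embedding = emb  # attach the precomputed vector
    # The store receives ready-made vectors, so update_embeddings() is not needed.
    document_store.write_documents(docs, duplicate_documents="skip")
```

With the objects from the repro, this would be called as write_with_embeddings(document_store, retriever, new_docs), where new_docs holds only the n new documents.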
So is update_embeddings basically only useful when I change the model that generates the embeddings? At least to me, this seems to go against the intent of having a simple pipeline.