Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add AstraDBGraphVectorStore testing #75

Merged
merged 19 commits into from
Sep 19, 2024
Merged

Add AstraDBGraphVectorStore testing #75

merged 19 commits into from
Sep 19, 2024

Conversation

hemidactylus
Copy link
Collaborator

Adding integration+unit test for AstraDBGraphVectorStore.

Doing so uncovered unexpected behaviour w.r.t the "id in document vs. ids passed as standalone" in the from_documents implementation for both AstraDBGraphVectorStore and AstraDBVectorStore, which this PR also fixes (and tests).

Incidentally the "filter to metadata" utility method was made public to allow the Graph layer to defer to its VectorStore delegate instead of reimplementing it.

This PR also addresses a comment to #67 that was unattended, inadvertently bringing a critical bug into main.

_documents: Iterable[Document]
if ids is not None:
_documents = [document.copy() for document in documents]
for _doc_id, _document in zip(ids, _documents):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There should be some check that ids and documents have the same size.

Copy link
Collaborator Author

@hemidactylus hemidactylus Sep 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Closing since ids is going away from this method.

Side note for your amusement. This check would be not entirely trivial if one does not want to list(...) the iterables. I had started to implement something like this before removing it all.

But the real caveat is that zip(ite1, ite2) treats the two iterables differently (i.e. it's implemented straigthforwardly). It can consume one item too much from ite1 which goes nowhere. The first snippet here prints "4", the second one gives a StopIteration:

r1 = (i for i in range(4))
r2 = (i for i in range(5))

for p in zip(r1, r2):
   print(p)


next(r2)
r1 = (i for i in range(4))
r2 = (i for i in range(5))

for p in zip(r2, r1):
   print(p)


next(r2)

(almost obvious in hindsight, but still)

Copy link
Collaborator

@cbornet cbornet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@hemidactylus
Copy link
Collaborator Author

CI is having lots of troubles (hitting DB limits and whatnot; I'll have a PR to simplify, rationalize and optimize the tests and the way they use DB resources).
Meanwhile, I ran all tests locally, piecewise to avoid overloading the DB, and it all passes.
Merging now.

@hemidactylus hemidactylus merged commit 0d4220d into main Sep 19, 2024
11 of 13 checks passed
@hemidactylus hemidactylus deleted the SL-add-graph-tests branch September 19, 2024 22:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants