Data model: Create an individual embedding table per collection #12

Open
ckurze opened this issue Nov 20, 2023 · 2 comments
Labels
enhancement New feature or request

Comments

ckurze commented Nov 20, 2023

Issue you'd like to raise.

Currently, all embeddings are stored in one single "embeddings" table. This has various downsides:

  • All embeddings need the same vector length
  • Indexes get very large
  • Privileges are hard to manage on the database side
  • Unnecessary join on potentially large amounts of data.

Suggestion:

The CrateDB adapter should not create the tables "collections" and "embeddings". Instead, it should create a new table, named after the collection name provided in LangChain, which holds all embeddings of that collection.
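
To illustrate the suggestion, here is a minimal sketch of how a per-collection table definition could be derived. It is not the adapter's actual code: the helper names `table_name_for` and `create_table_ddl`, the column names, and the name-sanitizing rule are all hypothetical, and the DDL assumes a CrateDB version that supports the `FLOAT_VECTOR` type.

```python
import re


def table_name_for(collection_name: str) -> str:
    """Derive a table name from a collection name (hypothetical rule)."""
    # Lowercase, then replace anything outside [a-z0-9_] with an underscore.
    name = re.sub(r"[^a-z0-9_]", "_", collection_name.lower()).strip("_")
    if not name:
        raise ValueError(f"Cannot derive a table name from {collection_name!r}")
    return name


def create_table_ddl(collection_name: str, dimensions: int) -> str:
    """Build a CREATE TABLE statement for a single collection.

    Each collection gets its own table, so each table can use its own
    vector length. The column layout is an assumption for illustration.
    """
    if dimensions < 1:
        raise ValueError("dimensions must be a positive integer")
    table = table_name_for(collection_name)
    return (
        f'CREATE TABLE IF NOT EXISTS "{table}" (\n'
        "    id TEXT PRIMARY KEY,\n"
        "    document TEXT,\n"
        "    cmetadata OBJECT(DYNAMIC),\n"
        f"    embedding FLOAT_VECTOR({dimensions})\n"
        ")"
    )


if __name__ == "__main__":
    # Two collections with different vector lengths end up in separate tables.
    print(create_table_ddl("Product Docs", 384))
    print(create_table_ddl("support_tickets", 1536))
```

With a layout like this, a similarity query reads from one table only, without a join against a shared "collections" table, and privileges can be granted per table.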

amotl commented Nov 21, 2023

Hi there,

thank you for writing in, I agree with your proposal.

However, it will require a major rewrite of the data model and the adapter. The adapter is currently based on SQLAlchemy, so it may lack the flexibility to accommodate arbitrary changes. With your request in mind, another patch has already been submitted to improve the situation in this regard.

We will see how the situation can be improved further.

With kind regards,
Andreas.

amotl commented Nov 22, 2023

Hi again,

there is a patch now that makes an attempt to improve the situation.

With kind regards,
Andreas.

amotl changed the title from "Issue: CrateDB Vector Store should create an individual table per "collection" defined in LangChain" to "Data model: Create an individual embedding table per collection" on Nov 22, 2023