CrateDB vector: Refactor SQLAlchemy data model to provide two storage strategies #20
base: cratedb
Commits on Dec 1, 2023
Commit c0e260f
Commit 249580f
CrateDB vector: Add vector store support
The implementation is based on the `pgvector` adapter, as both PostgreSQL and CrateDB share similar attributes, and can be wrapped well by using the same SQLAlchemy layer on top.
Commit b752717
Commit 6cab9b5
Commit 6444a46
CrateDB loader: Add document loader support
The implementation is based on the generic SQLAlchemy document loader.
Commit f494d64
Commit 5894310
CrateDB memory: Add conversational memory support
The implementation is based on the generic `SQLChatMessageHistory`.
Commit 08f87b6
CrateDB vector: Fix usage when only reading, and not storing
When no embeddings were added up front, the runtime model factory could not derive the vector dimension size, because the SQLAlchemy models had not been initialized correctly.
Commit 901fdcc
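The read-only problem described in the "Fix usage when only reading, and not storing" commit can be illustrated with a minimal, dependency-free sketch. It assumes the fix amounts to initializing the models lazily on the read path as well, probing the embedding function when no vector has been seen yet. `LazyVectorStore`, `_ensure_models`, and `fake_embed` are hypothetical names for illustration, not the adapter's actual API.

```python
from typing import Callable, List, Optional


class LazyVectorStore:
    """Hypothetical store that defers model setup until the dimension is known."""

    def __init__(self, embed: Callable[[str], List[float]]) -> None:
        self.embed = embed
        self.dimensions: Optional[int] = None

    def _ensure_models(self, dimensions: Optional[int] = None) -> None:
        # Already initialized: nothing to do.
        if self.dimensions is not None:
            return
        # Nothing was stored up front, so probe the embedding function
        # to learn the vector size (an assumption of this sketch).
        if dimensions is None:
            dimensions = len(self.embed("probe"))
        self.dimensions = dimensions

    def add(self, text: str) -> List[float]:
        vector = self.embed(text)
        self._ensure_models(len(vector))
        return vector

    def search(self, query: str) -> List[float]:
        # Read-only path: the models must be initialized here as well.
        self._ensure_models()
        return self.embed(query)


def fake_embed(text: str) -> List[float]:
    # Stand-in for a real embedding model; always returns three dimensions.
    return [float(len(text)), 0.0, 1.0]


store = LazyVectorStore(fake_embed)
store.search("hello")
```

After the `search()` call, `store.dimensions` is set even though `add()` was never invoked.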
Commit 33d81e3
CrateDB vector: Improve SQLAlchemy model factory
From now on, _all_ instances of SQLAlchemy model types will be created at runtime through the `ModelFactory` utility. By using `__table_args__ = {"keep_existing": True}` on the ORM entity definitions, this seems to work well, even with multiple invocations of `CrateDBVectorSearch.from_texts()` using different `collection_name` argument values. While at it, this patch also fixes a few linter errors.
Commit dfc9243
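The idea behind the runtime model factory can be sketched without SQLAlchemy. The snippet below is an analogy only: it builds a class at runtime with `type()` and reuses an earlier definition when one exists. SQLAlchemy's `keep_existing` option does something comparable for `Table` objects already present in a `MetaData` collection. The `_registry` dictionary and `_define` helper are hypothetical and are not part of the adapter.

```python
from typing import Dict


class ModelFactory:
    """Hypothetical factory that creates model classes at runtime."""

    # Shared registry, playing the role MetaData plays for SQLAlchemy tables.
    _registry: Dict[str, type] = {}

    def __init__(self, dimensions: int) -> None:
        self.dimensions = dimensions
        self.EmbeddingStore = self._define("embedding", {"dimensions": dimensions})

    @classmethod
    def _define(cls, table_name: str, attrs: dict) -> type:
        # "Keep existing": return the earlier definition instead of redefining it.
        if table_name in cls._registry:
            return cls._registry[table_name]
        model = type("EmbeddingStore", (), {"__tablename__": table_name, **attrs})
        cls._registry[table_name] = model
        return model


# Two invocations, as would happen with repeated `from_texts()` calls.
first = ModelFactory(dimensions=3)
second = ModelFactory(dimensions=3)
```

Both factories end up sharing one `EmbeddingStore` class. As with `keep_existing`, a later call with different arguments would get the original definition back rather than a modified one.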
CrateDB vector: Fix cascading deletes
When deleting a collection, also delete its associated embeddings.
Commit 0e7f16b
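A rough illustration of the cascading delete, using the standard library's `sqlite3` module as a stand-in for CrateDB. The commit message does not show how the adapter implements it, so this sketch simply assumes an explicit application-level delete of the embeddings followed by the collection. The column names `uuid`, `collection_id`, and `document` are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE collection (uuid TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE embedding (
        id INTEGER PRIMARY KEY, collection_id TEXT, document TEXT
    );
    INSERT INTO collection VALUES ('c1', 'docs'), ('c2', 'notes');
    INSERT INTO embedding (collection_id, document) VALUES
        ('c1', 'foo'), ('c1', 'bar'), ('c2', 'baz');
    """
)


def delete_collection(conn: sqlite3.Connection, name: str) -> None:
    """Delete a collection together with its embeddings."""
    row = conn.execute(
        "SELECT uuid FROM collection WHERE name = ?", (name,)
    ).fetchone()
    if row is None:
        return
    # Commit both deletes together, or roll back both on error.
    with conn:
        conn.execute("DELETE FROM embedding WHERE collection_id = ?", (row[0],))
        conn.execute("DELETE FROM collection WHERE uuid = ?", (row[0],))


delete_collection(conn, "docs")
remaining = conn.execute("SELECT document FROM embedding").fetchall()
collections = conn.execute("SELECT name FROM collection").fetchall()
```

Only the embeddings of the other collection remain after the call.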
CrateDB vector: Add CrateDBVectorSearchMultiCollection
It is a special adapter that provides similarity search across multiple collections. It cannot be used for indexing documents.
Commit e5c947c
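Conceptually, a search across multiple collections ranks candidates from all selected collections in one pass. The following in-memory sketch shows that idea with cosine similarity. It does not reflect how `CrateDBVectorSearchMultiCollection` queries the database, and `search_collections` and the sample data are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def search_collections(
    collections: Dict[str, List[Tuple[str, Vector]]],
    names: List[str],
    query: Vector,
    k: int = 2,
) -> List[Tuple[str, str]]:
    """Return the top-k (collection, text) pairs across the named collections."""
    candidates = [
        (cosine_similarity(query, vector), name, text)
        for name in names
        for text, vector in collections[name]
    ]
    # Rank all candidates together, regardless of their collection.
    candidates.sort(key=lambda item: item[0], reverse=True)
    return [(name, text) for _, name, text in candidates[:k]]


collections = {
    "animals": [("cat", [1.0, 0.0]), ("dog", [0.6, 0.8])],
    "vehicles": [("car", [0.0, 1.0]), ("bike", [0.8, 0.6])],
}
results = search_collections(collections, ["animals", "vehicles"], [1.0, 0.0], k=2)
```

Here the two best matches come from different collections, which is the point of searching them together.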
CrateDB vector: Improve SQLAlchemy data model query utility functions
The CrateDB adapter works a bit differently from the pgvector adapter it builds upon: the dimensionality of the vector field must be specified at table creation time, but it is also a runtime parameter in LangChain, so table creation needs to be delayed. In some cases, the tables do not exist yet, but this only matters when the user requests to pre-delete the collection using the `pre_delete_collection` argument. So, do the error handling only there, and _not_ in the generic data model utility functions.
Commit 2208963
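The placement of the error handling described in the "Improve SQLAlchemy data model query utility functions" commit can be sketched as follows, again with `sqlite3` as a stand-in for CrateDB. The generic utility lets a missing-table error propagate, while the pre-delete path tolerates it. `open_store` and `delete_collection` are hypothetical names; only the `pre_delete_collection` argument comes from the commit message.

```python
import sqlite3


def delete_collection(conn: sqlite3.Connection, name: str) -> None:
    # Generic utility: no error handling, so a missing table raises.
    conn.execute("DELETE FROM collection WHERE name = ?", (name,))


def open_store(
    conn: sqlite3.Connection, name: str, pre_delete_collection: bool = False
) -> str:
    if pre_delete_collection:
        try:
            delete_collection(conn, name)
        except sqlite3.OperationalError as ex:
            # Tables are created lazily, so they may not exist yet.
            # Anything other than a missing table is still an error.
            if "no such table" not in str(ex):
                raise
    return name


conn = sqlite3.connect(":memory:")
store = open_store(conn, "docs", pre_delete_collection=True)
```

Calling `delete_collection()` directly against the empty database still raises, which keeps the generic helper free of special cases.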
Commit d8429f7
pgvector: Use SA's `bulk_save_objects` method for inserting embeddings
The performance gains can be substantial.
Commit 02cab14
CrateDB vector: Test non-deterministic values by using pytest.approx
The test cases can be written substantially more elegantly.
Commit bcd304b
Commit dd64cd4
CrateDB vector: Refactor SQLAlchemy data model to provide two strategies
- StorageStrategy.LANGCHAIN_PGVECTOR
  Reflects the vanilla way the pgvector adapter manages the data model: There is a single `collection` table and a single `embedding` table.
- StorageStrategy.EMBEDDING_TABLE_PER_COLLECTION
  Reflects a more advanced way to manage the data model: There is a single `collection` table, and multiple `embedding` tables, one per collection.

The default storage strategy is `LANGCHAIN_PGVECTOR`. To configure an alternative storage strategy, invoke this snippet before doing any other operations using `CrateDBVectorSearch`:

    CrateDBVectorSearch.configure(
        storage_strategy=StorageStrategy.EMBEDDING_TABLE_PER_COLLECTION
    )
Commit 07ba7af
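To make the difference between the two storage strategies concrete, here is a small sketch of how a strategy could map a collection to an embedding table. The enum member names come from the commit message. The enum values, the `embedding_table_name` helper, and the `embedding_<collection>` naming scheme are assumptions made for illustration, not the adapter's actual behavior.

```python
from enum import Enum


class StorageStrategy(Enum):
    # Member names as in the commit message; the values are placeholders.
    LANGCHAIN_PGVECTOR = "langchain_pgvector"
    EMBEDDING_TABLE_PER_COLLECTION = "embedding_table_per_collection"


def embedding_table_name(strategy: StorageStrategy, collection_name: str) -> str:
    """Pick the embedding table for a collection (hypothetical naming scheme)."""
    if strategy is StorageStrategy.LANGCHAIN_PGVECTOR:
        # One shared table; rows are told apart by a collection reference.
        return "embedding"
    if strategy is StorageStrategy.EMBEDDING_TABLE_PER_COLLECTION:
        # One table per collection.
        return f"embedding_{collection_name}"
    raise ValueError(f"Unknown storage strategy: {strategy}")


shared = embedding_table_name(StorageStrategy.LANGCHAIN_PGVECTOR, "docs")
dedicated = embedding_table_name(
    StorageStrategy.EMBEDDING_TABLE_PER_COLLECTION, "docs"
)
```

With the first strategy every collection resolves to the same table, while the second yields a distinct table for each collection name.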