CrateDB: Add support for SQLRecordManager [DOES NOT WORK] #18

Draft: wants to merge 17 commits into base: cratedb

@amotl amotl commented Nov 21, 2023

About

`SQLRecordManager` support does not work, because this subsystem uses composite unique keys in combination with an `ON CONFLICT DO UPDATE` operation, as defined by the model entity `UpsertionRecord`.

Because the composite uniqueness constraint is already being emulated, `ON CONFLICT` behaviour cannot easily be emulated on top of it.

Code References

__table_args__ = (
    UniqueConstraint("key", "namespace", name="uix_key_namespace"),
    Index("ix_key_namespace", "key", "namespace"),
)

stmt = insert_stmt.on_conflict_do_update(
    [UpsertionRecord.key, UpsertionRecord.namespace],
    ...
)
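
For illustration, here is a minimal sketch of the semantics that the `on_conflict_do_update` statement above relies on. It uses the standard-library `sqlite3` module as a stand-in database, because SQLite accepts a composite unique constraint as conflict target. The table layout mirrors the column names of the failing statement; the helper name `upsert_records` is hypothetical and not part of this patch.

```python
import sqlite3
import uuid


def upsert_records(conn, keys, namespace, group_id, updated_at):
    """Insert one row per key, or update the existing row for (key, namespace)."""
    rows = [(str(uuid.uuid4()), key, namespace, group_id, updated_at) for key in keys]
    conn.executemany(
        "INSERT INTO upsertion_record (uuid, key, namespace, group_id, updated_at) "
        "VALUES (?, ?, ?, ?, ?) "
        # The conflict target is the composite unique key, not the primary key.
        "ON CONFLICT (key, namespace) DO UPDATE SET "
        "group_id = excluded.group_id, updated_at = excluded.updated_at",
        rows,
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE upsertion_record ("
    "uuid TEXT PRIMARY KEY, key TEXT, namespace TEXT, "
    "group_id TEXT, updated_at REAL, "
    "UNIQUE (key, namespace))"
)

upsert_records(conn, ["key1", "key2"], "kittens", None, 1.0)
upsert_records(conn, ["key2", "key3"], "kittens", None, 2.0)

rows = conn.execute(
    "SELECT key, updated_at FROM upsertion_record ORDER BY key"
).fetchall()
print(rows)  # [('key1', 1.0), ('key2', 2.0), ('key3', 2.0)]
```

The second call updates `key2` in place instead of creating a duplicate. This is the behaviour that would need to be available on CrateDB for this subsystem to work.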

Error Message

CrateDB fails with SQLParseException[Number of conflict targets (["key", "namespace"]) did not match the number of primary key columns ([uuid])], which is expected.

sqlalchemy.exc.ProgrammingError: (crate.client.exceptions.ProgrammingError) SQLParseException[Number of conflict targets (["key", "namespace"]) did not match the number of primary key columns ([uuid])]
[SQL: INSERT INTO upsertion_record (uuid, key, namespace, group_id, updated_at) VALUES (?, ?, ?, ?, ?), (?, ?, ?, ?, ?), (?, ?, ?, ?, ?) ON CONFLICT (key, namespace) DO UPDATE SET group_id = excluded.group_id, updated_at = excluded.updated_at]
[parameters: ('c3c64eca-8c66-44ce-b4f4-01519249eeaf', 'key1', 'kittens', None, 1700599350.084, '9aa23596-fd40-4758-91eb-dac2431e09d1', 'key2', 'kittens', None, 1700599350.084, '3a06bab9-494f-4b3c-8bf5-6b66a155d5d6', 'key3', 'kittens', None, 1700599350.084)]
(Background on this error at: https://sqlalche.me/e/20/f405)
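
Judging by the error message, the conflict targets are checked against the primary key columns, so `(key, namespace)` is rejected. To show why emulating the upsert is not easy, here is a naive read-then-write sketch. It is an assumption for illustration only, not what this patch does; it runs against `sqlite3` as a stand-in, and the helper name `emulated_upsert` is hypothetical.

```python
import sqlite3
import uuid


def emulated_upsert(conn, key, namespace, group_id, updated_at):
    """Naive upsert keyed on (key, namespace), without any database constraint.

    This is NOT atomic: two concurrent writers can both find no row
    and both insert, which is exactly what a unique constraint prevents.
    """
    row = conn.execute(
        "SELECT uuid FROM upsertion_record WHERE key = ? AND namespace = ?",
        (key, namespace),
    ).fetchone()
    if row is None:
        conn.execute(
            "INSERT INTO upsertion_record (uuid, key, namespace, group_id, updated_at) "
            "VALUES (?, ?, ?, ?, ?)",
            (str(uuid.uuid4()), key, namespace, group_id, updated_at),
        )
        return "inserted"
    conn.execute(
        "UPDATE upsertion_record SET group_id = ?, updated_at = ? WHERE uuid = ?",
        (group_id, updated_at, row[0]),
    )
    return "updated"
```

The sketch needs one extra round trip per record and gives no guarantee under concurrent writes, which is why it is not a drop-in replacement for `ON CONFLICT DO UPDATE`.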

The implementation is based on the `pgvector` adapter, as both PostgreSQL and
CrateDB share similar attributes, and can be wrapped well by using the same
SQLAlchemy layer on top.

The implementation is based on the generic SQLAlchemy document loader.

The implementation is based on the generic `SQLChatMessageHistory`.

When not adding any embeddings upfront, the runtime model factory was
not able to derive the vector dimension size, because the SQLAlchemy
models have not been initialized correctly.
From now on, _all_ instances of SQLAlchemy model types will be created
at runtime through the `ModelFactory` utility.

By using `__table_args__ = {"keep_existing": True}` on the ORM entity
definitions, this seems to work well, even with multiple invocations
of `CrateDBVectorSearch.from_texts()` using different `collection_name`
argument values.

While at it, this patch also fixes a few linter errors.

When deleting a collection, also delete its associated embeddings.

It is a special adapter which provides similarity search across multiple
collections. It cannot be used for indexing documents.

The CrateDB adapter works a bit differently from the pgvector adapter
it builds upon: the dimensionality of the vector field needs to be
specified at table creation time, but it is also a runtime parameter in
LangChain, so the table creation needs to be delayed.
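
The delayed table creation described above can be sketched roughly like this. It is a simplified, standard-library-only illustration and not the actual `ModelFactory` code: the class name `LazyEmbeddingTable`, the column layout, and the DDL string are assumptions, and `FLOAT_VECTOR` is assumed to be the vector column type.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LazyEmbeddingTable:
    """Defer the table definition until the vector dimension is known."""

    name: str
    dimensions: Optional[int] = None
    ddl: Optional[str] = None

    def ensure_created(self, embeddings: List[List[float]]) -> str:
        # Once the table has been defined, keep the existing definition.
        if self.ddl is not None:
            return self.ddl
        if not embeddings:
            raise ValueError("Cannot derive the vector dimension without embeddings")
        # The dimension is only known at runtime, from the first embedding.
        self.dimensions = len(embeddings[0])
        self.ddl = (
            f"CREATE TABLE IF NOT EXISTS {self.name} "
            f"(id TEXT PRIMARY KEY, embedding FLOAT_VECTOR({self.dimensions}))"
        )
        return self.ddl


table = LazyEmbeddingTable("embedding")
print(table.ensure_created([[0.1, 0.2, 0.3]]))
```

The point of the sketch is the ordering: nothing is defined at import time, and the definition is fixed on first use, once an embedding is available.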

In some cases, the tables do not exist yet, but this is only relevant
for the case when the user requests to pre-delete the collection, using
the `pre_delete_collection` argument. So, do the error handling only
there instead, and _not_ on the generic data model utility functions.
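
A rough sketch of that placement of the error handling, again using `sqlite3` as a stand-in. The table layout and the function names `delete_collection` and `prepare_store` are hypothetical and only illustrate the idea.

```python
import sqlite3


def delete_collection(conn, name):
    """Generic helper without error handling: a missing table raises."""
    # Delete the associated embeddings first, then the collection itself.
    conn.execute("DELETE FROM embedding WHERE collection = ?", (name,))
    conn.execute("DELETE FROM collection WHERE name = ?", (name,))


def prepare_store(conn, name, pre_delete_collection=False):
    """Optionally pre-delete the collection, then make sure the tables exist."""
    if pre_delete_collection:
        try:
            delete_collection(conn, name)
        except sqlite3.OperationalError as exc:
            # Tolerate missing tables only here: on first use they do not exist yet.
            if "no such table" not in str(exc):
                raise
    conn.execute("CREATE TABLE IF NOT EXISTS collection (name TEXT PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS embedding (collection TEXT, document TEXT)"
    )
    conn.execute("INSERT OR IGNORE INTO collection (name) VALUES (?)", (name,))
```

Calling `prepare_store(conn, "kittens", pre_delete_collection=True)` on an empty database succeeds, while calling `delete_collection` directly still surfaces the missing table as an error.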
@amotl amotl changed the title from "CrateDB: Add support for SQLRecordManager [DEFUNCT]" to "CrateDB: Add support for SQLRecordManager [DOES NOT WORK]" Nov 22, 2023
@amotl amotl added the wontfix label ("This will not be worked on") Nov 22, 2023
@amotl amotl force-pushed the cratedb branch 5 times, most recently from 511f004 to dd64cd4 Compare December 1, 2023 09:37
@amotl amotl force-pushed the cratedb branch 2 times, most recently from 2379986 to 4e00dcb Compare December 7, 2023 20:56
@amotl amotl force-pushed the cratedb branch 3 times, most recently from 890bac3 to bf406c7 Compare January 18, 2024 23:39