Add Support for OCI Data Science Embedding Models #2

Open
wants to merge 242 commits into
base: main
from

Conversation

mrDzurb
Owner

@mrDzurb mrDzurb commented Dec 5, 2024

Description

Testing

Installation

  • Pull the repo
  • pip install -e llama-index-integrations/embeddings/llama-index-embeddings-oci-data-science
  • You will also need to install oracle-ads and llama-index

Usage

import ads
from llama_index.embeddings.oci_data_science import OCIDataScienceEmbeddings

ads.set_auth(auth="security_token", profile="<replace-with-your-profile>")

embeddings = OCIDataScienceEmbeddings(
    endpoint="https://<MD_OCID>/predict",
)

e1 = embeddings.get_query_embedding("This is a test document")
print(e1)

e2 = embeddings.get_text_embedding("This is a test document")
print(e2)

e3 = embeddings.get_text_embedding_batch([
        "This is a test document",
        "This is another test document"
    ])
print(e3)

Async

import ads
from llama_index.embeddings.oci_data_science import OCIDataScienceEmbeddings

ads.set_auth(auth="security_token", profile="<replace-with-your-profile>")

embeddings = OCIDataScienceEmbeddings(
    endpoint="https://<MD_OCID>/predict",
)

# Top-level await works in a notebook; in a script, wrap these calls in an async function.
e1 = await embeddings.aget_text_embedding("This is a test document")
print(e1)

e2 = await embeddings.aget_text_embedding_batch([
        "This is a test document",
        "This is another test document"
    ])
print(e2)
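
Outside a notebook there is no top-level `await`, so the async calls above need an event loop. A minimal sketch of that pattern, assuming a stand-in class `FakeEmbeddings` (hypothetical, not part of this package) that mirrors only the two async method names used above so the snippet runs without an OCI endpoint:

```python
import asyncio
from typing import List, Tuple


class FakeEmbeddings:
    """Hypothetical stand-in; mirrors only the async method names used above."""

    async def aget_text_embedding(self, text: str) -> List[float]:
        # A real client would call the model deployment endpoint here.
        await asyncio.sleep(0)
        return [float(len(text)), 0.0]

    async def aget_text_embedding_batch(self, texts: List[str]) -> List[List[float]]:
        return [await self.aget_text_embedding(t) for t in texts]


async def main() -> Tuple[List[float], List[List[float]]]:
    embeddings = FakeEmbeddings()
    e1 = await embeddings.aget_text_embedding("This is a test document")
    e2 = await embeddings.aget_text_embedding_batch(
        ["This is a test document", "This is another test document"]
    )
    return e1, e2


if __name__ == "__main__":
    # asyncio.run creates the event loop, runs main(), and closes the loop.
    print(asyncio.run(main()))
```

With the real class, only the body of `main()` would change: construct `OCIDataScienceEmbeddings` there instead of the stand-in.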

Notebook Examples

Archive.zip

@darenr darenr self-requested a review December 8, 2024 23:41
e1 = embeddings.get_query_embedding("This is a test document")
print(e1)

e2 = embeddings.get_text_embedding("This is a test document")
Collaborator

don't really need the second identical example

Owner Author

Agree, removed one.

endpoint="https://<MD_OCID>/predict",
)

e1 = embeddings.aget_text_embedding("This is a test document")
Collaborator

doesn't this need an await?

Owner Author

Good catch, fixed.
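
For context on the exchange above: calling an `async def` function without `await` returns a coroutine object rather than the embedding itself. A minimal sketch, assuming a local stub named `aget_text_embedding` (hypothetical; it only borrows the method name and makes no remote call):

```python
import asyncio
import inspect
from typing import List, Tuple


async def aget_text_embedding(text: str) -> List[float]:
    """Local stub for illustration; the real method calls a remote endpoint."""
    return [0.1, 0.2, 0.3]


async def demo() -> Tuple[bool, List[float]]:
    pending = aget_text_embedding("This is a test document")  # no await yet
    is_coroutine = inspect.iscoroutine(pending)  # a coroutine object, not a list
    result = await pending  # awaiting produces the actual value
    return is_coroutine, result


is_coroutine, result = asyncio.run(demo())
print(is_coroutine, result)
```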

@mrDzurb mrDzurb requested a review from darenr December 9, 2024 04:57
Collaborator

@VipulMascarenhas VipulMascarenhas left a comment

changes look good overall, minor comment on the notebook.

Collaborator

nit:

  • The Colab link points to the Bedrock notebook; replace it with oci_data_science.ipynb.
  • replace "%pip install llama-index-embeddings-oci-data-science" with "!%pip install llama-index-embeddings-oci-data-science".

Owner Author

Good catch, thanks!
As for `%pip install llama-index-embeddings-oci-data-science`, I think the first form is the recommended way to install the package into the running kernel without having to restart it afterwards.

mrDzurb and others added 17 commits December 10, 2024 09:59
* refactor and optimize milvus code

Signed-off-by: ChengZi <[email protected]>

* Update pyproject.toml

---------

Signed-off-by: ChengZi <[email protected]>
Co-authored-by: Massimiliano Pippi <[email protected]>
* Add test_async_basic_flow()

* Make test_async_basic_flow() pass

* Make sync tests work again

* Make all tests pass

* Move client connect & close to fixture also in async tests

* Add adelete, adelete_nodes, aclear implementations

* Deduplicate code in query() / aquery()

* Get rid of clarified comment

This gets tested in `test_query_kwargs()`.

* Update WeaviateVectorStore documentation

* Remove from_params()

This method was not working as intended, it created a Weaviate v3 client instead of a v4 client as required by the rest of the code. It should be possible to just use the regular constructor instead of this method.

* Throw a custom exception when calling async methods without providing an async client

* Move AsyncClientNotProvidedError to a separate exceptions module

* Bump llama-index-vector-stores-weaviate version to 2.0.0

* Remove debug output

* DRY checks if _aclient is set, add SyncClientNotProvidedError

* Pass either sync or async client in `weaviate_client` parameter, fix connect()/close() when no weaviate client is provided

* Change new llama-index-vector-stores-weaviate version to 1.3.0 as the change is no longer breaking

* Downgrade to pytest 7 for compatibility with currently used pants configuration in CI

This commit can be reverted as soon as pytest >= 8 is used during the pants run.

* Reorganize test modules to prevent parallel execution during pants run

* Delete llama-index-integrations/vector_stores/llama-index-vector-stores-weaviate/poetry.lock

---------

Co-authored-by: Massimiliano Pippi <[email protected]>
…" out of "vector_store_kwargs" (run-llama#17221)

* parse "milvus_search_config" out of "vector_store_kwargs" passed to MilvusVectorStore.query

* MilvusVectorStore Query parses "milvus_search_config" out of "vector_store_kwargs"

* use kwargs dict.get instead of named index;

* use kwargs.get for milvus_search_config throughout class;

* Update pyproject.toml

* pass **kwargs to _default_search

* linting

* actual linting

---------

Co-authored-by: Massimiliano Pippi <[email protected]>
* rename resource fields

* refactor Document

* fix typing, bring back text_template for backward compat

* fix bug in keyval docstore

* make TextNode forward-compatible

* redo deprecations

* fix model identifier

* update mocks

* update mocks

* fix fixture check
LHFO94 and others added 30 commits January 27, 2025 09:18