
[RFC] Optimizing Text Embedding Processor #1138

Open
will-hwang opened this issue Jan 23, 2025 · 3 comments

will-hwang commented Jan 23, 2025

Optimizing Text Embedding Processor

Problem Statement

Proposal: #793

The Text Embedding Processor is a processor defined as part of an ingest pipeline to create vector embeddings from text. In its current state, the Text Embedding Processor makes a model inference call on every document ingestion or update. While this is necessary for generating embeddings during initial document ingestion, it is unnecessary to regenerate them during a document update if the embedding-related fields remain unchanged. This inefficiency leads to unnecessary cost for customers and computational overhead for model inference. This document discusses a design for optimizing the Text Embedding Processor in Neural Search.

Requirement

  • Enable users to add a setting that configures the Text Embedding Processor to call or skip model inference when appropriate
  • Skip the call to model inference when the flag is enabled and the fields used for embeddings have not changed

Out of Scope

  • Checking for a model ID change between initial ingestion and a subsequent document update is out of scope, due to the complexity involved.
  • The feature is excluded in AOSS, due to the complexity involved with its different indexing methodology

Current State

Text Embedding Processor Configuration

Currently, the Text Embedding Processor expects two fields:

model_id: the model to be used for creating vector embeddings
field_map: specifies the name of the field from which to take the text (text) and the name of the field in which to record the embeddings (passage_embedding)

{
  "description": "An NLP ingest pipeline",
  "processors": [
    {
      "text_embedding": {
        "model_id": "aVeif4oB5Vm0Tdw8zYO2",
        "field_map": {
          "text": "passage_embedding"
        }
      }
    }
  ]
}

Reference: link
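For reference, a pipeline with this configuration would typically be created and exercised as follows; the pipeline and index names here are illustrative:

PUT _ingest/pipeline/nlp-ingest-pipeline
{
  "description": "An NLP ingest pipeline",
  "processors": [
    {
      "text_embedding": {
        "model_id": "aVeif4oB5Vm0Tdw8zYO2",
        "field_map": {
          "text": "passage_embedding"
        }
      }
    }
  ]
}

PUT my-index/_doc/1?pipeline=nlp-ingest-pipeline
{
  "text": "hello world"
}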

Current Flows

In the current flow, there is no difference between ingestion and update of a document: embeddings are created in the Text Embedding Processor every time a document is ingested or updated. See below for the different use case scenarios:

Current Scenario 1: Single Document Update with embedding field

[Image: flow diagram]

Steps:

  1. User ingests/updates Doc1
  2. Text Embedding Processor invokes model inference for text via MLCommonClientAccessor
  3. ML Commons returns embeddings for the text in Doc1
  4. Text Embedding Processor populates the vector embeddings in the ingested/updated Doc1 (see the example below)
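A minimal sketch of this scenario, with illustrative field values; re-indexing the document triggers inference even when the text is unchanged:

PUT my-index/_doc/1?pipeline=nlp-ingest-pipeline
{
  "text": "hello world"
}

// Stored result: passage_embedding is regenerated on every such request
{
  "text": "hello world",
  "passage_embedding": [0.123, 0.456]
}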

Current Scenario 2: Single Document Update without embedding field

[Image: flow diagram]

Steps:

  1. User updates Doc1 without the text field
  2. Text Embedding Processor skips model inference because the text field does not exist
  3. Doc1 is replaced with only the fields defined in the request, removing the existing text and embedding fields. In this update scenario, irrelevant_field is updated from 1 to 2, and the text and embedding fields are removed (see the example below).
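For example, assuming the document from Scenario 1 already exists, a full re-index without the text field replaces the stored document entirely:

PUT my-index/_doc/1?pipeline=nlp-ingest-pipeline
{
  "irrelevant_field": 2
}

// Stored result: the previous text and passage_embedding fields are removed
{
  "irrelevant_field": 2
}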

Current Scenario 3: Single Document Update with both embedding and vector embedded fields

[Image: flow diagram]

Steps:

  1. User updates Doc1 with a vector embedded field
  2. Text Embedding Processor invokes model inference because the text field exists
  3. ML Commons returns embeddings for the text in Doc1
  4. Doc1 is updated with the embeddings generated by ML Commons, not with the embeddings passed in by the user (see the example below).
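Sketched with illustrative values, the user-supplied embedding is discarded in favor of the freshly generated one:

PUT my-index/_doc/1?pipeline=nlp-ingest-pipeline
{
  "text": "hello world",
  "passage_embedding": [0.111, 0.222]
}

// Stored result: passage_embedding comes from ML Commons, not from the request
{
  "text": "hello world",
  "passage_embedding": [0.123, 0.456]
}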

Current Scenario 4: Single Document Update with only vector embedded field

[Image: flow diagram]

Steps:

  1. User updates Doc1 with just the vector embedded field
  2. Text Embedding Processor skips model inference because the text field does not exist
  3. Doc1 is replaced with only the fields defined in the request (the embedding field), removing the existing text field and overwriting the existing embedding field (see the example below).
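For example:

PUT my-index/_doc/1?pipeline=nlp-ingest-pipeline
{
  "passage_embedding": [0.111, 0.222]
}

// Stored result: inference is skipped and the request body is stored as-is
{
  "passage_embedding": [0.111, 0.222]
}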

Proposed State

Proposed Text Embedding Processor Configuration

An optional flag ignore_unaltered will be supported as an input to text_embedding. If the flag is defined and set to true, the Text Embedding Processor will attempt to skip inference for eligible text. If the flag is not defined or set to false, the Text Embedding Processor will behave as it does today, always making a call to the model without checking the document state.

{
    "description": "An NLP ingest pipeline",
    "processors": [
        {
            "text_embedding": {
                "model_id": "aVeif4oB5Vm0Tdw8zYO2",
                "field_map": {
                    "text": "passage_embedding"
                },
                "ignore_unaltered": "true"/"false" // optional field that can be defined by user**
            }
        }
    ]
}

An alternative is to define the flag at the cluster level, which would enable or disable the optimization cluster-wide. Since this flag pertains specifically to the text embedding processor, the proposed configuration keeps it as a parameter of the text embedding processor only. If, in the future, other processors implement similar optimizations, a general optimization flag could be set at the cluster level.

Proposed Flows

Initial Document Ingestion Flow

Document ingestion flow will not change. [Refer to Appendix 1.1 for prior proposal]

Update Document Flow

The updated flow will use the existing OpenSearchClient, defined through client in Neural Search. The client serves to interact with APIs offered by OpenSearch core; for this use case, it is used to fetch already-ingested documents, roughly as sketched below.
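The fetch is roughly equivalent to the following GET request (index and document ID are illustrative); the processor compares the returned _source against the incoming document:

GET my-index/_doc/1

{
  "_index": "my-index",
  "_id": "1",
  "found": true,
  "_source": {
    "text": "hello world",
    "passage_embedding": [0.123, 0.456]
  }
}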

Proposed Scenario 1: Single Document Update with single embedding field

[Image: flow diagram]

Steps:

  1. User updates Doc1, with changes to irrelevant_field.
  2. Text Embedding Processor fetches the existing Doc1 via OpenSearchClient
    1. If Doc1 does not exist, Text Embedding Processor creates embeddings and indexes the Doc.
  3. Text Embedding Processor checks:
    1. an embedding exists in the returned Doc
    2. the inference text has not changed on update
  4. If all checks pass, Text Embedding Processor skips the call to create embeddings.
    1. If any of the checks fail, Text Embedding Processor will invoke ML Commons' inference API to create the embeddings
  5. Doc is updated with only the changes to irrelevant_field, leaving the other fields the same (see the example below).
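A sketch of the skip path, assuming the stored document from the earlier examples and ignore_unaltered set to true:

PUT my-index/_doc/1?pipeline=nlp-ingest-pipeline
{
  "text": "hello world",
  "irrelevant_field": 3
}

// text matches the stored copy and an embedding already exists,
// so inference is skipped and the existing passage_embedding is kept
{
  "text": "hello world",
  "irrelevant_field": 3,
  "passage_embedding": [0.123, 0.456]
}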

Proposed Scenario 2: Single Document Update with multiple embedding fields

[Image: flow diagram]

Steps:

  1. User updates Doc1, with changes to the text_2 field, an additional field with mapped embeddings.
  2. Text Embedding Processor will fetch the existing Doc1 via OpenSearchClient
    1. If Doc1 does not exist, Text Embedding Processor creates embeddings and indexes the Doc.
  3. Text Embedding Processor checks for both fields text_1 and text_2:
    1. an embedding exists in the returned Doc
    2. the inference text has not changed on update
  4. Text Embedding Processor detects that a change has been made only in the text_2 field
  5. An inference call is made for the text_2 field, because its value has changed
  6. The inference call is skipped for the text_1 field, because its value has remained unchanged.
  7. Doc is updated with only the changes to the text_2 and embedding_2 fields, leaving the text_1 and embedding_1 fields the same (see the example below).
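A hypothetical field_map with two inference fields illustrates the partial skip; only text_2 changed, so only embedding_2 is regenerated:

"field_map": {
  "text_1": "embedding_1",
  "text_2": "embedding_2"
}

PUT my-index/_doc/1?pipeline=nlp-ingest-pipeline
{
  "text_1": "unchanged passage",
  "text_2": "updated passage"
}

// Inference runs only for text_2; embedding_1 is carried over unchanged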

Proposed Scenario 3: Single Document Update without embedding field

No change to existing behavior. Model inference will be skipped regardless of the feature, because the text field is missing. See Current Flows, Current Scenario 2 for the expected flow.

Proposed Scenario 4: Single Document Update with vector embedded field

[Image: flow diagram]

Steps:

  1. User updates Doc1, including the vector embedded field
  2. Text Embedding Processor fetches the existing Doc1 via OpenSearchClient
    1. If Doc1 does not exist, Text Embedding Processor creates embeddings and indexes the Doc.
  3. Text Embedding Processor will check:
    1. an embedding exists in the returned Doc
    2. the inference text has not changed on update
  4. If all checks pass, Text Embedding Processor skips the call to create embeddings. In this scenario, both checks pass because the text field has not changed and the embedding field is present in the existing Doc1
    1. If any of the checks fail, Text Embedding Processor will invoke ML Commons' inference API to create the embeddings
  5. Doc1 is not updated because the checks have passed. The manually passed-in embedding field is not reflected in the update.

Proposed Scenario 5: Single Document Update with only vector embedded field

No change to existing behavior. Model inference will be skipped regardless of the feature, because the text field is missing.

See Current Flows, Current Scenario 4 for the expected flow.

Proposed Scenario 6: Batch Document Update

[Image: flow diagram]

Steps:

  1. User updates a batch of documents, with changes made to only some of the documents.
  2. Text Embedding Processor will fetch each of the existing documents via OpenSearchClient
    1. If a document does not exist, Text Embedding Processor will create embeddings and index the Doc.
  3. Text Embedding Processor will check for each document:
    1. an embedding exists in the returned Doc
    2. the inference text has not changed on update
  4. If a document passes the checks in step 3, the inference call to ML Commons is skipped
    1. If a document does not pass the checks in step 3, the inference call to ML Commons is made
  5. Only the documents with changes to the text field are updated with new inference embeddings (see the example below)
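Assuming the pipeline from the earlier examples, a bulk request would look like the following; each document is checked independently:

POST _bulk?pipeline=nlp-ingest-pipeline
{ "index": { "_index": "my-index", "_id": "1" } }
{ "text": "unchanged passage" }
{ "index": { "_index": "my-index", "_id": "2" } }
{ "text": "updated passage" }

// Inference is skipped for doc 1 (text unchanged) and invoked for doc 2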

Summary

The outcome of the proposed change can be summarized as follows:

  1. If ignore_unaltered is defined and set to true, Text Embedding Processor will make a call to fetch the ingested document on every ingest/update. Based on the state of the document, it will decide whether the call to model inference can be skipped. This change optimizes the use of model inference by skipping the call when applicable, at the cost of fetching the document state on every ingest and update. In addition, this change should be extensible to other applicable processors such as TextImageEmbeddingProcessor
  2. If ignore_unaltered is not defined or set to false, Text Embedding Processor will make a call to model inference every time.

Questions Considered

  1. How common is it for users to update an ingest pipeline without re-indexing? If this is uncommon, a model ID check in the document may not be necessary.
    1. As per the discussion on 1/14, the model ID check in the document will not be supported for P0, meaning re-indexing will also not be supported. The new Text Embedding Processor will not have custom behavior in the case of a re-index.
  2. Currently, neural search supports directly updating vector values in a document. How do we want to handle this case? With the current design, inference will be skipped on update, since document state is determined only by the text and model ID fields.

Appendix

1.1 Alternative approach with modified Ingestion Flow

In order to check whether a model has been updated, the ingested document needs to store the model ID in a field, which associates the model ID with the embeddings. The Text Embedding Processor will fetch the model ID when the user updates the doc to ensure it has not changed before skipping the call to create embeddings (a sketch of the stored document shape follows below).
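Under this alternative, the stored document might look like the following (the shape is illustrative):

{
  "text": "hello world",
  "passage_embedding": [0.123, 0.456],
  "model_id": "aVeif4oB5Vm0Tdw8zYO2"
}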

[Image: flow diagram]

Request For Feedback

We would like to get some feedback on the name of the feature. ignore_unaltered is what we're proposing, but we would appreciate suggestions.

@vibrantvarun
Member

I think the ideal name for this feature should be ignore_inference. Simple, plain, and to the point.
cc: @martin-gaievski @heemin32

@heemin32
Collaborator

heemin32 commented Feb 3, 2025

I would vote for ignore_existing per #793 (comment)

ignore_existing makes sense by not running the processor if the expected embedding already exists.

@will-hwang
Contributor Author

I would vote for ignore_existing too. ignore_inference is simple, but it doesn't match the behavior of the flag, since inference could still be made.
