
[RFC] Optimizing Text Embedding Processor #1138

Open
will-hwang opened this issue Jan 23, 2025 · 3 comments

will-hwang commented Jan 23, 2025

Optimizing Text Embedding Processor

Problem Statement

Proposal: #793

The Text Embedding Processor is a processor defined as part of an ingest pipeline to create vector embeddings from text. In its current state, the Text Embedding Processor makes a model inference call on every document ingestion or update. While this is necessary for generating embeddings during initial document ingestion, it is unnecessary to regenerate them during a document update if the embedding-related fields remain unchanged. This inefficiency leads to unnecessary cost for customers and computational overhead for model inference. This document discusses a design for optimizing the Text Embedding Processor in Neural Search.

Requirement

  • Enable users to add a setting that configures the Text Embedding Processor to call or skip model inference when appropriate
  • Skip the call to model inference when the flag is enabled and the fields used for embeddings have not changed

Out of Scope

  • Checking for a model ID change between initial ingestion and a subsequent document update is out of scope, due to the complexity involved.
  • The feature is excluded in AOSS, due to the complexity involved with its different indexing methodology

Current State

Text Embedding Processor Configuration

Currently, the Text Embedding Processor expects two fields:

model_id: the model to be used for creating vector embeddings
field_map: specifies the name of the field from which to take the text (text) and the name of the field in which to record the embeddings (passage_embedding)

{
  "description": "An NLP ingest pipeline",
  "processors": [
    {
      "text_embedding": {
        "model_id": "aVeif4oB5Vm0Tdw8zYO2",
        "field_map": {
          "text": "passage_embedding"
        }
      }
    }
  ]
}

Reference: link
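For reference, a pipeline with this configuration would typically be created and exercised as follows; the pipeline and index names here are illustrative:

PUT _ingest/pipeline/nlp-ingest-pipeline
{
  "description": "An NLP ingest pipeline",
  "processors": [
    {
      "text_embedding": {
        "model_id": "aVeif4oB5Vm0Tdw8zYO2",
        "field_map": {
          "text": "passage_embedding"
        }
      }
    }
  ]
}

PUT my-index/_doc/1?pipeline=nlp-ingest-pipeline
{
  "text": "hello world"
}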

Current Flows

In the current flow, there is no difference between ingestion and update of a document: embeddings are created in the Text Embedding Processor every time a document is ingested or updated. See below for the different use case scenarios:

Current Scenario 1: Single Document Update with embedding field

[Image: flow diagram]

Steps:

  1. User ingests/updates Doc1
  2. Text Embedding Processor invokes model inference for text via MLCommonClientAccessor
  3. ML Commons returns embeddings for the text in Doc1
  4. Text Embedding Processor populates the vector embeddings in the ingested/updated Doc1 (see the example below)
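A minimal sketch of this scenario, with illustrative field values; re-indexing the document triggers inference even when the text is unchanged:

PUT my-index/_doc/1?pipeline=nlp-ingest-pipeline
{
  "text": "hello world"
}

// Stored result: passage_embedding is regenerated on every such request
{
  "text": "hello world",
  "passage_embedding": [0.123, 0.456]
}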

Current Scenario 2: Single Document Update without embedding field

[Image: flow diagram]

Steps:

  1. User updates Doc1 without the text field
  2. Text Embedding Processor skips model inference because the text field does not exist
  3. Doc1 is replaced with only the fields defined in the request, removing the existing text and embedding fields. In this update scenario, irrelevant_field is updated from 1 to 2, and the text and embedding fields are removed (see the example below).
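For example, assuming the document from Scenario 1 already exists, a full re-index without the text field replaces the stored document entirely:

PUT my-index/_doc/1?pipeline=nlp-ingest-pipeline
{
  "irrelevant_field": 2
}

// Stored result: the previous text and passage_embedding fields are removed
{
  "irrelevant_field": 2
}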

Current Scenario 3: Single Document Update with both embedding and vector embedded fields

[Image: flow diagram]

Steps:

  1. User updates Doc1 with a vector embedded field
  2. Text Embedding Processor invokes model inference because the text field exists
  3. ML Commons returns embeddings for the text in Doc1
  4. Doc1 is updated with the embeddings generated by ML Commons, not with the embeddings passed in by the user (see the example below).
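Sketched with illustrative values, the user-supplied embedding is discarded in favor of the freshly generated one:

PUT my-index/_doc/1?pipeline=nlp-ingest-pipeline
{
  "text": "hello world",
  "passage_embedding": [0.111, 0.222]
}

// Stored result: passage_embedding comes from ML Commons, not from the request
{
  "text": "hello world",
  "passage_embedding": [0.123, 0.456]
}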

Current Scenario 4: Single Document Update with only vector embedded field

[Image: flow diagram]

Steps:

  1. User updates Doc1 with just the vector embedded field
  2. Text Embedding Processor skips model inference because the text field does not exist
  3. Doc1 is replaced with only the fields defined in the request (the embedding field), removing the existing text field and overwriting the existing embedding field (see the example below).
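For example:

PUT my-index/_doc/1?pipeline=nlp-ingest-pipeline
{
  "passage_embedding": [0.111, 0.222]
}

// Stored result: inference is skipped and the request body is stored as-is
{
  "passage_embedding": [0.111, 0.222]
}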

Proposed State

Proposed Text Embedding Processor Configuration

An optional flag ignore_unaltered will be supported as an input to text_embedding. If the flag is defined and set to true, the Text Embedding Processor will attempt to skip inference for eligible text. If the flag is not defined or set to false, the Text Embedding Processor will behave as it does today, always making a call to the model without checking the document state.

{
    "description": "An NLP ingest pipeline",
    "processors": [
        {
            "text_embedding": {
                "model_id": "aVeif4oB5Vm0Tdw8zYO2",
                "field_map": {
                    "text": "passage_embedding"
                },
                "ignore_unaltered": "true"/"false" // optional field that can be defined by user**
            }
        }
    ]
}

An alternative is to define the flag at the cluster level, which would enable or disable the optimization cluster-wide. Since this flag pertains specifically to the text embedding processor, the proposed configuration keeps it as a parameter of the text embedding processor only. If, in the future, other processors implement similar optimizations, a general optimization flag could be set at the cluster level.

Proposed Flows

Initial Document Ingestion Flow

Document ingestion flow will not change. [Refer to Appendix 1.1 for prior proposal]

Update Document Flow

The updated flow will use the existing OpenSearchClient, defined through client in Neural Search. The client serves to interact with APIs offered by OpenSearch core; for this use case, it is used to fetch already-ingested documents, roughly as sketched below.
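The fetch is roughly equivalent to the following GET request (index and document ID are illustrative); the processor compares the returned _source against the incoming document:

GET my-index/_doc/1

{
  "_index": "my-index",
  "_id": "1",
  "found": true,
  "_source": {
    "text": "hello world",
    "passage_embedding": [0.123, 0.456]
  }
}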

Proposed Scenario 1: Single Document Update with single embedding field

[Image: flow diagram]

Steps:

  1. User updates Doc1, with changes to irrelevant_field.
  2. Text Embedding Processor fetches the existing Doc1 via OpenSearchClient
    1. If Doc1 does not exist, Text Embedding Processor creates embeddings and indexes the Doc.
  3. Text Embedding Processor checks:
    1. an embedding exists in the returned Doc
    2. the inference text has not changed on update
  4. If all checks pass, Text Embedding Processor skips the call to create embeddings.
    1. If any of the checks fail, Text Embedding Processor will invoke ML Commons' inference API to create the embeddings
  5. Doc is updated with only the changes to irrelevant_field, leaving the other fields the same (see the example below).
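A sketch of the skip path, assuming the stored document from the earlier examples and ignore_unaltered set to true:

PUT my-index/_doc/1?pipeline=nlp-ingest-pipeline
{
  "text": "hello world",
  "irrelevant_field": 3
}

// text matches the stored copy and an embedding already exists,
// so inference is skipped and the existing passage_embedding is kept
{
  "text": "hello world",
  "irrelevant_field": 3,
  "passage_embedding": [0.123, 0.456]
}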

Proposed Scenario 2: Single Document Update with multiple embedding fields

[Image: flow diagram]

Steps:

  1. User updates Doc1, with changes to the text_2 field, an additional field with mapped embeddings.
  2. Text Embedding Processor will fetch the existing Doc1 via OpenSearchClient
    1. If Doc1 does not exist, Text Embedding Processor creates embeddings and indexes the Doc.
  3. Text Embedding Processor checks for both fields text_1 and text_2:
    1. an embedding exists in the returned Doc
    2. the inference text has not changed on update
  4. Text Embedding Processor detects that a change has been made only in the text_2 field
  5. An inference call is made for the text_2 field, because its value has changed
  6. The inference call is skipped for the text_1 field, because its value has remained unchanged.
  7. Doc is updated with only the changes to the text_2 and embedding_2 fields, leaving the text_1 and embedding_1 fields the same (see the example below).
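A hypothetical field_map with two inference fields illustrates the partial skip; only text_2 changed, so only embedding_2 is regenerated:

"field_map": {
  "text_1": "embedding_1",
  "text_2": "embedding_2"
}

PUT my-index/_doc/1?pipeline=nlp-ingest-pipeline
{
  "text_1": "unchanged passage",
  "text_2": "updated passage"
}

// Inference runs only for text_2; embedding_1 is carried over unchanged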

Proposed Scenario 3: Single Document Update without embedding field

No change to existing behavior. Model inference will be skipped regardless of the feature, because the text field is missing. See Current Flows, Current Scenario 2 for the expected flow.

Proposed Scenario 4: Single Document Update with vector embedded field

[Image: flow diagram]

Steps:

  1. User updates Doc1, including the vector embedded field
  2. Text Embedding Processor fetches the existing Doc1 via OpenSearchClient
    1. If Doc1 does not exist, Text Embedding Processor creates embeddings and indexes the Doc.
  3. Text Embedding Processor will check:
    1. an embedding exists in the returned Doc
    2. the inference text has not changed on update
  4. If all checks pass, Text Embedding Processor skips the call to create embeddings. In this scenario, both checks pass because the text field has not changed and the embedding field is present in the existing Doc1
    1. If any of the checks fail, Text Embedding Processor will invoke ML Commons' inference API to create the embeddings
  5. Doc1 is not updated because the checks have passed. The manually passed-in embedding field is not reflected in the update.

Proposed Scenario 5: Single Document Update with only vector embedded field

No change to existing behavior. Model inference will be skipped regardless of the feature, because the text field is missing.

See Current Flows, Current Scenario 4 for the expected flow.

Proposed Scenario 6: Batch Document Update

[Image: flow diagram]

Steps:

  1. User updates a batch of documents, with changes made to only some of the documents.
  2. Text Embedding Processor will fetch each of the existing documents via OpenSearchClient
    1. If a document does not exist, Text Embedding Processor will create embeddings and index the Doc.
  3. Text Embedding Processor will check for each document:
    1. an embedding exists in the returned Doc
    2. the inference text has not changed on update
  4. If a document passes the checks in step 3, the inference call to ML Commons is skipped
    1. If a document does not pass the checks in step 3, the inference call to ML Commons is made
  5. Only the documents with changes to the text field are updated with new inference embeddings (see the example below)
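Assuming the pipeline from the earlier examples, a bulk request would look like the following; each document is checked independently:

POST _bulk?pipeline=nlp-ingest-pipeline
{ "index": { "_index": "my-index", "_id": "1" } }
{ "text": "unchanged passage" }
{ "index": { "_index": "my-index", "_id": "2" } }
{ "text": "updated passage" }

// Inference is skipped for doc 1 (text unchanged) and invoked for doc 2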

Summary

The outcome of the proposed change can be summarized as follows:

  1. If ignore_unaltered is defined and set to true, Text Embedding Processor will make a call to fetch the ingested document on every ingest/update. Based on the state of the document, it will decide whether the call to model inference can be skipped. This change optimizes the use of model inference by skipping the call when applicable, at the cost of fetching the document state on every ingest and update. In addition, this change should be extensible to other applicable processors such as TextImageEmbeddingProcessor
  2. If ignore_unaltered is not defined or set to false, Text Embedding Processor will make a call to model inference every time.

Questions Considered

  1. How common is it for users to update an ingest pipeline without re-indexing? If this is uncommon, a model ID check in the document may not be necessary.
    1. As per the discussion on 1/14, the model ID check in the document will not be supported for P0, meaning re-indexing will also not be supported. The new Text Embedding Processor will not have custom behavior in the case of a re-index.
  2. Currently, neural search supports directly updating vector values in a document. How do we want to handle this case? With the current design, inference will be skipped on update, since document state is determined only by the text and model ID fields.

Appendix

1.1 Alternative approach with modified Ingestion Flow

In order to check whether a model has been updated, the ingested document needs to store the model ID in a field, which associates the model ID with the embeddings. The Text Embedding Processor will fetch the model ID when the user updates the doc to ensure it has not changed before skipping the call to create embeddings (a sketch of the stored document shape follows below).
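Under this alternative, the stored document might look like the following (the shape is illustrative):

{
  "text": "hello world",
  "passage_embedding": [0.123, 0.456],
  "model_id": "aVeif4oB5Vm0Tdw8zYO2"
}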

[Image: flow diagram]

Request For Feedback

We would like to get some feedback on the name of the feature. ignore_unaltered is what we're proposing, but we would appreciate suggestions.

@vibrantvarun
Member

I think the ideal name for this feature should be ignore_inference. Simple, plain, and to the point.
cc: @martin-gaievski @heemin32

@heemin32
Collaborator

heemin32 commented Feb 3, 2025

I would vote for ignore_existing per #793 (comment)

ignore_existing makes sense by not running the processor if the expected embedding already exists.

@will-hwang
Contributor Author

I would vote for ignore_existing too. ignore_inference is simple, but it doesn't match the behavior of the flag, since inference could still be made.
