[BUG] Embeddings pipeline with remote model using AWS Bedrock Connector throws error when indexing or simulate #1201

Closed
t49tran opened this issue Feb 26, 2025 · 2 comments
Labels: bug (Something isn't working), untriaged

t49tran commented Feb 26, 2025

What is the bug?

I am using OpenSearch 2.19 / 2.18.

I created a connector to Amazon Bedrock with the configuration below, following the connector blueprint.

After the connector is created, indexing data or simulating the pipeline throws an error that points to a data-mapping problem between the connector and the processor.

The error:

{
  "docs": [
    {
      "error": {
        "root_cause": [
          {
            "type": "status_exception",
            "reason": """Error validating input schema: Validation failed: [$: required property 'parameters' not found] for instance: {"algorithm":"REMOTE","text_docs":["Orange table"],"return_bytes":false,"return_number":true,"target_response":["sentence_embedding"]} with schema: {
    "type": "object",
    "properties": {
        "parameters": {
            "type": "object",
            "properties": {
                "inputText": {
                    "type": "string"
                },
                "inputImage": {
                    "type": "string"
                }
            }
        }
    },
    "required": [
        "parameters"
    ]
}"""
          }
        ],
        "type": "status_exception",
        "reason": """Error validating input schema: Validation failed: [$: required property 'parameters' not found] for instance: {"algorithm":"REMOTE","text_docs":["Orange table"],"return_bytes":false,"return_number":true,"target_response":["sentence_embedding"]} with schema: {
    "type": "object",
    "properties": {
        "parameters": {
            "type": "object",
            "properties": {
                "inputText": {
                    "type": "string"
                },
                "inputImage": {
                    "type": "string"
                }
            }
        }
    },
    "required": [
        "parameters"
    ]
}"""
      }
    }
  ]
}
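
The validation failure above can be reproduced offline. The following is a minimal sketch, assuming only the schema and the rejected instance quoted in the error message; it checks the top-level `required` keyword and nothing else, so it is not the validator that OpenSearch itself runs. The `ACCEPTED` payload and the `missing_required` helper are hypothetical names introduced for illustration.

```python
import json
from typing import List

# Schema copied from the error message above.
SCHEMA = {
    "type": "object",
    "properties": {
        "parameters": {
            "type": "object",
            "properties": {
                "inputText": {"type": "string"},
                "inputImage": {"type": "string"},
            },
        }
    },
    "required": ["parameters"],
}

# Instance copied from the error message above.
REJECTED = json.loads(
    '{"algorithm":"REMOTE","text_docs":["Orange table"],'
    '"return_bytes":false,"return_number":true,'
    '"target_response":["sentence_embedding"]}'
)

# Hypothetical payload of the shape the schema asks for (not taken from the issue).
ACCEPTED = {"parameters": {"inputText": "Orange table"}}


def missing_required(instance: dict, schema: dict) -> List[str]:
    """Return the top-level required property names that are absent from instance.

    Only the 'required' keyword is checked; this is not a full JSON Schema validator.
    """
    return [key for key in schema.get("required", []) if key not in instance]


if __name__ == "__main__":
    print("rejected instance is missing:", missing_required(REJECTED, SCHEMA))
    print("accepted instance is missing:", missing_required(ACCEPTED, SCHEMA))
```

The rejected instance carries its text under `text_docs` rather than under `parameters`, which is why the schema check fails before the request reaches Bedrock.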

How can one reproduce the bug?

POST /_plugins/_ml/connectors/_create
{
  "name": "Amazon Bedrock Connector: multi-modal embedding",
  "description": "The connector to bedrock Titan multi-modal embedding model",
  "version": 1,
  "protocol": "aws_sigv4",
  "parameters": {
    "region": "",
    "service_name": "bedrock",
    "model": "amazon.titan-embed-image-v1",
    "input_docs_processed_step_size": 2
  },
  "credential": {
    "access_key": "",
    "secret_key": "",
    "session_token": ""
  },
  "actions": [
    {
      "action_type": "predict",
      "method": "POST",
      "url": "https://bedrock-runtime.${parameters.region}.amazonaws.com/model/${parameters.model}/invoke",
      "headers": {
        "content-type": "application/json",
        "x-amz-content-sha256": "required"
      },
      "request_body": "{\"inputText\": \"${parameters.inputText:-null}\", \"inputImage\": \"${parameters.inputImage:-null}\"}",
      "pre_process_function": "connector.pre_process.bedrock.multimodal_embedding",
      "post_process_function": "connector.post_process.bedrock.embedding"
    }
  ]
}

This connector is used to create a model, which is then referenced by an ingest pipeline with a text_image_embedding processor. Pipeline:

PUT /_ingest/pipeline/aws-multimodal-pipeline
{
  "description": "A text/image using aws bedrock embedding pipeline",
  "processors": [
    {
      "text_image_embedding": {
        "model_id": "TzMSQpUBEsYL3wz3lnAk",
        "embedding": "vector_embedding",
        "field_map": {
          "text": "description",
          "image": "image_binary"
        }
      }
    }
  ]
}

An index was then created:

PUT /doc-multimodal
{
  "settings": {
    "index.knn": true,
    "default_pipeline": "aws-multimodal-pipeline",
    "number_of_shards": 2
  },
  "mappings": {
    "properties": {
      "vector_embedding": {
        "type": "knn_vector",
        "dimension": 1024,
        "method": {
          "name": "hnsw",
          "engine": "lucene",
          "parameters": {}
        }
      },
      "description": {
        "type": "text"
      },
      "image_binary": {
        "type": "binary"
      }
    }
  }
}

Pipeline simulation request:

POST /_ingest/pipeline/aws-multimodal-pipeline/_simulate
{
  "docs": [
    {
      "_index": "testindex1",
      "_id": "1",
      "_source":{
         "description": "Orange table"
      }
    }
  ]
}

What is the expected behavior?

The simulation/pipeline should run successfully.

What is your host/environment?

Ubuntu 24.04, OpenSearch Docker image 2.18 and 2.19

t49tran added the bug and untriaged labels on Feb 26, 2025.

weijia-aws (Contributor) commented

Hi @t49tran, I just followed your steps but didn't get this error; both the simulate and ingest APIs worked for me. I'm using the OpenSearch Docker image 2.19 on macOS 15.3.1.

Also, I don't see your model registration step. Maybe something went wrong there, such as an incorrect connector ID?
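
For reference, a registration step of the kind mentioned above might look like the following sketch. It only builds and prints the request; it does not send anything. The helper name, the model name, and the `<connector_id>` placeholder are hypothetical, and the body fields are an assumption based on the ML Commons register-model API for remote models, so check them against the documentation for your version.

```python
import json
from typing import Tuple


def build_register_model_request(
    connector_id: str,
    name: str = "bedrock-multimodal-embedding",  # hypothetical model name
) -> Tuple[str, str, dict]:
    """Build (method, path, body) for registering a remote model against a connector.

    connector_id must be the ID returned by the connector create call; an empty
    or stale ID is the kind of mistake the comment above asks about.
    """
    if not connector_id:
        raise ValueError("connector_id is required")
    body = {
        "name": name,
        "function_name": "remote",
        "description": "Remote model backed by the Bedrock connector",
        "connector_id": connector_id,
    }
    return "POST", "/_plugins/_ml/models/_register", body


if __name__ == "__main__":
    # "<connector_id>" is a placeholder; substitute the real ID from your cluster.
    method, path, body = build_register_model_request("<connector_id>")
    print(method, path)
    print(json.dumps(body, indent=2))
```

The model ID returned by the registration is what the pipeline's `model_id` field should point to, once the model has been deployed.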

t49tran (Author) commented Feb 28, 2025

Hi @weijia-aws, thanks for looking into this. I first encountered this problem in 2.18, and updating to 2.19 didn't resolve it at first.

I ran this on a local Ubuntu machine.

After a clean installation of 2.19, the problem is resolved. Maybe I made some mistakes before.

I think this issue can now be closed.
