[BUG] error on complex types list type field [category] has empty string, cannot process it #2303

Closed
toyaokeke opened this issue Apr 9, 2024 · 4 comments
Labels: bug (Something isn't working), untriaged


@toyaokeke

What is the bug?
I am creating a text embedding processor that creates vectors on a nested field. However, I receive an illegal_argument_exception because not all the fields in the object meet the requirement that each value be one of:

  • string
  • map
  • string list

Here is the explanation from the AWS support specialist:

Our internal team informed me that this exception happened when the “id” under “brand” field has int value that is not supported by the text embedding processor from ingestion pipeline, and the fields inside the complex type must be of types: string, map or list.

However, I am not creating vectors on id, so I don't understand why it must meet these requirements. Is this expected behaviour or a bug?

How can one reproduce the bug?
Steps to reproduce the behavior:

  1. Create the ingest pipeline:
PUT /_ingest/pipeline/neural-search-pipeline-v2
{
  "description": "An example neural search pipeline",
  "processors": [
    {
      "text_embedding": {
        "model_id": "WeliNowB6EaQJ_XFf05V",
        "field_map": {
          "category": {
            "name": {
              "en": "category_name_vector"
            }
          }
        }
      }
    }
  ]
}
  2. Simulate the ingest pipeline:
POST _ingest/pipeline/neural-search-pipeline-v2/_simulate
{
  "docs": [
    {
      "_index": "neural-search-index-v2",
      "_id": "1",
      "_source": {
        "category": {
          "id": 1,
          "name": {
            "en": "category 1"
          }
        }
      }
    }
  ]
}

What is the expected behavior?
It should create vectors for category.name.en and store them in category.name.category_name_vector:

{
    "docs": [
      {
        "doc": {
          "_index": "neural-search-index-v2",
          "_id": "1",
          "_source": {
            "category": {
              "name": {
                "category_name_vector": [
                  0.019107267,
                  -0.029297447,
                  0.0070927013,
                  -0.022105217,
                  ...
                ],
                "en": "category 1"
              },
              "id": 1
            }
          },
          "_ingest": {
            "timestamp": "2024-01-08T17:59:39.543401762Z"
          }
        }
      }
    ]
  }
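To make the expected behaviour concrete, the nested field_map in the pipeline can be read as a set of source-path to target-path pairs. The sketch below is a hypothetical Python helper (resolve_field_map is an illustrative name, not a neural-search API). It assumes, based only on the expected output above, that a string leaf names the vector field and that the vector is written into the same parent object as the source text.

```python
from typing import Any, Dict, List, Tuple


def resolve_field_map(
    field_map: Dict[str, Any], prefix: Tuple[str, ...] = ()
) -> List[Tuple[str, str]]:
    """Flatten a nested field_map into (source_path, target_path) pairs.

    Assumption (inferred from the expected output, not from plugin code):
    a string leaf names the vector field, which is written into the same
    parent object as the source text field.
    """
    pairs: List[Tuple[str, str]] = []
    for key, value in field_map.items():
        path = prefix + (key,)
        if isinstance(value, dict):
            # Descend one level into the nested mapping.
            pairs.extend(resolve_field_map(value, path))
        else:
            # Leaf: key is the text field, value is the sibling vector field.
            pairs.append((".".join(path), ".".join(prefix + (value,))))
    return pairs


field_map = {"category": {"name": {"en": "category_name_vector"}}}
print(resolve_field_map(field_map))
# [('category.name.en', 'category.name.category_name_vector')]
```

Under this reading, category.id never appears as a source path, which is why the reporter expects its integer value to be ignored.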

What is your host/environment?

  • OS: AWS OpenSearch Service managed cluster
  • Version: 2.11

Do you have any screenshots?

{
   "failures": {
        "index": "neural-search-index-v2",
        "id": "5302821",
        "cause": {
          "type": "illegal_argument_exception",
          "reason": "list type field [category] has empty string, cannot process it"
        },
        "status": 400
   },
   ...
}

Do you have any additional context?

Invalid document:

{
   "brand": {
      "id": 123, // cannot be integer
      "description": {
         "en": "en description female",
         "fr": "" // cannot be empty string
      }
      ...
   },
   "category": {
      "id": "123", // valid string
      "sizes": [
         "XS",
         "XL",
         "", // elements in list cannot be empty strings
         123 // elements in list cannot be integers
         ...
      ]
   }
}

Valid document:

{
   "brand": {
      "id": "123",
      "description": {
         "en": "en description"
      }
      ...
   },
   "category": {
      "id": "123",
      "sizes": [ ] // empty list is valid
   }
}
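The rules annotated in the invalid and valid documents above can be expressed as a small pre-ingest check. This is a hypothetical client-side sketch in Python (find_violations is an illustrative name) that mirrors those comments and the support team's explanation; it is not the neural-search plugin's own validation code. It walks the whole object under a top-level field, matching the explanation that an integer id inside the object is rejected even though id itself is not mapped.

```python
from typing import Any, List


def find_violations(value: Any, path: str) -> List[str]:
    """Return the paths whose values break the rules annotated above.

    Hypothetical check: values must be non-empty strings, maps, or lists
    of non-empty strings; an empty list is allowed.
    """
    problems: List[str] = []
    if isinstance(value, dict):
        # Maps are allowed; check every child value recursively.
        for key, child in value.items():
            problems.extend(find_violations(child, f"{path}.{key}"))
    elif isinstance(value, list):
        # An empty list passes; each element must be a non-empty string.
        for i, item in enumerate(value):
            if not isinstance(item, str) or item == "":
                problems.append(f"{path}[{i}]")
    elif isinstance(value, str):
        # Strings are allowed, but not empty ones.
        if value == "":
            problems.append(path)
    else:
        # int, float, bool, None: not a string, map, or list.
        problems.append(path)
    return problems


# The document from the reproduction steps.
repro = {"category": {"id": 1, "name": {"en": "category 1"}}}
print(find_violations(repro["category"], "category"))
# ['category.id']
```

Run against the reproduction document, the check flags category.id, the field the reporter did not intend to embed.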
@toyaokeke added the labels bug (Something isn't working) and untriaged on Apr 9, 2024
@mingshl
Collaborator

mingshl commented Apr 9, 2024

@toyaokeke Hi, looking at your pipeline, it's the text embedding processor that is throwing the exception. This should be related to the neural-search plugin. @navneet1v

@toyaokeke
Author

@mingshl Correct, the text embedding processor is throwing the error. Should I move this discussion to that project instead?

@mingshl
Collaborator

mingshl commented Apr 9, 2024

Sounds good to me, thanks! You can close this one or move it to the neural-search plugin.

@toyaokeke
Author

The issue has now been reported in the neural-search project. Thank you very much 🙏🏿
opensearch-project/neural-search#678
