[Feature Request] Mapping fashion configuration for pipeline processors #13128

zane-neo · 2024-04-09T03:06:53Z

Is your feature request related to a problem? Please describe

Background

OpenSearch Core currently supports field value configuration in multiple processors, e.g. AppendProcessor and SetProcessor. For example:

{
  "append": {
    "field": "your_target_field",
    "value": "{{{tenure}}}"
  }
}

These processors usually take a fixed pair of keys, field and value, which together describe applying a value to either an existing key or a new key in the document.
In the neural search plugin, however, we need another pattern: mapping an existing key to a new key. E.g.

"title": "title_knn"

means that the title field in the document is mapped to a new key, title_knn, whose value is generated by extra logic. We also need to support nested object configurations that map multiple fields in one document, for example:

{
    "text_embedding": {
        "model_id": "WYjkv4MBHcWxVq8Jtc8U",
        "field_map": {
            "title": "title_knn",
            "todo_list": "todo_list_knn",
            "favorites": {
                "game": "game_knn",
                "movie": "movie_knn"
            }
        }
    }
}
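To make the nested form concrete, here is a minimal Python sketch of how such a field_map could be walked into (source path, target path) pairs. The function name iter_field_pairs is hypothetical, and the sketch assumes that a target key is written next to its source key under the same parent object; neither is taken from the neural search code.

```python
from typing import Any, Dict, Iterator, Tuple


def iter_field_pairs(field_map: Dict[str, Any], prefix: str = "") -> Iterator[Tuple[str, str]]:
    """Yield (source_path, target_path) pairs from a nested field_map."""
    for source_key, target in field_map.items():
        source_path = prefix + source_key
        if isinstance(target, dict):
            # A nested object maps fields that live under the same parent key.
            yield from iter_field_pairs(target, prefix=source_path + ".")
        elif isinstance(target, str):
            # Assumption: the target key is a sibling of the source key.
            yield source_path, prefix + target
        else:
            raise ValueError(f"unsupported mapping for '{source_path}': {target!r}")


# The field_map from the text_embedding example above.
field_map = {
    "title": "title_knn",
    "todo_list": "todo_list_knn",
    "favorites": {
        "game": "game_knn",
        "movie": "movie_knn",
    },
}

pairs = list(iter_field_pairs(field_map))
for source, target in pairs:
    print(f"{source} -> {target}")
```

Every processor that accepts a field_map needs some version of this traversal, which is the duplication described in the problem statement.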

Problem statement

More and more processors need this multiple-field mapping configuration, and the scenario usually involves data validation and extraction, which is common logic across processors. In neural search, several processors have similar data validation and extraction logic, e.g. InferenceProcessor, TextImageEmbeddingProcessor and TextChunkingProcessor. The main problems are:

Validation and extraction code is similar across processors, and even across plugins, but it is not reused.
Any enhancement to the validation and extraction logic needs to be duplicated in each processor.

Describe the solution you'd like

We can support mapping configuration in OpenSearch Core so that it can be reused by different processors across different plugins. Moving text_embedding's JSON-style configuration to OpenSearch Core would make the validation and extraction logic reusable. Besides, we should also support dotted-path configuration to make it easier for users, e.g.:

{
  "field_map": {
    "title": "title_knn",
    "todo_list": "todo_list_knn",
    "favorites.game": "favorites.game_knn",
    "favorites.movie": "favorites.movie_knn"
  }
}
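One way the two styles could coexist is to normalize the dotted form into the nested form before validation, so that only one code path has to be maintained. The Python sketch below illustrates that idea. The name expand_dotted is hypothetical, and the rule that source and target must share the same parent path is an assumption drawn from the two examples in this issue, not a confirmed requirement.

```python
from typing import Any, Dict


def expand_dotted(field_map: Dict[str, str]) -> Dict[str, Any]:
    """Convert a dotted field_map into the equivalent nested field_map."""
    nested: Dict[str, Any] = {}
    for source_path, target_path in field_map.items():
        source_parts = source_path.split(".")
        target_parts = target_path.split(".")
        # Assumption: the target must sit under the same parent as the source.
        if source_parts[:-1] != target_parts[:-1]:
            raise ValueError(
                f"'{source_path}' and '{target_path}' do not share a parent path"
            )
        node = nested
        for part in source_parts[:-1]:
            child = node.setdefault(part, {})
            if not isinstance(child, dict):
                raise ValueError(f"'{part}' is mapped both as a field and as an object")
            node = child
        leaf = source_parts[-1]
        if isinstance(node.get(leaf), dict):
            raise ValueError(f"'{leaf}' is mapped both as a field and as an object")
        node[leaf] = target_parts[-1]
    return nested


# The dotted field_map from the example above.
dotted_field_map = {
    "title": "title_knn",
    "todo_list": "todo_list_knn",
    "favorites.game": "favorites.game_knn",
    "favorites.movie": "favorites.movie_knn",
}

nested_field_map = expand_dotted(dotted_field_map)
print(nested_field_map)
```

Under these assumptions the result has the same shape as the nested field_map in the text_embedding example, so existing nested-style handling could be reused unchanged.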

We can create a utility class similar to ConfigurationUtils. Different processors and plugins can then use its default methods or override them to fit their own requirements.
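As a rough illustration of the "default methods plus overrides" idea, here is a Python sketch rather than a proposal for the actual Java API. The class names FieldMapUtils and TextOnlyFieldMapUtils, the method names, and the max_depth limit are all hypothetical.

```python
from typing import Any, Dict


class FieldMapUtils:
    """Shared validation and extraction for nested field_map configurations."""

    max_depth = 10  # hypothetical limit on nesting depth

    def validate(self, field_map: Any, depth: int = 1) -> None:
        if not isinstance(field_map, dict) or not field_map:
            raise ValueError("field_map must be a non-empty object")
        if depth > self.max_depth:
            raise ValueError("field_map is nested too deeply")
        for key, target in field_map.items():
            if not isinstance(key, str) or not key.strip():
                raise ValueError("field_map keys must be non-empty strings")
            if isinstance(target, dict):
                self.validate(target, depth + 1)
            elif not isinstance(target, str) or not target.strip():
                raise ValueError(f"target for '{key}' must be a non-empty string")

    def check_value(self, key: str, value: Any) -> None:
        """Hook for processor-specific checks; the default accepts any value."""

    def extract(self, document: Dict[str, Any], field_map: Dict[str, Any]) -> Dict[str, Any]:
        """Return the mapped values from document, keyed by their target names."""
        self.validate(field_map)
        return self._extract(document, field_map)

    def _extract(self, document: Dict[str, Any], field_map: Dict[str, Any]) -> Dict[str, Any]:
        result: Dict[str, Any] = {}
        for key, target in field_map.items():
            if key not in document:
                continue  # fields missing from the document are skipped
            value = document[key]
            if isinstance(target, dict):
                if not isinstance(value, dict):
                    raise ValueError(f"'{key}' must be an object in the document")
                nested = self._extract(value, target)
                if nested:
                    result[key] = nested
            else:
                self.check_value(key, value)
                result[target] = value
        return result


class TextOnlyFieldMapUtils(FieldMapUtils):
    """Example override: a processor that only accepts string values."""

    def check_value(self, key: str, value: Any) -> None:
        if not isinstance(value, str):
            raise ValueError(f"'{key}' must be a string")


field_map = {"title": "title_knn", "favorites": {"game": "game_knn", "movie": "movie_knn"}}
document = {"title": "Dune", "favorites": {"game": "chess"}, "year": 1965}

extracted = FieldMapUtils().extract(document, field_map)
print(extracted)
```

A processor with stricter requirements overrides only the hook it cares about, while the traversal and structural validation stay in one place.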

Related component

Plugins

Describe alternatives you've considered

No response

Additional context

opensearch-project/neural-search#660


peternied · 2024-04-10T15:18:39Z

[Triage - attendees 1 2 3 4 5 6]
@zane-neo Thanks for creating this issue; however, it isn't being accepted because it doesn't have a clear outcome. Can you please rewrite the issue with more detail so that it is more approachable to OpenSearch developers who are not familiar with the space? Please feel free to open a new issue after addressing this.

zane-neo added enhancement Enhancement or improvement to existing feature or request untriaged labels Apr 9, 2024

github-actions bot added the Plugins label Apr 9, 2024

zane-neo mentioned this issue Apr 9, 2024

[FEATURE] Refactor on data validation and extraction from customer's documents in several processors opensearch-project/neural-search#660

Closed

peternied closed this as completed Apr 10, 2024
