[Triage - attendees 123456] @zane-neo Thanks for creating this issue; however, it isn't being accepted because it does not have a clear outcome. Without more details, can you please rewrite the issue so it is more approachable to OpenSearch developers who are not familiar with this space? Please feel free to open a new issue after addressing the reason.
Is your feature request related to a problem? Please describe
Background
Currently, OpenSearch Core supports field/value configuration in multiple processors, e.g. AppendProcessor, SetProcessor, etc. An example looks like below:
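A representative pipeline definition using the set processor (the description, field name, and value here are illustrative, not taken from this issue) could be:

```json
{
  "description": "Example pipeline using the set processor",
  "processors": [
    {
      "set": {
        "field": "environment",
        "value": "production"
      }
    }
  ]
}
```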
Usually these processors have the fixed keys field and value, which together represent an operation that applies a value to either an existing key or a new key in the document.
But in the neural-search plugin, we need another pattern: mapping an existing key to a new key. E.g.

"title": "title_knn"

means mapping the title field in the document to a new key, title_knn, which is generated by extra logic. We also need to support complex nested object configurations that map multiple fields in one document; an example looks like below:
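A representative nested configuration for the text_embedding processor (the model_id placeholder and field names here are illustrative) might look like:

```json
{
  "text_embedding": {
    "model_id": "<model_id>",
    "field_map": {
      "title": "title_knn",
      "passage": {
        "text": "passage_text_knn"
      }
    }
  }
}
```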
Problem statement
More and more processors need this multiple-field mapping configuration, and the scenario usually involves data validation and extraction, which is common logic across different processors. In neural-search, several processors have similar data validation and extraction logic, e.g. InferenceProcessor, TextImageEmbeddingProcessor, and TextChunkingProcessor. The main problems are:

Validation and extraction code across different processors, and even different plugins, is similar but not reused.
Any enhancement to the validation and extraction logic needs to be duplicated in different processors.
Describe the solution you'd like
We can support mapping configuration in OpenSearch Core so that it can be reused by different processors across different plugins. By moving the text_embedding processor's JSON-style configuration to OpenSearch Core, we can make the validation and extraction logic reusable. Besides, we should also support dotted-style configuration (e.g. a flat key like passage.text instead of a nested object) to make it easier for users.

We can create a util class similar to ConfigurationUtils; with it, different processors and plugins can use the default methods or override them with their own requirements.
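As a rough sketch of the idea (the class name FieldMapUtils and method normalizeFieldMap are hypothetical, not an existing OpenSearch API), such a util could expand dotted keys into the equivalent nested field map, so both configuration styles go through the same validation and extraction path:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a reusable field-map util, similar in spirit to
// ConfigurationUtils. Not an existing OpenSearch class.
public final class FieldMapUtils {

    private FieldMapUtils() {}

    // Expands dotted keys so that "passage.text": "text_knn" and
    // {"passage": {"text": "text_knn"}} normalize to the same nested map.
    @SuppressWarnings("unchecked")
    public static Map<String, Object> normalizeFieldMap(Map<String, Object> fieldMap) {
        Map<String, Object> normalized = new HashMap<>();
        for (Map.Entry<String, Object> entry : fieldMap.entrySet()) {
            String[] path = entry.getKey().split("\\.");
            Map<String, Object> current = normalized;
            // Walk (and create) intermediate maps for every dotted segment.
            for (int i = 0; i < path.length - 1; i++) {
                current = (Map<String, Object>) current.computeIfAbsent(
                    path[i], k -> new HashMap<String, Object>());
            }
            // Recurse into nested-object values; copy leaf values as-is.
            Object value = entry.getValue() instanceof Map
                ? normalizeFieldMap((Map<String, Object>) entry.getValue())
                : entry.getValue();
            current.put(path[path.length - 1], value);
        }
        return normalized;
    }

    public static void main(String[] args) {
        // Dotted and flat keys mixed in one field map.
        Map<String, Object> dotted = Map.of(
            "passage.text", "text_knn",
            "title", "title_knn");
        System.out.println(normalizeFieldMap(dotted));
    }
}
```

A shared helper along these lines would let every processor accept both the nested and the dotted style while keeping the validation and extraction logic in one place.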
Related component
Plugins
Describe alternatives you've considered
No response
Additional context
opensearch-project/neural-search#660