Add non dense vector response support #260

Closed
zane-neo opened this issue Aug 23, 2023 · 2 comments

@zane-neo
Collaborator

Is your feature request related to a problem?

In the near future, neural search needs to support the SPLADE model, and there will be a new processor type for this model. Before integrating this model, we need a way to parse the model response correctly. The SPLADE model doesn't return dense vectors; instead, its response is a map like the one below:

{
    ".": 0.43459656834602356,
    "2": 0.2500864267349243,
    "7": 0.7361266613006592,
    "_": 1.1532046794891357,
    "a": 0.20618286728858948,
    "j": 1.7557533979415894,
    "y": 0.15952491760253906,
    "in": 0.010410528630018234
}
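For illustration, a map like the one above can be held as a sparse token-to-weight vector. Below is a minimal Java sketch, assuming a JSON parser hands back the weights as generic Number values (often Double). The class and method names (SparseVectorParsing, toSparseVector) are hypothetical and not part of neural-search or ml-commons:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SparseVectorParsing {

    // Hypothetical helper: converts a raw token -> weight map, whose values a
    // JSON parser may surface as Double, Integer, etc., into Map<String, Float>.
    // Entries whose value is not a Number are skipped rather than failing.
    public static Map<String, Float> toSparseVector(Map<String, ?> raw) {
        Map<String, Float> sparse = new LinkedHashMap<>();
        for (Map.Entry<String, ?> entry : raw.entrySet()) {
            Object value = entry.getValue();
            if (value instanceof Number) {
                sparse.put(entry.getKey(), ((Number) value).floatValue());
            }
        }
        return sparse;
    }
}
```

Only tokens with a weight are stored, which is what makes this representation sparse compared to a fixed-length dense vector.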

The current code only parses dense vector responses:

private List<List<Float>> buildVectorFromResponse(MLOutput mlOutput)

In ml-commons, all non-dense-vector responses are encapsulated in a map called dataAsMap in ModelTensorOutput. To support these responses, we need to add a separate method that extracts the result from this field; when implementing SPLADE, this new method can be used to get the model response and apply further logic to it.
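A rough sketch of what such an extraction method could look like. The three nested classes are minimal stand-ins named after the ml-commons types mentioned here; their fields and getters (getMlModelOutputs, getMlModelTensors, getDataAsMap) are assumptions for illustration, not the library's actual definitions, and buildMapResultFromResponse is a hypothetical name:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class DataAsMapExtraction {

    // Minimal stand-ins for ml-commons output types. The nesting assumed here
    // (output -> list of tensor groups -> list of tensors, each carrying a
    // dataAsMap) is illustrative only.
    public static class ModelTensor {
        private final Map<String, ?> dataAsMap;
        public ModelTensor(Map<String, ?> dataAsMap) { this.dataAsMap = dataAsMap; }
        public Map<String, ?> getDataAsMap() { return dataAsMap; }
    }

    public static class ModelTensors {
        private final List<ModelTensor> mlModelTensors;
        public ModelTensors(List<ModelTensor> mlModelTensors) { this.mlModelTensors = mlModelTensors; }
        public List<ModelTensor> getMlModelTensors() { return mlModelTensors; }
    }

    public static class ModelTensorOutput {
        private final List<ModelTensors> mlModelOutputs;
        public ModelTensorOutput(List<ModelTensors> mlModelOutputs) { this.mlModelOutputs = mlModelOutputs; }
        public List<ModelTensors> getMlModelOutputs() { return mlModelOutputs; }
    }

    // Hypothetical counterpart to buildVectorFromResponse: collects every
    // non-null dataAsMap instead of reading dense float arrays.
    public static List<Map<String, ?>> buildMapResultFromResponse(ModelTensorOutput output) {
        List<Map<String, ?>> results = new ArrayList<>();
        for (ModelTensors tensors : output.getMlModelOutputs()) {
            for (ModelTensor tensor : tensors.getMlModelTensors()) {
                Map<String, ?> data = tensor.getDataAsMap();
                if (data != null) {
                    results.add(data);
                }
            }
        }
        return results;
    }
}
```

Unlike buildVectorFromResponse, this sketch does not interpret the map contents; SPLADE-specific logic (for example, converting weights to floats) would be applied by the caller.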

What solution would you like?

Add a new method in neural search's MLCommonsClientAccessor to fetch the response from ModelTensorOutput's dataAsMap field.
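One possible shape for the new accessor method, sketched under assumptions: the model call and the dataAsMap extraction are injected as plain functions, and ResultListener is a hypothetical stand-in for an asynchronous callback such as ActionListener. The method name inferenceSentencesWithMapResult is illustrative only:

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class MapResultAccessorSketch {

    // Hypothetical callback interface standing in for an async listener such as
    // ActionListener; it is not that interface's real definition.
    public interface ResultListener<T> {
        void onResponse(T result);

        void onFailure(Exception e);
    }

    // Hypothetical model call: input texts in, raw model output out.
    private final Function<List<String>, Object> predict;
    // Hypothetical extractor that pulls the dataAsMap entries out of the raw output.
    private final Function<Object, List<Map<String, ?>>> extractDataAsMap;

    public MapResultAccessorSketch(
            Function<List<String>, Object> predict,
            Function<Object, List<Map<String, ?>>> extractDataAsMap) {
        this.predict = predict;
        this.extractDataAsMap = extractDataAsMap;
    }

    // Illustrative map-returning counterpart to a dense-vector inference method.
    public void inferenceSentencesWithMapResult(
            List<String> inputText, ResultListener<List<Map<String, ?>>> listener) {
        List<Map<String, ?>> result;
        try {
            result = extractDataAsMap.apply(predict.apply(inputText));
        } catch (RuntimeException e) {
            // Model or parsing failure: report it once and stop.
            listener.onFailure(e);
            return;
        }
        // Delivered outside the try so a listener exception is not re-routed to onFailure.
        listener.onResponse(result);
    }
}
```

Computing the result inside the try block and delivering it outside means each call ends in exactly one of onResponse or onFailure.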

What alternatives have you considered?

Pass a flag in MLInput so that ml-commons returns a string-type response in ModelTensorOutput's result field, and have this method accept a Class parameter and return a result of that generic type. Gson or Jackson would deserialize the string into the requested type, so the method works with typed objects instead of a map, which is easier to use. The drawback is that this requires more changes in ml-commons, which could introduce errors there.
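This alternative could be sketched as a generic method that takes a Class token and returns a typed object. Here the JSON library is hidden behind a hypothetical JsonDeserializer interface so the sketch compiles on its own; in a real build it could delegate to Gson's fromJson(String, Class<T>) or Jackson's ObjectMapper.readValue(String, Class<T>). The class and method names are hypothetical:

```java
public class TypedResultSketch {

    // Hypothetical hook for a JSON library. A real implementation might call
    // Gson's fromJson(String, Class<T>) or Jackson's ObjectMapper.readValue(String, Class<T>).
    public interface JsonDeserializer {
        <T> T fromJson(String json, Class<T> type);
    }

    private final JsonDeserializer deserializer;

    public TypedResultSketch(JsonDeserializer deserializer) {
        if (deserializer == null) {
            throw new IllegalArgumentException("deserializer is required");
        }
        this.deserializer = deserializer;
    }

    // Hypothetical generic accessor: resultJson stands in for the string that
    // ml-commons would place in ModelTensorOutput's result field under this
    // alternative; the caller chooses the target type via the Class token.
    public <T> T parseResult(String resultJson, Class<T> type) {
        if (resultJson == null || resultJson.isEmpty()) {
            throw new IllegalArgumentException("model returned an empty result");
        }
        return deserializer.fromJson(resultJson, type);
    }
}
```

Because fromJson is a generic method, JsonDeserializer cannot be implemented with a lambda; an anonymous class or a small adapter class around Gson or Jackson is needed.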

Do you have any additional context?

NA

@navneet1v
Collaborator

@zane-neo as this is a new feature, it will probably go through multiple iterations before it is released. Hence, for all new features, let's not merge changes directly into main. Let's keep reviewing the changes and merge them into a feature branch. Let me create a feature branch for this new feature.

So the process will go like this:

  1. A Neural Search plugin maintainer will cut a feature branch from the main branch. (Feature Branch: https://github.com/opensearch-project/neural-search/tree/feature/sparseVectorSupport)
  2. Contributors working on Sparse Vectors will raise PRs against the feature branch.
  3. Once a PR is reviewed, it will be merged into the feature branch.
  4. Once all the changes are done and performance testing is complete, the commits from the feature branch will be merged into the main branch.

Please let me know if there are any further questions.

@zane-neo
Collaborator Author

zane-neo commented Oct 9, 2024

This is already implemented and can be closed.

@zane-neo zane-neo closed this as completed Oct 9, 2024