[Feature] hybrid query support on index aliases with filter use cases #627

Closed
ksingh17i opened this issue Mar 8, 2024 · 7 comments
Labels: Features (Introduces a new unit of functionality that satisfies a requirement), v2.13.0

Comments

@ksingh17i

What is the bug?

We are getting the following error when running hybrid queries on an index alias. The same query works fine if the index name is used.

{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "hybrid query must be a top level query and cannot be wrapped into other queries"
      }
    ]
  }
}

This issue occurs only when the index alias is created with a filter. Here is the definition of our index alias:

{
  "index-name-v0": {
    "aliases": {
      "ro-alias-index-name": {
        "filter": {
          "bool": {
            "must_not": {
              "term": {
                "eventType": "DELETED"
              }
            }
          }
        }
      }
    }
  }
} 

How can one reproduce the bug?

  1. Create the index with text and vector fields
  2. Populate the data in the index
  3. Create an alias for the index as defined above
  4. Run hybrid queries using the alias name
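
The reproduction steps above can be sketched as a sequence of REST calls. This is a minimal sketch under stated assumptions: the field names `text1`, `my_vector`, and `eventType` are borrowed from this thread, the 1-dimensional Lucene HNSW vector field and the two sample documents are purely illustrative, and a search pipeline named `nlp-search-pipeline` containing a normalization processor is assumed to exist already. The script only builds and prints the requests; it does not contact a cluster.

```python
import json

INDEX = "index-name-v0"
ALIAS = "ro-alias-index-name"

# Alias filter from the issue: hide documents whose eventType is DELETED.
ALIAS_FILTER = {"bool": {"must_not": {"term": {"eventType": "DELETED"}}}}

# Hybrid query with one lexical and one k-NN sub-query (illustrative values).
HYBRID_QUERY = {
    "query": {
        "hybrid": {
            "queries": [
                {"match": {"text1": "neural"}},
                {"knn": {"my_vector": {"vector": [3], "k": 10}}},
            ]
        }
    }
}


def repro_requests():
    """Return (method, path, body) tuples for the four reproduction steps."""
    return [
        # Step 1: index with a text field, a vector field, and the field the alias filters on.
        # Assumption: Lucene engine, chosen because it supports efficient k-NN filtering.
        ("PUT", f"/{INDEX}", {
            "settings": {"index.knn": True},
            "mappings": {"properties": {
                "text1": {"type": "text"},
                "my_vector": {"type": "knn_vector", "dimension": 1,
                              "method": {"name": "hnsw", "space_type": "l2", "engine": "lucene"}},
                "eventType": {"type": "keyword"},
            }},
        }),
        # Step 2: two sample documents; the alias filter should hide the second one.
        ("PUT", f"/{INDEX}/_doc/1?refresh=true",
         {"text1": "neural search", "my_vector": [3], "eventType": "CREATED"}),
        ("PUT", f"/{INDEX}/_doc/2?refresh=true",
         {"text1": "neural ranking", "my_vector": [4], "eventType": "DELETED"}),
        # Step 3: create the filtered alias.
        ("POST", "/_aliases",
         {"actions": [{"add": {"index": INDEX, "alias": ALIAS, "filter": ALIAS_FILTER}}]}),
        # Step 4: run the hybrid query against the alias (the call that fails on 2.11).
        ("POST", f"/{ALIAS}/_search?search_pipeline=nlp-search-pipeline", HYBRID_QUERY),
    ]


if __name__ == "__main__":
    for method, path, body in repro_requests():
        print(method, path)
        print(json.dumps(body, indent=2))
```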

What is the expected behavior?

Irrespective of whether the index name or the alias name is used in the hybrid query, OpenSearch should return the same results.

What is your host/environment?

AWS Managed OpenSearch 2.11

Do you have any additional context?

The query runs fine if the index alias is created without any filters.

ksingh17i added the bug (Something isn't working) and untriaged labels on Mar 8, 2024
@ksingh17i
Author

@vamshin Please look into this issue as well

@navneet1v
Collaborator

@ksingh17i, can you add the steps to reproduce the error?

@ksingh17i
Author

ksingh17i commented Mar 11, 2024

Hi @navneet1v, I already added them in the ticket details. The only important step is to add an index alias with a filter and then use that alias to run hybrid queries:

  1. Create the index with text and vector fields
  2. Populate the data in the index
  3. Create an alias for the index as defined below
  4. Run hybrid queries using the alias name
{
  "index-name-v0": {
    "aliases": {
      "ro-alias-index-name": {
        "filter": {
          "bool": {
            "must_not": {
              "term": {
                "eventType": "DELETED"
              }
            }
          }
        }
      }
    }
  }
} 

@ksingh17i
Author

This situation is similar to creating a search pipeline that combines a filter query processor with a normalization processor and then running a hybrid query (match and knn) through that pipeline.

@navneet1v
Collaborator

@ksingh17i, this happens because the hybrid query clause must be the top-level query clause, and both the filter query processor and a filtered index alias wrap the whole query in another compound query clause, which breaks the hybrid query clause.
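
The wrapping described above can be modeled at the query-DSL level. This is a conceptual sketch only: OpenSearch applies the alias filter internally to the parsed query rather than rewriting the request JSON, and the helpers `wrap_with_alias_filter` and `hybrid_is_top_level` are hypothetical, simplified stand-ins for that wrapping and for the check that raises "hybrid query must be a top level query". It also shows why an alias without a filter works: nothing gets wrapped.

```python
from typing import Any, Optional

Query = dict[str, Any]


def wrap_with_alias_filter(query: Query, alias_filter: Optional[Query]) -> Query:
    """Model how a filtered alias (or a filter query processor) nests the user query.

    Without a filter the query is returned unchanged; with one, the original
    query becomes a clause inside a new compound bool query.
    """
    if alias_filter is None:
        return query
    return {"bool": {"must": [query], "filter": [alias_filter]}}


def hybrid_is_top_level(query: Query) -> bool:
    """Simplified validation: the outermost clause must be `hybrid`."""
    return next(iter(query)) == "hybrid"


hybrid = {"hybrid": {"queries": [{"match": {"text1": "neural"}}]}}
alias_filter = {"bool": {"must_not": {"term": {"eventType": "DELETED"}}}}

print(hybrid_is_top_level(wrap_with_alias_filter(hybrid, None)))          # True
print(hybrid_is_top_level(wrap_with_alias_filter(hybrid, alias_filter)))  # False
```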

We need to do a deeper dive to identify how to unwrap the query clauses, which will take some time. In the meantime, I suggest not using the filter query processor or an index alias filter, and instead putting the filters in the query itself. Example:

POST example-index/_search?search_pipeline=nlp-search-pipeline
{
  "query": {
    "hybrid": {
      "queries": [
        {
          "bool": {
            "must": [ // use "must": next to "filter", a "should" clause would no longer restrict matches
              {
                "match": {
                  "text1": "neural"
                }
              }
            ],
            "filter": { // the alias filter, applied as a filter on the lexical sub-query
              "bool": {
                "must_not": {
                  "term": {
                    "eventType": "DELETED"
                  }
                }
              }
            }
          }
        },
        {
          "knn": { // this can also be a neural query clause
            "my_vector": {
              "vector": [
                3
              ],
              "k": 10,
              "filter": { // the alias filter, used here as an efficient filter for vector search
                "bool": { 
                  "must_not": {
                    "term": {
                      "eventType": "DELETED"
                    }
                  }
                }
              }
            }
          }
        }
      ]
    }
  }
}
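
The manual workaround above can be generated programmatically so that the alias filter is defined in one place. This is a minimal sketch with a hypothetical helper, `push_filter_into_hybrid`, assuming every sub-query is either a `knn` clause (which receives the filter as an efficient filter; this requires a k-NN engine that supports filtering, such as Lucene or Faiss) or some other clause (which is wrapped in a `bool` query with the filter). Neural clauses are not special-cased. The lexical clause goes under `must` rather than `should`, because a `bool` query that also has a `filter` treats `should` clauses as optional and would otherwise match every non-deleted document.

```python
import copy
import json
from typing import Any

Query = dict[str, Any]


def push_filter_into_hybrid(hybrid_body: Query, alias_filter: Query) -> Query:
    """Return a copy of a hybrid search body with alias_filter applied to each sub-query.

    Hypothetical helper: handles top-level `knn` clauses and wraps every other
    sub-query in a bool query. Neural clauses are not special-cased.
    """
    body = copy.deepcopy(hybrid_body)  # never mutate the caller's request
    sub_queries = body["query"]["hybrid"]["queries"]
    for i, sub in enumerate(sub_queries):
        if "knn" in sub:
            # k-NN clause: attach the alias filter as an efficient filter on each vector field.
            for field_params in sub["knn"].values():
                field_params["filter"] = copy.deepcopy(alias_filter)
        else:
            # Other clauses: keep the original clause as the scoring "must" clause and
            # add the alias filter as a non-scoring "filter" clause.
            sub_queries[i] = {"bool": {"must": [sub], "filter": [copy.deepcopy(alias_filter)]}}
    return body


if __name__ == "__main__":
    alias_filter = {"bool": {"must_not": {"term": {"eventType": "DELETED"}}}}
    request = {
        "query": {
            "hybrid": {
                "queries": [
                    {"match": {"text1": "neural"}},
                    {"knn": {"my_vector": {"vector": [3], "k": 10}}},
                ]
            }
        }
    }
    print(json.dumps(push_filter_into_hybrid(request, alias_filter), indent=2))
```

The resulting body is sent to the index name rather than the filtered alias, so the `hybrid` clause stays at the top level while the alias filter still applies to both sub-queries.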

navneet1v moved this to Backlog (Hot) in Vector Search RoadMap on Apr 1, 2024
vamshin moved this from Backlog (Hot) to 2.15.0 in Vector Search RoadMap on Apr 1, 2024
vamshin removed the untriaged label on Apr 1, 2024
vamshin added the v2.15.0 label on Apr 4, 2024
vamshin changed the title from "[BUG] Illegal argument exception when running hybrid query on index alias with filter" to "[Feature] hybrid query support on index aliases with filter use cases" on Apr 4, 2024
vamshin added the Features (Introduces a new unit of functionality that satisfies a requirement) label on Apr 4, 2024
@vamshin
Member

vamshin commented Apr 4, 2024

Changed this into a feature request; we are prioritizing it for version 2.15.

vamshin removed the bug (Something isn't working) label on Apr 4, 2024
github-project-automation bot moved this to Planned work items in Test roadmap format on Apr 9, 2024
@navneet1v
Collaborator

@martin-gaievski, let's close this issue since the feature is merged. @vamshin, this feature was released in version 2.13.

github-project-automation bot moved this from 2.15.0 to ✅ Done in Vector Search RoadMap on Jun 12, 2024
navneet1v added the v2.13.0 label and removed the v2.15.0 label on Jun 13, 2024
Projects
Status: Planned work items
Status: Done
Development

No branches or pull requests

4 participants