Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor QueryCollectorContext to improve extensibility #11778

Conversation

martin-gaievski
Copy link
Member

@martin-gaievski martin-gaievski commented Jan 5, 2024

Description

Methods create and postProcess in QueryCollectorContext cannot be called from outside of the package. That is limiting extensions for core OS in certain scenarios, as those methods need to be called for functionality like aggregation in search queries.

In this PR I propose to change modifiers from default to public for methods create and prosProcess. Other changes are in implementing classes to adopt those changes in method signatures.

Related Issues

While it's not directly related I came up with this PR in scope of opensearch-project/neural-search#509

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Failing checks are inspected and point to the corresponding known issue(s) (See: Troubleshooting Failing Builds)
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)
  • Public documentation issue/PR created

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Copy link
Contributor

github-actions bot commented Jan 5, 2024

❌ Gradle check result for 04bce0e: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

github-actions bot commented Jan 5, 2024

Compatibility status:

Checks if related components are compatible with change 021f593

Incompatible components

Skipped components

Compatible components

Compatible components: [https://github.com/opensearch-project/security-analytics.git, https://github.com/opensearch-project/asynchronous-search.git, https://github.com/opensearch-project/observability.git, https://github.com/opensearch-project/reporting.git, https://github.com/opensearch-project/job-scheduler.git, https://github.com/opensearch-project/opensearch-oci-object-storage.git, https://github.com/opensearch-project/custom-codecs.git, https://github.com/opensearch-project/performance-analyzer.git, https://github.com/opensearch-project/common-utils.git, https://github.com/opensearch-project/performance-analyzer-rca.git, https://github.com/opensearch-project/notifications.git, https://github.com/opensearch-project/anomaly-detection.git, https://github.com/opensearch-project/ml-commons.git, https://github.com/opensearch-project/index-management.git, https://github.com/opensearch-project/k-nn.git, https://github.com/opensearch-project/neural-search.git, https://github.com/opensearch-project/geospatial.git, https://github.com/opensearch-project/security.git, https://github.com/opensearch-project/cross-cluster-replication.git, https://github.com/opensearch-project/alerting.git, https://github.com/opensearch-project/sql.git]

Copy link
Contributor

github-actions bot commented Jan 5, 2024

✅ Gradle check result for be0d6c8: SUCCESS

@martin-gaievski martin-gaievski force-pushed the make-methods-in-doc-collector-classes-public branch from 27a54b9 to 021f593 Compare January 10, 2024 20:37
@jed326
Copy link
Collaborator

jed326 commented Jan 10, 2024

@martin-gaievski Thanks for the details! I don't want to derail the conversation too much from @reta's original concerns but I do want to re-iterate my concern that all aggregations may not just work out of the box even if we do go ahead with the change in this PR. In that case we want to avoid a situation where today we open up these 2 methods and then later on you discover there are additional methods we also need to open up, etc. I think sharing a high level design for opensearch-project/neural-search#509 first would help us understand if this change is truly needed as well as any alternatives considered.

@reta
Copy link
Collaborator

reta commented Jan 10, 2024

With CollectorManager design the idea is to add the collector to the manager, and it's manager's responsibility to call that collector. Collector should just get scores and implement reduce logic. But that assumption of having 1-1 relationship between score and doc id is still there.

I should go and reread the RFC (opensearch-project/neural-search#126) but just quick question on that: in general, CollectorManager could be using many collectors (MultiCollectorManager) so each one could produce own scores and reduce phase would return scores per collector type. Wouldn't it address the problem of wrapping up multiple scores per doc?

Copy link
Contributor

❌ Gradle check result for 021f593: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

❌ Gradle check result for 9acc035: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

❕ Gradle check result for 27a54b9: UNSTABLE

  • TEST FAILURES:
      1 org.opensearch.remotestore.RemoteIndexPrimaryRelocationIT.testPrimaryRelocationWhileIndexing
      1 org.opensearch.action.admin.indices.create.CreateIndexIT.testCreateAndDeleteIndexConcurrently
      1 org.opensearch.action.admin.indices.create.CreateIndexIT.classMethod

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

@martin-gaievski
Copy link
Member Author

With CollectorManager design the idea is to add the collector to the manager, and it's manager's responsibility to call that collector. Collector should just get scores and implement reduce logic. But that assumption of having 1-1 relationship between score and doc id is still there.

I should go and reread the RFC (opensearch-project/neural-search#126) but just quick question on that: in general, CollectorManager could be using many collectors (MultiCollectorManager) so each one could produce own scores and reduce phase would return scores per collector type. Wouldn't it address the problem of wrapping up multiple scores per doc?

This sounds interesting, I need to explorer this option. But before going deep into details can you help with few questions/concerns:

  • is it guaranteed that reduce method of such multi collector manager will be called after all child managers are done (or failed)? we will need scores for all queries before we can compile the final score list at shards level.
  • I feel like we'll need to create new QueryCollectorContext as managers and collectors are created by calling context methods. Will you be ok in opening methods then, if its required? The QueryCollectorContext class is public abstract, but custom implementation cannot be created because abstract methods are with default access.
  • with such approach how do we support aggregations? Collector for aggregation will be passed to queryphase searcher, but we probably need to combine it with that new MultiCollectorManager for hybrid query and then wrap both into a another collector manager. Does it make sense?

as a remark, with a multicollector approach collectors will be of the same type, just one per inner query. Hybrid query acts like a compound query, where any other query can be an inner query.

@reta
Copy link
Collaborator

reta commented Jan 11, 2024

  • is it guaranteed that reduce method of such multi collector manager will be called after all child managers are done (or failed)? we will need scores for all queries before we can compile the final score list at shards level.

Yes

  • I feel like we'll need to create new QueryCollectorContext as managers and collectors are created by calling context methods. Will you be ok in opening methods then, if its required? The QueryCollectorContext class is public abstract, but custom implementation cannot be created because abstract methods are with default access.

No, you could use SearchContext::queryCollectorManagers to add your own (pointed it here #11778 (comment))

  • with such approach how do we support aggregations? Collector for aggregation will be passed to queryphase searcher, but we probably need to combine it with that new MultiCollectorManager for hybrid query and then wrap both into a another collector manager. Does it make sense?

I think the answer would be - probably not, aggregation are essentially a framework on top of collectors / collector managers that has own API. We may need a new time of the aggregator to design, if we have a gap there - we need to understand what is missing.

@navneet1v
Copy link
Contributor

navneet1v commented Jan 13, 2024

No, you could use SearchContext::queryCollectorManagers to add your own (pointed it here #11778 (comment))

Thanks @reta, providing this alternative. After reading the comments on the PR what I feel is we have changed the design of Aggregation for concurrent segment search and there has been some new interfaces that been added. Its worth looking into the new interfaces and see if it can solve the aggregations for hybrid query clause. As pointed out by @martin-gaievski that we will do some deep-dive and get back on this.

But on a very high level if I have to provide whats the requirement here is:

  1. A plugin which is implementing a new QueryPhaseSearcher wants to add another collector in the collector list. This will ensure that new collectors that has been added by the Plugin Query phase Searcher is called along with other collectors. The rest of the flow should remain same to start with.

Then with more testing we keep seeing if some changes are required in the already implemented aggregation flow to remove the issues in aggregation happening with Hybrid Query clause. Note: The Plugin Query phase searcher is called only if the hybrid query clause is a top level query clause, so the impact will be on the aggs with hybrid query clause and nothing else.

@navneet1v
Copy link
Contributor

@reta

I did some more deep-dive(Looked into ConcurrentQueryPhaseSearcher class, QueryCollectorManagerContext, CollectorManager and DefaultQueryPhaseSearcher) and here is what I can find:

  1. The queryphaseSearchers gets CollectorContexts as a parameter to its searchWith function. These collectorcontexts are then converted to single collector(in defaultQueryPhaseSearcher) or a single collector manager(ConcurrentQueryPhaseSearcher) which is then passed to the IndexSearcher class.
  2. All this is good because the CollectorContexts are getting created on Opensearch Core. How this is different in HybridQueryPhaseSearcher. The HybridQueryPhaseSearcher creates its own Collector(not collector context) inside the HybridQueryPhaseSearcher[ref].
  3. The Collector context cannot be created in HybridQueryPhaseSearcher because the functions(create, createManager, createQueryCollector, createQueryCollectorWithProfiler etc) are not public. This limits the functionality of Plugins who are adding a new QueryPhaseSearcher to add a new Collector context. And part of the change in this PR is to make those functions public.
  4. To work around this HybridQueryPhaseSearcher as of not is [dropping the collector contexts] (https://github.com/opensearch-project/neural-search/blob/ff3862250ccdae41798fe0787d4872a9b07ffe2d/src/main/java/org/opensearch/neuralsearch/search/query/HybridQueryPhaseSearcher.java#L216-L221) which are passed to it and create a new collector that is passed to the index searcher. Due to this, only search is working other things like aggregation, post filtering is not working.
  5. I do understand that rather than creating a single collector we can create a collectorManager which can then be passed to IndexSearcher but even for that, Neural Plugin needs to create a collector context which can be then added with CollectorsContext list passed to searchWith function as an argument and then to IndexSearcher.

Please let me know if there is gap in my understanding.

Once the interfaces are opened up the way I see HybridQueryPhaseSearcher implementation getting changed is:

  1. A new class HybridTopScoreDocCollectorContext should be created. It is then will be added as the first collectorcontext in the collectorsContext list passed to searchWith function.
  2. The indexSearcher can be called either with a collector or collector Manager. I think collector manager should be used. But not a strong preference here.
  3. Once the query is executed, then call the postProcess function on all the collector contexts.

cc: @martin-gaievski , @vamshin

@reta
Copy link
Collaborator

reta commented Jan 15, 2024

thanks @navneet1v

  1. All this is good because the CollectorContexts are getting created on Opensearch Core. How this is different in HybridQueryPhaseSearcher. The HybridQueryPhaseSearcher creates its own Collector(not collector context) inside the HybridQueryPhaseSearcher
  2. I do understand that rather than creating a single collector we can create a collectorManager which can then be passed to IndexSearcher but even for that, Neural Plugin needs to create a collector context which can be then added with CollectorsContext list passed to searchWith function as an argument and then to IndexSearcher.

This part I don't understand, any QueryPhaseSearcher could add new collector managers to the context, which is passed as argument, there is no need to do anything with the XxxCollectorContexts here:

context.queryCollectorManagers().put(MyHybridCollectorManager.class, new MyHybridCollectorManager());

Also, which I think is the most important thought to keep in mind: QueryPhaseSearcher was introduced solely to support seamless concurrent / non-concurrent branching of the search flow. It may not fit to what you folks are looking to do - it is perfectly fine, we could think about right abstraction / APIs instead.

@martin-gaievski
Copy link
Member Author

I'm checking Andriy's suggestion for using CollectorManager without creating custom context object. It looks promising so far, I built a small POC and basic sanity checks are passing for both aggregations and existing functionality of hybrid query. We may need to tweak few small things but overall it looks good, thanks for your input @reta

cc: @navneet1v

@martin-gaievski
Copy link
Member Author

@reta Do you have any suggestion on following finding.

if we call IndexSearcher.search(Query, CollectorManager) then there will be exception throwing from DefaultSearchContext.getTargetMaxSliceCount(), this is because two core aggregations sampler and diversified_sampler do not support concurrent search (this has been added with this PR #11087).

my understanding is that in core system falls back to a non-concurrent search. Is there something that we can control from plugin side to avoid an exception and allow query to work in the same non-concurrent way?

@reta
Copy link
Collaborator

reta commented Feb 15, 2024

my understanding is that in core system falls back to a non-concurrent search. Is there something that we can control from plugin side to avoid an exception and allow query to work in the same non-concurrent way?

@martin-gaievski so the aggregations have this method

@Override
    protected boolean supportsConcurrentSegmentSearch() {
      // return true or false
    }

When search request comes in, it is being inspected, among other things, if there are any aggregations that are not compatible with concurrent search, and if yes, it delegates to nonconcurrent path. To answer your question - the IndexSearcher.search(Query, CollectorManager) path should never be taken if there are components incompatible with concurrent search. We may be are missing some inspection here? (fe SearchContext::queryCollectorManagers())

@martin-gaievski
Copy link
Member Author

@reta my understanding is that you are suggesting something like:

if (concurrent_search enabled AND aggs_present_AND_support_concurrent_search):
     call `IndexSearcher.search(Query, CollectorManager)`
else
     call other `IndexSearcher.search` method

the problem is that for adding aggregation support to hybrid query we can only follow CollectorManager approach as direct calls to QueryContext has been locked, as per your initial explanation. So we'll be taking that approach independently of concurrent search being enabled or not.

Do you mean that SearchContext::queryCollectorManagers() should return different results depending on all aggs in request being compatible/not compatible with concurrent search?

@reta
Copy link
Collaborator

reta commented Feb 15, 2024

@reta my understanding is that you are suggesting something like:

@martin-gaievski no, the decision is taken by QueryPhaseSearcherWrapper

the problem is that for adding aggregation support to hybrid query we can only follow CollectorManager approach as direct calls to QueryContext has been locked, as per your initial explanation. So we'll be taking that approach independently of concurrent search being enabled or not.

I think this is totally fine, I believe that the current implementation does not inspect SearchContext::queryCollectorManagers() as part of go / no go (concurrent path flow). The implementation explicitly checks aggregations but if we are adding something through SearchContext::queryCollectorManagers() (which is totally fine), it is not taken into account, has to be fixed I believe, the SearchContext could make this checks.

@jed326
Copy link
Collaborator

jed326 commented Feb 21, 2024

if we call IndexSearcher.search(Query, CollectorManager) then there will be exception throwing from DefaultSearchContext.getTargetMaxSliceCount(), this is because two core aggregations sampler and diversified_sampler do not support concurrent search (this has been added with this PR #11087).

@martin-gaievski I'm late to the discussion here but I wanted to mention that aggregation support for concurrent search is opt-in for each aggregator. See:

/**
* Implementation should override this method and return true if the Aggregator created by the factory works with concurrent segment search execution model
*/
protected boolean supportsConcurrentSegmentSearch() {
return false;
}

This is because plugins can provide their own AggregatorFactory classes and we cannot guarantee that the existing concurrent segment search implementation will work correctly for all plugins, so we need to rely on plugin developers to verify on their own if their plugin can support concurrent segment search. For example, the ParentAggregator in the parent-join module also does not support concurrent segment search:

@Override
protected boolean supportsConcurrentSegmentSearch() {
// See https://github.com/opensearch-project/OpenSearch/issues/9316
return false;
}

@martin-gaievski
Copy link
Member Author

got it @jed326, I see that for hybrid query we need to take that into account and have a flexibility to switch between concurrent/non-concurrent flow depending on how it's defined in each aggregator.

I worked with @reta on possible approach, looks like we have found promising one that can provide following:

  • be compatible with existing and future aggregations in cases when it does or does not support concurrent search
  • don't interfere with the search logic internal, like re-using collectors for concurrent search cases

main idea is to add new collector manager before the QueryPhase.searchInternal and for non-concurrent search do collectorManager.reduce manually after QueryPhase.searchInternal. I put all details into corresponding RFC in neural-search repo.

@opensearch-trigger-bot
Copy link
Contributor

This PR is stalled because it has been open for 30 days with no activity.

@opensearch-trigger-bot opensearch-trigger-bot bot added stalled Issues that have stalled and removed stalled Issues that have stalled labels Mar 23, 2024
@opensearch-trigger-bot
Copy link
Contributor

This PR is stalled because it has been open for 30 days with no activity.

@opensearch-trigger-bot opensearch-trigger-bot bot added the stalled Issues that have stalled label Apr 25, 2024
@jed326
Copy link
Collaborator

jed326 commented Apr 25, 2024

Hey @martin-gaievski since aggregation support for hybrid queries has been implemented are we good to close this PR?

@martin-gaievski
Copy link
Member Author

Yes, let me close it

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
stalled Issues that have stalled
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants