You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
We recently made a change to TotalHitCountCollectorManager, to make it take the leaf slices in its constructor, so that we can automatically decide what type of collectors are needed. That is because there is additional handling needed in case the search targets segment partitions, which causes a bit of overhead and we want to be able to avoid that overhead when not necessary.
There is a somewhat similar mechanism in the TopScoreDocCollectorManager and TopFieldCollectorManager constructors that take a boolean as last argument. That flag tells the manager if it has to create collectors that support concurrency or not. That does the job, but it can lead to errors in that users may not know exactly what to pass in and use the wrong value, which lead them to getting non deterministic results if they provide false and their searcher has an executor set to it. The other way around, there is a bit of overhead caused by the need for concurrency that perhaps can be automatically avoided when the search targets a single slice. As far as I can see the concurrent path is currently the default so far, and we mostly use the flag to disable concurrency when we are sure there is not executor set.
I wonder if we should replace that boolean argument with the array of leaf slices instead. That is not perfect but better than a boolean, I think. It is consistent with the current solution for TotalHitCountCollectorManager, and it allows the manager to make an accurate decision with lower likelihood of making a mistake (you may take slices from the wrong searcher instance perhaps if you use multiple searchers with different settings, but that's about it?). There are two main implied question in this:
should we use LeafSlice[] as an argument to automatically detect if concurrency needs to be supported or not?
while we are at it, should we also make that argument mandatory so that we may even avoid overhead caused by concurrency when the search targets a single slice, despite an executor has been set to the searcher (more adaptive to what we do today)?
I would like to collect opinions on this, I think it could be an additional small breaking change to make in Lucene 10 if there is appeal for it.
The text was updated successfully, but these errors were encountered:
I would rather remove these parameters and accept a small slowdown, users shouldn't have to know how IndexSearcher's concurrency is configured to create a CollectorManager.
++ to that @jpountz that would also be my preferred approach. I somehow made the assumption that we are not willing to go that route, but I think we should revisit that decision, assuming the slowdown is small. It feels wrong for sure that users have to know about internal execution when creating a collector manager. Ideally, providing slices to adapt the behaviour would not be necessary either.
We recently made a change to
TotalHitCountCollectorManager
, to make it take the leaf slices in its constructor, so that we can automatically decide what type of collectors are needed. That is because there is additional handling needed in case the search targets segment partitions, which causes a bit of overhead and we want to be able to avoid that overhead when not necessary.There is a somewhat similar mechanism in the
TopScoreDocCollectorManager
andTopFieldCollectorManager
constructors that take aboolean
as last argument. That flag tells the manager if it has to create collectors that support concurrency or not. That does the job, but it can lead to errors in that users may not know exactly what to pass in and use the wrong value, which lead them to getting non deterministic results if they providefalse
and their searcher has an executor set to it. The other way around, there is a bit of overhead caused by the need for concurrency that perhaps can be automatically avoided when the search targets a single slice. As far as I can see the concurrent path is currently the default so far, and we mostly use the flag to disable concurrency when we are sure there is not executor set.I wonder if we should replace that boolean argument with the array of leaf slices instead. That is not perfect but better than a boolean, I think. It is consistent with the current solution for
TotalHitCountCollectorManager
, and it allows the manager to make an accurate decision with lower likelihood of making a mistake (you may take slices from the wrong searcher instance perhaps if you use multiple searchers with different settings, but that's about it?). There are two main implied question in this:I would like to collect opinions on this, I think it could be an additional small breaking change to make in Lucene 10 if there is appeal for it.
The text was updated successfully, but these errors were encountered: