-
Notifications
You must be signed in to change notification settings - Fork 8.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Search service] Search Collectors and Background Search Collections #53335
Comments
Pinging @elastic/kibana-app-arch (Team:AppArch) |
@ppisljar and I discussed another possible approach that relies more on the Elasticsearch cache and (possibly in a second phase) building services to execute the interpreter in a background job. When a user navigates to a dashboard, we would execute everything as normal. If a user then selects to run in the background, we would gather the state for the entire dashboard (including each embeddable). I understand this is currently something that is not available since embeddables are currently passed their saved object ID instead of the actual inputs, so this is something we'd have to build. It would probably look something like this: Container embeddable gathers its own state, then calls When re-visiting the URL, the dashboard would pass down each embeddable's We would still need to pass some sort of This may work for a first phase approach, but you still run into the edge case for expressions that contain multiple requests to Elasticsearch, with the latter based on the previous. If we want to support these types of functions, we would need to build a service that truly executes the entire expression in the background. The output of such would not be stored anywhere, it would just need to ensure that it continues execution so that expressions with multiple requests to ES wouldn't stop with the first request. One downside to this approach is that it relies on the Elasticsearch caching, which (in the current plan) would be cleared out each 4 days. We have no idea of knowing when the cached results "expire" (or are cleared). Also, what if one request executes quickly, and another request takes 4 days to complete? This is an edge case we need to consider. What are your thoughts about the feasibility of this approach @stacey-gammon? |
If I understand this correctly:
The only thing Kibana has to do is decide when to send It sounds a lot simpler from the Kibana side, but it also seems like we would have a lot less control. I have a feeling this isn't going to work out well. For example, how would we be able to calculate progress and ability to cancel searches in this view: Storing all state by value instead of by referenceAgree this is the ideal long term solution. I suspect this will be a big feature in and of itself but @ppisljar thinks it might end up being easier to accomplish. If so, awesome. Search collectorsWe still need search collectors in order to get progress and cancellation. So I think the only question here is, do we add logic in Kibana to hash the request and send the search id to es instead of the full request. Pros of not doing this mapping:
Possible shorter term winsSynced with @ppisljar today to go over this and above comment. We think the best plan forward is to start with the search collectors, which we'll need anyway. But once we have those incorporated it would be worthwhile to see if we can get away with an easy win by exposing:
Then, we don't need any new saved object types, no management screen. How to get fast searches? Click "run past timeout" and check "Use cached searches". If cached searches are available in ES, they will be used automatically. This is way easier to implement than the background saved object collections. Nice idea @ppisljar! |
search service is not passed to the interpreter function, rather interpreter function is given reference to the search service at the time of registration to the registry. this doesn't affect much, instead search function being one of the handlers, we would pass in an instance of searchCollector (similar to what we do with inspectorAdapters) |
If we did that though, we couldn't scope it and it would be a global singleton search collector. I think we need to allow the embeddables to pass it in via |
@stacey-gammon thats not the case (the part about not being able to scope it), we would be passing the instance of search collector to interpreter, which means we can pass a different instance in each time we execute an expression (if required), so the app is completely in control. So at the time of registration, we give esaggs a reference to the search service, but at the time of execution we pass an instance of searchCollector in. (how we create the search collecton can stay out of this discussion as its irrelevant) I would avoid using global search collectors for the reasons you described to me earlier. If somebody would really want to do it, he would need to make sure he passes the same instance of search collector to everything making the requests. |
Okay, gotcha, I think we are actually on the same exact page then, I just read your comment incorrectly! |
Closing in favor of #61738. |
Background
We want to improve our support for long running queries and users that don't want to sit around and wait for a dashboard to finish loading when it could take hours/days.
Goals
Support:
Additional features, not as high priority:
Proposal
Search Collectors
collect and manage a group of searches.Implementation details
There will be two varieties of search collectors: Basic and Advanced. The Advanced search collector will expose some methods like:
The search collector can also handle showing/closing a toast the exposes cancellation, over-all progress, and the ability to send to background, though we may want to de-couple that functionality and put it elsewhere (like part of the dashboard embeddable).
Regardless, where ever we expose the "send to background" functionality, the user of the service will have to specify the url and the state that should be saved with the search collection as part of the
BackgroundSearchCollection
saved object. For the embeddable specific implementation, dashboards will not use the dashboard app URL, but a special URL that will show all background searches coming from Embeddables.Dashboard app URL vs generic Embeddable viewer URL
While the generic viewer URL might be a simpler implementation in the short run, we've decided to try to make this work with application URLs, because we decided that makes for a better, more consistent UX experience.
In order to support the dashboard app URL:
Cons of doing embeddable viewer URL:
Pros of Embeddable viewer URL:
Decision: We are going to move forward with app specific URLs, not implementing the embeddable viewer URL if we can manage it without too much extra work, as it was decided this will create a better and more consistent UX. We can always revisit the idea of adding an embeddable viewer URL if we encounter too many issues.
Nested search collectors
If we expose the ability to create a child collector from a parent, we can have all of this functionality at the dashboard level as well as at the per panel level. This means we can use the same mechanism to cancel/show progress/send to background for each panel in a dashboard.
Exposing to the interpreter
We need to expose this search function to expression fns anyway so all the expression fns can use our new Search Api services, so the expression would be something like:
Search interceptor logic
Would be something like:
Background search collection saved objects
Each background search collection object will have:
Graceful fallback
If an embeddable doesn't use the search service, those searches will just be executed normally when the background search URL is opened. This is actually somewhat problematic because the searches will be out of sync if we have relative time ranges. The cached searches "now to now - 1y" may end up showing up "now - 4days to now - 1y and 4 days" but embeddables that aren't using our search collectors will send live queries.
I suspect this is something we can hack around, or just live with. Also, if we switch to absolute time ranges, which will probably be better for the user anyway, and more accurate, this shouldn't really be a problem. The panels will be showing the same information.
Corner cases and considerations
When to clear the search cache?
Consider the scenario:
bgSearchId:123
in it.Similar:
Easiest technical solution: Dashboard embeddable can detect any update to it's input and clear the cache, however, this may prove to be slightly difficult because of async updates to dashboard input. For example, when the app first loads there are no filters, then filters are later added async by the filter manager. If this is all part of the page loading process, we want the searches to stay the background versions. But if the user interacts with the page and does something, we want them to be new searches. We can possibly hack around this in the dashboard app, but every solution that wants to use this system needs to consider this
What happens after you "send to background"?
Ideally:
Is your current view using the search ids that are saved in the background search collection saved object you just created, or are they using different ids?
If the current view is editable, and using the same search ids, we have to be careful not to accidentally abort them, for instance if the user deletes a panel. What happens if a user adds a new panel and then wants to "send to background" again - you don't want two search collections using that have the same search ids
What if you were viewing a relative time zone, will it switch you to absolute?
What happens if you hard refresh the page at this point?
Two panels that share the same exact request, one gets deleted
For instance, if we have two panels that share the same request, and one panel gets deleted from the dashboard, this will cancel the other panel's search. We may want to have a count on each search so we don't do this as long as there is >1 user of a search.
Easiest technical solution: Accept this as a known issue. I don't expect it would be a huge issue in practice. Refreshing the page should cause new searches to go out so there is a workaround.
Serialized searches coming through the collector
Any expression that contains two search functions will hit this (though it's not many). For instance, a user clicks send to background while the first search is running. The second search will not have a saved background id for it and it will be re-executed every time the background search collection is loaded back up.
This happens today in:
Easiest technical solution: Just not support this use case initially. Secondary searches will not initially be cached.
Searches that have already completed
We need to also test the situation where some searches have already completed.
Expose filters/time range from the dashboard
Since the filter bar and time picker are not yet part of the dashboard embeddable, we'll need to make sure those are shown to the user in the embeddable viewer somehow, or bring that bar inside.
User leaves page before a search returns the first response with the id
We can't avoid showing the "send to background" button until every search has an id because searches can come in at any time, we don't want this button flickering. How do we handle this situation? Similar to the situation with an expression with serialized searches.
Relative vs absolute time ranges
We should convert relative to absolute time ranges or it will be confusing for the user to see "last 5 years" or something, but it's not actually "now", now is some time in the past.
This is something solution teams will have to manually handle this.
Expiration of searches
The current plan is that searches will expire after some amount of time after they complete. The problem with how that will work with this search collector is that we theoretically could have one search in the collector that expires before another search ever even completes (albiet this is probably unlikely, theoretically possible).
Another thing we want to answer is how to determine if a search is expired and communicate that to the user. Do we auto delete these, or make the user delete the objects when they are expired?
** Should the Embeddable that is backed off of a background search collection be editable?**
Ideally yes.
Embeddables by reference vs by value
Situation is:
bgSearchId=xyz
.Similar situation, maybe more awkward:
bgSearchId=xyz
.Note that the second situation will not happen with the embeddable viewer version because the embeddable input is saved as a snapshot with the URL state, BUT, the first situation could still be encountered because children embeddable input would continue to be a reference to the saved visualization, not a visualization "by value".
We might be able to partially solve this if we put the dashboard input back in the URL so it overrides any changes made on the saved dashboard.
As for the first scenario, there is not much we can do without a major overhaul (although some of this work is already underway here: #51318
URL state Migrations
We need to be careful that teams that register their own URLs understand they will have to migrate the state in the saved objects if any of the URLs + state will no longer work. It's unclear if saved object migrations even support multiple plugins adding migrations for the same saved object type, but one workaround, while not the ideal scenario, could be to have teams write the migration in the background search plugin.
Easiest technical solution: Move forward knowing this is a risk for bugs, communicate with teams they should add bwc tests for URLs.
Best technical solution: Probably something like #25247 - store URLs as a common saved object type and link to that. Only one saved object to migrate, any team that uses the URL service knows they will have to migrate their saved objects.
Roadmap
The text was updated successfully, but these errors were encountered: