[OneDiscover] Add EBT event to track field usage in ESQL #194909
Comments
Pinging @elastic/kibana-data-discovery (Team:DataDiscovery)
@stratoula @vadimkibana Can either of you think of a good way to do this? Do we have a way to use the AST to pull all fields referenced in a query? Presumably for things like … Also, would it be feasible to do this directly inside the editor, or should we do it within Discover?
Hmm, before we move on to the technical implementation, can we discuss this a bit more? Is this something you want to do for integrations, @flash1293? I am afraid that tracking all the fields that can be used across numerous custom indices and thousands of ES|QL queries would not be performant. Can you elaborate? How exactly do you expect this to work? With Vadim's APIs you can easily find the columns used (@vadimkibana keep me honest here). As this is going to be used in Discover, it should be something the Discover app does, not the editor. We can of course have helpers in the ES|QL utils. It is just not clear to me how exactly this is going to work; I need more details here.
@stratoula Why do you think there would be a performance issue? The idea is to do this on the client every time the query is submitted (not during typing, and not on saved objects in a background job).
Parsing the query is something we do anyway, and we already send an EBT event on every click the user makes, with a fairly large list of DOM selectors attached; this will be much cheaper. Doing this based on the columns of the table is not the same thing: the idea is to capture the fields that matter to the user when writing the query (not the fields that are eventually displayed), so we can recommend the right fields in the right situation. Custom fields are not the primary concern here; I'm more interested in ECS fields.
It would be very similar to what we already do for KQL in the APM app.
My concern is mostly about the telemetry cluster, not about us. I think there is a request to be more careful about the amount of data we send to the cluster.
I didn't suggest that. I suggested using our new API, which parses the query and can track the columns used. Columns === fields in the ES|QL world (and in our API). Not parsing the result, parsing the query. TL;DR: this is not a problem.
What I don't understand is how we are going to use this info, Joe. OK, you gather tons of fields from numerous indices that make no sense to us, and you see that users are using these 1K fields. What are you going to do afterwards? As I asked above, is it only for our indices (integrations)? For random user indices, I can't see the benefit, unless there is a way to get this info dynamically from the telemetry cluster and use it in Discover. This is where I'd like you to elaborate.
I see, makes sense. As mentioned, we already send one event per click, so I think one additional event per query submission will not make a dent in the telemetry cluster load. A given query won't reference more than a couple of fields at once, so I don't think we'll get close to any usage limits here.
Yeah, that sounds good to me. If we do this in the Discover code, that would be fine for me too.
This is mostly about getting information on ECS field usage: which "well-known" fields do users typically use in their queries? Are they using 95% custom fields, or mostly ECS fields? The answers to these questions can guide which fields we "recommend" to the user in certain situations. Integrations obviously use ECS fields heavily, but it's not limited to that: if you use our shippers like elastic-agent, even your custom data will contain ECS fields. That's the point: can we curate and lean on well-known ECS fields, or are we actually harming users by doing so? Right now we don't really know which fields users care about; this telemetry will tell us. I agree that an individual custom field doesn't give us much info. As far as I'm concerned, we can also mask them and replace all custom fields with a placeholder.
I'm not planning anything like that, but it's an interesting idea. In general, the information from this telemetry can help us decide which features to build around suggesting the right columns/fields to users.
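To illustrate how masked events could answer the "mostly ECS or mostly custom?" question, here is a minimal TypeScript sketch of the analysis side. It assumes each event carries the field names referenced by one submitted ES|QL query, with custom fields already replaced by a placeholder on the client. The event shape, the `<custom>` placeholder, and `summarizeFieldUsage` are hypothetical, not existing Kibana code.

```typescript
// Placeholder assumed to have replaced every non-ECS field name on the client (hypothetical).
const CUSTOM_FIELD_PLACEHOLDER = "<custom>";

// Hypothetical event shape: field names referenced by one submitted ES|QL query.
interface QueryFieldsEvent {
  fields: string[];
}

interface FieldUsageSummary {
  ecsShare: number; // fraction of all field references that are ECS fields
  topEcsFields: Array<[string, number]>; // [field, reference count], most used first
}

// Aggregates many per-query events into an ECS-vs-custom share and an ECS field ranking.
function summarizeFieldUsage(events: QueryFieldsEvent[], limit = 10): FieldUsageSummary {
  const ecsCounts = new Map<string, number>();
  let total = 0;
  let custom = 0;
  for (const event of events) {
    for (const field of event.fields) {
      total += 1;
      if (field === CUSTOM_FIELD_PLACEHOLDER) {
        custom += 1;
      } else {
        ecsCounts.set(field, (ecsCounts.get(field) ?? 0) + 1);
      }
    }
  }
  // Sort by count (descending), breaking ties alphabetically for stable output.
  const topEcsFields = Array.from(ecsCounts.entries())
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
    .slice(0, limit);
  return { ecsShare: total === 0 ? 0 : (total - custom) / total, topEcsFields };
}

const summary = summarizeFieldUsage([
  { fields: ["@timestamp", "host.name", CUSTOM_FIELD_PLACEHOLDER] },
  { fields: ["@timestamp", "log.level"] },
]);
console.log(summary.ecsShare); // 0.8
console.log(summary.topEcsFields[0]); // first entry: "@timestamp" with 2 references
```

Counting per reference (rather than per distinct query) means frequently re-submitted queries weigh more; whether that is desirable is a separate decision.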
Got it, OK, it makes sense now. We already know whether a field is an ECS field, so we could track them either in Discover or in the editor. It really depends on which app interests us: the editor is already used in many different apps and is going to be used in even more, each with a different use case. So if we want to focus on Discover usage, I think it should be built in Discover; otherwise, in the editor. One more question: if I am not mistaken, our telemetry events are mostly counters, so we are tracking the number of clicks. Can we also send non-numeric information? An array of strings, in this case.
PR #195200 will help you retrieve the fields/columns used in the query (see the examples there to better understand the output). After that, you can compare them with the ECS fields and create your event. BUT I am not sure whether you can actually send non-numeric values, as I mention in #194909 (comment). This needs investigation.
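The "compare with the ECS fields and create your event" step could look roughly like the minimal TypeScript sketch below. It assumes the helper from #195200 yields a plain array of field names and that a set of ECS field names is available on the client. The event name `esql_query_fields`, the payload shape, and the `ReportEvent` callback are hypothetical stand-ins, not Kibana APIs; custom fields are masked with a placeholder, as suggested earlier in the thread.

```typescript
// Hypothetical event type name; not an existing Kibana event.
const ESQL_QUERY_FIELDS_EVENT = "esql_query_fields";
// Hypothetical placeholder used instead of custom (non-ECS) field names.
const CUSTOM_FIELD_PLACEHOLDER = "<custom>";

interface EsqlQueryFieldsPayload {
  fields: string[]; // sorted ECS field names, then one placeholder per distinct custom field
  ecs_field_count: number;
  custom_field_count: number;
}

// Simplified stand-in for a telemetry reporter; Kibana's analytics service has its own types.
type ReportEvent = (eventType: string, payload: EsqlQueryFieldsPayload) => void;

// Compares the fields referenced by a query with a known set of ECS field names.
function buildQueryFieldsPayload(
  queryFields: string[],
  ecsFields: ReadonlySet<string>,
): EsqlQueryFieldsPayload {
  const unique = Array.from(new Set(queryFields));
  const ecs = unique.filter((field) => ecsFields.has(field)).sort();
  const customCount = unique.length - ecs.length;
  return {
    fields: [...ecs, ...new Array<string>(customCount).fill(CUSTOM_FIELD_PLACEHOLDER)],
    ecs_field_count: ecs.length,
    custom_field_count: customCount,
  };
}

// Intended to be called once per query submission, not on every keystroke.
function onQuerySubmit(
  queryFields: string[],
  ecsFields: ReadonlySet<string>,
  report: ReportEvent,
): void {
  report(ESQL_QUERY_FIELDS_EVENT, buildQueryFieldsPayload(queryFields, ecsFields));
}

// Example with a tiny sample of ECS field names; a real list would come from the ECS definitions.
const ecsSample = new Set(["@timestamp", "host.name", "log.level", "service.name"]);
const sent: EsqlQueryFieldsPayload[] = [];
onQuerySubmit(["host.name", "my_app.tenant", "@timestamp", "host.name"], ecsSample, (_type, payload) => {
  sent.push(payload);
});
console.log(sent[0].fields); // @timestamp, host.name, <custom>
```

The payload is an array of strings plus two counts rather than a single counter, which is exactly the non-numeric case whose support is still to be confirmed.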
@stratoula We are not just tracking the number of clicks; we are tracking every click individually. Counters are separate from EBT (event-based telemetry), see here. You are referring to UI counters; I'm referring to EBT. For every click the user makes in the UI, we send an individual event payload to the telemetry server.
OK then. I have only worked with counters, but if this is possible, the helper above will help you achieve what you want.
Linking to similar work: #193996
Absolutely @jughosta, I think extending …
Similar to #194907, the field names used in a submitted ESQL query should be tracked via telemetry.
This data can inform decisions on which fields to prioritize in field lists and field suggestions.
I'm not sure whether this should be done just for Discover or in the same way for all ESQL editors; the latter probably makes sense.