[OneDiscover] Add EBT event to track field usage in ESQL #194909

flash1293 · 2024-10-04T07:37:46Z

Similar to #194907, the field names used in a submitted ESQL query should be tracked via telemetry.

This data can inform decisions on which fields to prioritize in field lists and field suggestions.

I'm not sure whether this should be done just for Discover or for all ESQL editors in the same way - the latter probably makes sense.

elasticmachine · 2024-10-04T07:38:45Z

Pinging @elastic/kibana-data-discovery (Team:DataDiscovery)

davismcphee · 2024-10-04T16:30:54Z

@stratoula @vadimkibana Can either of you think of a good way to do this? Do we have a way to use the AST to pull all fields referenced in a query? Presumably for things like KEEP, DROP, RENAME it would be relatively easy, but I'm not sure about situations where fields are referenced in EVAL or similar commands.

Also, would it be feasible to do directly inside the editor, or should we do it within Discover?

stratoula · 2024-10-06T08:47:26Z

Hmmm, before we go into the technical implementation, can we discuss it a bit more?

Is it something you want to do for integrations @flash1293? I am afraid that tracking every field used across the numerous custom indices and thousands of ES|QL queries would not be performant.

Can you elaborate more? How exactly do you expect this to work?

With Vadim's APIs you can easily find the columns used (@vadimkibana keep me honest here). As this is going to be used in Discover, it should be something the Discover app does and not the editor. We can of course have helpers in the ES|QL utils.

It is just not clear to me how exactly this is going to work. I think I need more details here.

flash1293 · 2024-10-07T07:18:36Z

@stratoula Why do you think there would be a performance issue? The idea is to do this in the client every time the query is submitted (not during typing, and not on saved objects in a background job):

1. Parse the query
2. Extract referenced fields from the AST
3. Send them as a flat list in the payload of an EBT event

Parsing the query is something we do anyway, and we already send an EBT event on every click the user makes, with a pretty big list of DOM selectors attached. This will be much cheaper.
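
The three steps above could be sketched roughly as follows. This is a minimal illustration, not Kibana code: the `AstNode` type, `collectFieldNames`, and the hand-written AST are all hypothetical stand-ins for whatever the real ES|QL parser produces. The point is only that a recursive walk also reaches columns nested inside EVAL-style expressions.

```typescript
// Hypothetical, simplified AST shape. The real ES|QL AST in Kibana is richer.
type AstNode =
  | { type: 'command'; name: string; args: AstNode[] }
  | { type: 'function'; name: string; args: AstNode[] }
  | { type: 'column'; name: string }
  | { type: 'literal'; value: string | number };

// Recursively collect every column reference, including those nested
// inside function calls (for example, the right-hand side of an EVAL).
function collectFieldNames(
  nodes: AstNode[],
  found: Set<string> = new Set<string>()
): Set<string> {
  for (const node of nodes) {
    if (node.type === 'column') {
      found.add(node.name);
    } else if (node.type === 'command' || node.type === 'function') {
      collectFieldNames(node.args, found);
    }
    // Literals carry no field reference and are skipped.
  }
  return found;
}

// Hand-written AST standing in for the parsed form of:
//   FROM logs | EVAL total = bytes_in + bytes_out | KEEP host.name, total
const exampleAst: AstNode[] = [
  { type: 'command', name: 'from', args: [{ type: 'literal', value: 'logs' }] },
  {
    type: 'command',
    name: 'eval',
    args: [
      {
        type: 'function',
        name: '=',
        args: [
          { type: 'column', name: 'total' },
          {
            type: 'function',
            name: '+',
            args: [
              { type: 'column', name: 'bytes_in' },
              { type: 'column', name: 'bytes_out' },
            ],
          },
        ],
      },
    ],
  },
  {
    type: 'command',
    name: 'keep',
    args: [
      { type: 'column', name: 'host.name' },
      { type: 'column', name: 'total' },
    ],
  },
];

// Flat, de-duplicated list that could become the event payload.
const fieldNames: string[] = Array.from(collectFieldNames(exampleAst)).sort();
```

Note that this sketch also collects `total`, which is a column defined by the query rather than a field in the index. Telling the two apart is left out here.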

Doing this based on the columns of the table is not the same thing, as the idea is to capture the fields that matter to the user when writing the query (not the fields that are eventually displayed), so we can recommend the right fields in the right situation. Custom fields are not the primary concern here, I'm more interested in ECS fields.

flash1293 · 2024-10-07T07:19:42Z

It would be very similar to this helper for KQL - we do this already in the APM app:

kibana/packages/kbn-es-query/src/kuery/utils/get_kql_fields.ts

Line 14 in b628770

export function getKqlFieldNamesFromExpression(expression: string): string[] {

stratoula · 2024-10-07T07:24:03Z

> @stratoula Why do you think there would be a performance issue?

My concern is mostly about the telemetry cluster and not about us. I think there is an ask to be more careful about the amount of data we send to the cluster.

> Doing this based on the columns of the table is not the same thing, as the idea is to capture the fields that matter to the user when writing the query (not the fields that are eventually displayed), so we can recommend the right fields in the right situation. Custom fields are not the primary concern here, I'm more interested in ECS fields.

I didn't suggest that. I suggested using our new API, which parses the query and can track the columns used. Columns === fields in the ES|QL world (and in our API). Not parsing the result, parsing the query. TL;DR this is not a problem.

What I don't understand is how we are going to use this info, Joe. OK, you gather tons of fields from numerous indices that make no sense to us, and you see that users are using these 1K fields. What are you going to do afterwards? As I asked above, is it only for our indices (integrations)? Because for random indices of our users, I can't see the benefit. Unless there is a way to get this info dynamically from the telemetry cluster and use it in Discover. This is where I want you to elaborate more.

flash1293 · 2024-10-07T07:54:49Z

> My concern is mostly on the telemetry cluster and not on us. I think there is an ask to be more careful to the amount of things we send to the cluster.

I see, makes sense - as mentioned we already send one event per click, I think one additional event per query submit will not make a dent in telemetry cluster load. Each given query won't use more than a couple of fields at once, so I don't think we get close to any usage limits here.

> I suggested to use our new api which parses the query and can track the columns used

Yeah, that sounds good to me - if we do this in the Discover code it would be fine for me too.

> What I don't understand is how we are going to use this info Joe. Ok you gather tons of fields from numerous indices that make no sense to us and you see that they are using these 1K fields. What are you going to do afterwards? As I ask above, is it only for our indices (integrations)? Because for random indices of our users, I cant see the benefit

This is mostly about getting info about ECS field usage: which "well-known" fields do users typically use in their queries? Are they using 95% custom fields, or mostly ECS fields? The answers can guide which fields we "recommend" to the user in certain situations. Integrations obviously use ECS fields heavily, but it's not limited to that: if you use our shippers like elastic-agent, you will have ECS fields in your data even for custom data. That's the question: can we curate and lean on well-known ECS fields, or are we actually harming users by doing so? Right now we don't really know which fields users care about; this telemetry will tell us. I agree that individual custom fields don't give us much info. As far as I'm concerned we can also mask them and replace every custom field with a placeholder, _CUSTOM_FIELD. I'm mostly interested in how much those are used vs. ECS.
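
A rough sketch of the masking idea, assuming a lookup set of ECS field names is available on the client. The set below is a tiny illustrative sample, and `maskCustomFields` and `ecsShare` are hypothetical helper names, not existing Kibana functions.

```typescript
// Tiny illustrative subset of ECS field names. A real check would need
// the full ECS field list, which is assumed to be available elsewhere.
const ECS_FIELDS_SAMPLE: ReadonlySet<string> = new Set<string>([
  '@timestamp',
  'message',
  'host.name',
  'log.level',
]);

const CUSTOM_FIELD_PLACEHOLDER = '_CUSTOM_FIELD';

// Keep ECS field names as they are and replace everything else with the
// placeholder, so custom field names never leave the client.
function maskCustomFields(
  fieldNames: string[],
  ecsFields: ReadonlySet<string> = ECS_FIELDS_SAMPLE
): string[] {
  return fieldNames.map((name) =>
    ecsFields.has(name) ? name : CUSTOM_FIELD_PLACEHOLDER
  );
}

// Fraction of referenced fields that are ECS fields (0 for an empty list).
function ecsShare(
  fieldNames: string[],
  ecsFields: ReadonlySet<string> = ECS_FIELDS_SAMPLE
): number {
  if (fieldNames.length === 0) {
    return 0;
  }
  const ecsCount = fieldNames.filter((name) => ecsFields.has(name)).length;
  return ecsCount / fieldNames.length;
}

const masked = maskCustomFields(['host.name', 'my_custom_counter', 'message']);
// masked is ['host.name', '_CUSTOM_FIELD', 'message']
```

Keeping one placeholder per custom field, rather than dropping them, preserves the ECS vs. custom ratio for each query.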

> Unless there is a way to get this info dynamically from the telemetry cluster and use it in Discover

I'm not planning anything like that, but it's an interesting idea. In general, the information from this telemetry can help us decide which features to build around suggesting the right columns/fields to users.

stratoula · 2024-10-07T08:03:15Z

Got it, ok it makes sense now. We already have the information on whether a field is an ECS one, so we could track them either in Discover or in the editor. It really depends on which app interests us: the editor is already used in many different apps and is going to be used even more, and each app has a different use case. So if we want to focus on Discover usage, I think it should be built in Discover; otherwise, in the editor.

One more question. If I am not mistaken, our telemetry events are mostly counters, so we are tracking the number of clicks. Can we also send non-numeric information? An array of strings in this case.

stratoula · 2024-10-07T09:34:05Z

This PR #195200 will help you retrieve the fields / columns used in the query (you can look at the examples to better understand the output). After that you can compare them with the ECS fields and create your event.

BUT I am not sure whether you can actually send non-numeric values, as I said here #194909 (comment). This needs investigation.

flash1293 · 2024-10-07T09:36:42Z

@stratoula We are not just tracking the number of clicks, we are tracking every click individually. Counters are separate from EBT (event-based telemetry) - see here. You are referring to UI counters, I'm referring to EBT.

For every click the user makes in the UI, we send a payload like this to the telemetry server:

{
  "timestamp": "2024-10-07T09:34:13.322Z",
  "event_type": "click",
  "context": {
    "isDev": true,
    "isDistributable": false,
    "version": "9.0.0",
    "branch": "main",
    "buildNum": 9007199254740991,
    "buildSha": "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
    "session_id": "48f2e1f9-4d56-47b3-9d6d-75d8b397007a",
    "user_agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/129.0.0.0 Safari/537.36",
    "preferred_language": "en-US",
    "preferred_languages": [
      "en-US",
      "en"
    ],
    "viewport_width": 1652,
    "viewport_height": 1223,
    "cluster_name": "elasticsearch",
    "cluster_uuid": "ctbi9-HdRaS89WynDTCeaQ",
    "cluster_version": "9.0.0-SNAPSHOT",
    "cluster_build_flavor": "default",
    "pageName": "application:dev_tools",
    "applicationId": "dev_tools",
    "page_title": "Console - Dev Tools - Elastic",
    "page_url": "/app/dev_tools#/console/history",
    "cloudId": "ftr_fake_cloud_id:aGVsbG8uY29tOjQ0MyRFUzEyM2FiYyRrYm4xMjNhYmM=",
    "deploymentId": "deploymentId",
    "labels": {},
    "discoverProfiles": [],
    "serviceInventoryViewType": "entity",
    "userId": "7ea90f1c43b0f59b62c57fab4f7e01016351aadff8c81e310e60953234108212",
    "isElasticCloudUser": false,
    "license_id": "c92f1dac-069f-4292-8915-e13c9f87042d",
    "license_status": "active",
    "license_type": "basic"
  },
  "properties": {
    "target": [
      "DIV",
      "id=consoleRoot",
      "class=consoleContainer",
      "DIV",
      "class=euiPanel euiPanel--plain euiSplitPanel css-zgsyqh-euiPanel-grow-m-plain-hasShadow-euiSplitPanelOuter-column",
      "DIV",
      "class=euiPanel euiPanel--transparent euiPanel--paddingMedium euiSplitPanel__inner consoleTabs css-9iwtfr-euiPanel-none-m-transparent-euiSplitPanelInner",
      "DIV",
      "class=euiFlexGroup css-dhjvmj-euiFlexGroup-s-flexStart-center-row",
      "DIV",
      "class=euiFlexItem css-9sbomz-euiFlexItem-grow-1",
      "DIV",
      "class=euiTabs css-188xqqo-euiTabs-s",
      "role=tablist",
      "DIV",
      "class=euiPopover euiTourAnchor css-14mlg6d-euiPopover-inline-block",
      "data-test-subj=historyTourStep",
      "BUTTON",
      "role=tab",
      "aria-selected=false",
      "type=button",
      "class=euiTab css-1mh7dkw-euiTab",
      "data-test-subj=consoleHistoryButton",
      "SPAN",
      "class=euiTab__content eui-textTruncate css-bzyfuv-euiTab__content-s"
    ]
  }
}
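
By analogy with the click payload above, whose `properties.target` is already an array of strings, a field-usage event could carry the field names the same way. The following is only a sketch: the event name `esql_field_usage`, the `FieldUsageEvent` type, and `buildFieldUsageEvent` are hypothetical, and the `context` block is omitted.

```typescript
// Hypothetical event shape. Only `timestamp`, `event_type` and `properties`
// mirror keys from the click payload above.
interface FieldUsageEvent {
  timestamp: string;
  event_type: 'esql_field_usage'; // hypothetical event name
  properties: {
    fieldNames: string[];
  };
}

// Build the payload for one submitted query. Duplicate field names are
// removed while keeping the order in which they were first seen.
function buildFieldUsageEvent(
  fieldNames: string[],
  now: Date = new Date()
): FieldUsageEvent {
  return {
    timestamp: now.toISOString(),
    event_type: 'esql_field_usage',
    properties: {
      fieldNames: Array.from(new Set(fieldNames)),
    },
  };
}

const exampleEvent = buildFieldUsageEvent(
  ['host.name', 'message', 'host.name'],
  new Date('2024-10-07T09:34:13.322Z')
);
```

Such a payload would usually be far smaller than the DOM selector list sent with each click.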

stratoula · 2024-10-07T09:38:54Z

Ok then, I have only worked with counters but if this is possible then the above helper will help you achieve what you want.

jughosta · 2024-10-07T10:54:12Z

Linking to similar work #193996

flash1293 · 2024-10-07T11:01:32Z

Absolutely @jughosta, I think extending discover_field_usage makes sense here!

flash1293 mentioned this issue Oct 4, 2024

[OneDiscover] Telemetry tasks #182073

Open

botelastic bot added the needs-team label (Issues missing a team label) Oct 4, 2024

flash1293 added the Team:DataDiscovery (Discover App Team: Document Explorer, Saved Search, Surrounding documents, Data, DataViews) and Project:OneDiscover (Enrich Discover with contextual awareness / Merge with Logs Explorer) labels Oct 4, 2024

botelastic bot removed the needs-team label (Issues missing a team label) Oct 4, 2024

kertal mentioned this issue Oct 9, 2024

[OneDiscover] Add EBT event to track field usage in ESQL #194908

Closed

kertal added the impact:low label (Addressing this issue will have a low level of impact on the quality/strength of our product.) Oct 21, 2024
