[OneDiscover] Add EBT event to track field usage in ESQL #194909

Open
Tracked by #182073
flash1293 opened this issue Oct 4, 2024 · 13 comments
Labels
impact:low Addressing this issue will have a low level of impact on the quality/strength of our product. Project:OneDiscover Enrich Discover with contextual awareness / Merge with Logs Explorer Team:DataDiscovery Discover App Team (Document Explorer, Saved Search, Surrounding documents, Data, DataViews)

Comments

@flash1293
Contributor

flash1293 commented Oct 4, 2024

Similar to #194907, the field names used in a submitted ESQL query should be tracked via telemetry.

This data can inform decisions on which fields to prioritize in field lists and field suggestions.

I'm not sure whether this should be done just for Discover or for all ESQL editors in the same way - the latter probably makes sense.

@botelastic botelastic bot added the needs-team Issues missing a team label label Oct 4, 2024
@flash1293 flash1293 added Team:DataDiscovery Discover App Team (Document Explorer, Saved Search, Surrounding documents, Data, DataViews) Project:OneDiscover Enrich Discover with contextual awareness / Merge with Logs Explorer labels Oct 4, 2024
@elasticmachine
Contributor

Pinging @elastic/kibana-data-discovery (Team:DataDiscovery)

@botelastic botelastic bot removed the needs-team Issues missing a team label label Oct 4, 2024
@davismcphee
Contributor

@stratoula @vadimkibana Can either of you think of a good way to do this? Do we have a way to use the AST to pull all fields referenced in a query? Presumably for things like KEEP, DROP, RENAME it would be relatively easy, but I'm not sure about situations where fields are referenced in EVAL or similar commands.

Also, would it be feasible to do directly inside the editor, or should we do it within Discover?

@stratoula
Contributor

stratoula commented Oct 6, 2024

Hmmm, before we go into the technical implementation, can we discuss this a bit more?

Is it something you want to do for integrations, @flash1293? I am afraid that tracking all fields that can be used across the numerous custom indices and thousands of ES|QL queries will not be performant.

Can you elaborate more? How exactly do you expect this to work?

With Vadim's APIs you can easily find the columns used (@vadimkibana keep me honest here). As this is going to be used in Discover, it should be something that the Discover app does and not the editor. We can of course have helpers in the ES|QL utils.

It is just not clear to me how exactly this is going to work. I think I need more details here.

@flash1293
Contributor Author

flash1293 commented Oct 7, 2024

@stratoula Why do you think there would be a performance issue? The idea is to do this in the client every time the query is submitted (not while typing, and not on saved objects in a background job):

  • Parse the query
  • Extract referenced fields from the AST
  • Send them as a flat list in the payload of an EBT event
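
The extraction step in the list above could look roughly like the following sketch. It assumes a simplified, hypothetical AST shape (`AstNode`) and a hypothetical helper name (`collectFieldNames`); the real ES|QL AST in Kibana is richer, and the hand-written `exampleAst` only stands in for actual parser output.

```typescript
// Hypothetical, simplified AST shape used for illustration only.
type AstNode =
  | { type: 'source'; name: string }
  | { type: 'column'; name: string }
  | { type: 'literal'; value: string | number }
  | { type: 'function'; name: string; args: AstNode[] }
  | { type: 'command'; name: string; args: AstNode[] };

// Walk every node and collect column names, so that fields nested inside
// function calls (for example in WHERE or EVAL) are found as well.
function collectFieldNames(nodes: AstNode[]): string[] {
  const found = new Set<string>();
  const visit = (node: AstNode): void => {
    if (node.type === 'column') {
      found.add(node.name);
    } else if (node.type === 'function' || node.type === 'command') {
      node.args.forEach(visit);
    }
    // Sources (index names) and literals are not fields, so they are skipped.
  };
  nodes.forEach(visit);
  // De-duplicated and sorted, giving a flat list suitable for an event payload.
  return Array.from(found).sort();
}

// Hand-written stand-in for the parsed form of:
//   FROM logs | WHERE ROUND(bytes / 1024) > 10 | KEEP host.name, bytes
const exampleAst: AstNode[] = [
  { type: 'command', name: 'from', args: [{ type: 'source', name: 'logs' }] },
  {
    type: 'command',
    name: 'where',
    args: [
      {
        type: 'function',
        name: '>',
        args: [
          {
            type: 'function',
            name: 'round',
            args: [
              {
                type: 'function',
                name: '/',
                args: [
                  { type: 'column', name: 'bytes' },
                  { type: 'literal', value: 1024 },
                ],
              },
            ],
          },
          { type: 'literal', value: 10 },
        ],
      },
    ],
  },
  {
    type: 'command',
    name: 'keep',
    args: [
      { type: 'column', name: 'host.name' },
      { type: 'column', name: 'bytes' },
    ],
  },
];

const referencedFields = collectFieldNames(exampleAst);
```

One caveat with this naive walk: columns created by the query itself (for example the left-hand side of an EVAL assignment) would be collected too, so a real implementation would need to tell those apart from index fields.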

Parsing the query is something we do anyway, and we already send an EBT event on every user click with a pretty big list of DOM selectors attached. This will be much cheaper.

Doing this based on the columns of the table is not the same thing, as the idea is to capture the fields that matter to the user when writing the query (not the fields that are eventually displayed), so we can recommend the right fields in the right situation. Custom fields are not the primary concern here, I'm more interested in ECS fields.

@flash1293
Contributor Author

It would be very similar to this helper for KQL - we do this already in the APM app:

export function getKqlFieldNamesFromExpression(expression: string): string[] {

@stratoula
Contributor

stratoula commented Oct 7, 2024

> @stratoula Why do you think there would be a performance issue?

My concern is mostly about the telemetry cluster and not about us. I think there is an ask to be more careful about the amount of data we send to the cluster.

> Doing this based on the columns of the table is not the same thing, as the idea is to capture the fields that matter to the user when writing the query (not the fields that are eventually displayed), so we can recommend the right fields in the right situation. Custom fields are not the primary concern here, I'm more interested in ECS fields.

I didn't suggest that. I suggested using our new API, which parses the query and can track the columns used. Columns === fields in the ES|QL world (and in our API). This means parsing the query, not the result. TL;DR: this is not a problem.

What I don't understand is how we are going to use this info, Joe. OK, you gather tons of fields from numerous indices that make no sense to us, and you see that they are using these 1K fields. What are you going to do afterwards? As I asked above, is it only for our indices (integrations)? Because for random indices of our users, I can't see the benefit. Unless there is a way to get this info dynamically from the telemetry cluster and use it in Discover. This is where I want you to elaborate more.

@flash1293
Contributor Author

flash1293 commented Oct 7, 2024

> My concern is mostly about the telemetry cluster and not about us. I think there is an ask to be more careful about the amount of data we send to the cluster.

I see, that makes sense. As mentioned, we already send one event per click, so I think one additional event per query submission will not make a dent in telemetry cluster load. A given query won't use more than a couple of fields at once, so I don't think we'd get close to any usage limits here.

> I suggested using our new API, which parses the query and can track the columns used

Yeah, that sounds good to me - doing this in the Discover code would be fine with me too.

> What I don't understand is how we are going to use this info, Joe. OK, you gather tons of fields from numerous indices that make no sense to us, and you see that they are using these 1K fields. What are you going to do afterwards? As I asked above, is it only for our indices (integrations)? Because for random indices of our users, I can't see the benefit

This is mostly about getting info about ECS field usage - which "well-known" fields do users typically use in their queries? Are they using 95% custom fields, or is it mostly ECS fields? The answers to these questions can guide which fields we "recommend" to the user in certain situations. Integrations obviously use ECS fields heavily, but it's not limited to that - if you use our shippers like elastic-agent, you will have ECS fields in your data even for custom data. It's about that - can we curate and lean on well-known ECS fields, or are we actually harming users by doing so? Right now we don't really know which fields users care about; this telemetry will tell us. I agree that an individual custom field doesn't give us much info - as far as I'm concerned, we can also mask them and replace all custom fields with a placeholder _CUSTOM_FIELD. I'm mostly interested in how much those are used vs. ECS.
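
The masking idea could be sketched roughly as follows. The helper name `maskCustomFields` and the tiny `KNOWN_ECS_FIELDS` set are assumptions made for illustration; a real list would have to come from the ECS definitions, and only the `_CUSTOM_FIELD` placeholder is taken from the comment above.

```typescript
// Placeholder proposed in the discussion for any non-ECS field.
const CUSTOM_FIELD_PLACEHOLDER = '_CUSTOM_FIELD';

// Illustrative subset only (assumption); not a complete or authoritative ECS list.
const KNOWN_ECS_FIELDS: ReadonlySet<string> = new Set<string>([
  '@timestamp',
  'host.name',
  'message',
  'log.level',
]);

// Replace every field that is not a known ECS field with the placeholder.
// One placeholder is kept per custom field, so the ratio of custom to ECS
// fields in a query stays visible without leaking the custom names.
function maskCustomFields(
  fields: string[],
  ecsFields: ReadonlySet<string> = KNOWN_ECS_FIELDS
): string[] {
  return fields.map((field) => (ecsFields.has(field) ? field : CUSTOM_FIELD_PLACEHOLDER));
}

const maskedExample = maskCustomFields(['host.name', 'my_app.latency', 'message', 'tenant_id']);
// maskedExample: ['host.name', '_CUSTOM_FIELD', 'message', '_CUSTOM_FIELD']
```

Keeping one placeholder per masked field, rather than collapsing them into a single entry, is a design choice here: it preserves the custom-vs-ECS proportion that the question is about.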

> Unless there is a way to get this info dynamically from the telemetry cluster and use it in Discover

I'm not planning anything like that, but it's an interesting idea. In general, the information from this telemetry can help us decide which features to build around suggesting the right columns/fields to users.

@stratoula
Contributor

stratoula commented Oct 7, 2024

Got it, OK, it makes sense now. We already know whether a field is an ECS one, so we could track them either in Discover or in the editor. It really depends on which app interests us: the editor is already used in many different apps and is going to be used in even more, and each app has a different use case. So if we want to focus on Discover usage, I think it should be built in Discover; otherwise, in the editor.

One more question. If I am not mistaken, our telemetry events are mostly counters, so we are tracking the number of clicks. Can we also send non-numeric information? An array of strings in this case.

@stratoula
Contributor

This PR #195200 will help you retrieve the fields / columns used in the query (you can look at the examples there to better understand the output). After that, you can compare them with the ECS fields and create your event.

BUT I am not sure whether you can actually send non-numeric values, as I said here #194909 (comment). This needs investigation.

@flash1293
Contributor Author

flash1293 commented Oct 7, 2024

@stratoula We are not just tracking the number of clicks, we are tracking every click individually. Counters are separate from EBT (event-based telemetry) - see here. You are referring to UI counters, I'm referring to EBT.

For every click the user makes in the UI, we send a payload like this to the telemetry server:

{
  "timestamp": "2024-10-07T09:34:13.322Z",
  "event_type": "click",
  "context": {
    "isDev": true,
    "isDistributable": false,
    "version": "9.0.0",
    "branch": "main",
    "buildNum": 9007199254740991,
    "buildSha": "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
    "session_id": "48f2e1f9-4d56-47b3-9d6d-75d8b397007a",
    "user_agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/129.0.0.0 Safari/537.36",
    "preferred_language": "en-US",
    "preferred_languages": [
      "en-US",
      "en"
    ],
    "viewport_width": 1652,
    "viewport_height": 1223,
    "cluster_name": "elasticsearch",
    "cluster_uuid": "ctbi9-HdRaS89WynDTCeaQ",
    "cluster_version": "9.0.0-SNAPSHOT",
    "cluster_build_flavor": "default",
    "pageName": "application:dev_tools",
    "applicationId": "dev_tools",
    "page_title": "Console - Dev Tools - Elastic",
    "page_url": "/app/dev_tools#/console/history",
    "cloudId": "ftr_fake_cloud_id:aGVsbG8uY29tOjQ0MyRFUzEyM2FiYyRrYm4xMjNhYmM=",
    "deploymentId": "deploymentId",
    "labels": {},
    "discoverProfiles": [],
    "serviceInventoryViewType": "entity",
    "userId": "7ea90f1c43b0f59b62c57fab4f7e01016351aadff8c81e310e60953234108212",
    "isElasticCloudUser": false,
    "license_id": "c92f1dac-069f-4292-8915-e13c9f87042d",
    "license_status": "active",
    "license_type": "basic"
  },
  "properties": {
    "target": [
      "DIV",
      "id=consoleRoot",
      "class=consoleContainer",
      "DIV",
      "class=euiPanel euiPanel--plain euiSplitPanel css-zgsyqh-euiPanel-grow-m-plain-hasShadow-euiSplitPanelOuter-column",
      "DIV",
      "class=euiPanel euiPanel--transparent euiPanel--paddingMedium euiSplitPanel__inner consoleTabs css-9iwtfr-euiPanel-none-m-transparent-euiSplitPanelInner",
      "DIV",
      "class=euiFlexGroup css-dhjvmj-euiFlexGroup-s-flexStart-center-row",
      "DIV",
      "class=euiFlexItem css-9sbomz-euiFlexItem-grow-1",
      "DIV",
      "class=euiTabs css-188xqqo-euiTabs-s",
      "role=tablist",
      "DIV",
      "class=euiPopover euiTourAnchor css-14mlg6d-euiPopover-inline-block",
      "data-test-subj=historyTourStep",
      "BUTTON",
      "role=tab",
      "aria-selected=false",
      "type=button",
      "class=euiTab css-1mh7dkw-euiTab",
      "data-test-subj=consoleHistoryButton",
      "SPAN",
      "class=euiTab__content eui-textTruncate css-bzyfuv-euiTab__content-s"
    ]
  }
}
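
By analogy with the `target` string array in the click payload above, a field-usage event could carry its field names as an array of strings in `properties`. The following is only a sketch: `AnalyticsClientLike`, the event type name `esql_field_usage`, the `fieldNames` property, and `reportEsqlFieldUsage` are all hypothetical and are not Kibana's actual analytics API.

```typescript
// Minimal stand-in for an analytics client (assumption); the real Kibana
// client is deliberately not imported here.
interface AnalyticsClientLike {
  reportEvent(eventType: string, properties: Record<string, unknown>): void;
}

// Hypothetical event type name, chosen for illustration.
const FIELD_USAGE_EVENT_TYPE = 'esql_field_usage';

// Report one event per submitted query, carrying a flat, de-duplicated
// list of field names as a string array.
function reportEsqlFieldUsage(analytics: AnalyticsClientLike, fieldNames: string[]): void {
  const uniqueFieldNames = Array.from(new Set(fieldNames));
  if (uniqueFieldNames.length === 0) {
    return; // nothing worth reporting for a query without field references
  }
  analytics.reportEvent(FIELD_USAGE_EVENT_TYPE, { fieldNames: uniqueFieldNames });
}

// In-memory recorder used here in place of a real telemetry client.
const sentEvents: Array<{ eventType: string; properties: Record<string, unknown> }> = [];
const recordingAnalytics: AnalyticsClientLike = {
  reportEvent: (eventType, properties) => {
    sentEvents.push({ eventType, properties });
  },
};

reportEsqlFieldUsage(recordingAnalytics, ['host.name', 'message', 'host.name']);
```

Skipping empty lists and de-duplicating names keeps the payload to a handful of strings per query submission, which is far smaller than the DOM selector list in the click event.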

@stratoula
Contributor

OK then. I have only worked with counters, but if this is possible, then the above helper will help you achieve what you want.

@jughosta
Contributor

jughosta commented Oct 7, 2024

Linking to similar work #193996

@flash1293
Contributor Author

Absolutely @jughosta, I think extending discover_field_usage makes sense here!

@kertal kertal added the impact:low Addressing this issue will have a low level of impact on the quality/strength of our product. label Oct 21, 2024
6 participants