[Logs Overview] Filter unsuitable tokens #197255

weltenwort · 2024-10-22T14:40:49Z

📓 Summary

Log messages commonly contain identifiers and IDs that are inherently (pseudo)random and contain characters that would be interpreted as punctuation in prose, such as UUIDs or addresses. In many cases the analyzer splits these into several small, diverse tokens that overwhelm the distance metric of the categorization aggregation. To reduce the likelihood of this, we can filter out tokens that consist only of hexadecimal characters before categorization and limit the number of tokens compared.
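The token filtering described above can be sketched locally in TypeScript. This is a minimal illustration under stated assumptions, not the analyzer Elasticsearch actually runs: it assumes plain whitespace tokenization, and the hypothetical helpers `isHexLikeToken` and `categorizationTokens` treat a token as hex-like when it consists only of hexadecimal runs joined by `-`, `:`, or `.` and contains at least one digit.

```typescript
// Hypothetical heuristic (not from the issue): a token is "hex-like" if it is
// made only of hex runs joined by -, :, or . and contains at least one digit.
// Requiring a digit keeps ordinary words spelled with a-f (e.g. "cafe").
const HEX_LIKE_TOKEN = /^(?=.*\d)[0-9a-f]+(?:[-:.][0-9a-f]+)*$/i;

function isHexLikeToken(token: string): boolean {
  return HEX_LIKE_TOKEN.test(token);
}

// Assumed pipeline: split on whitespace, drop hex-like tokens,
// then keep at most `maxTokens` tokens for comparison.
function categorizationTokens(message: string, maxTokens: number): string[] {
  return message
    .split(/\s+/)
    .filter((token) => token.length > 0)
    .filter((token) => !isHexLikeToken(token))
    .slice(0, maxTokens);
}

const tokens = categorizationTokens(
  "request 550e8400-e29b-41d4-a716-446655440000 from 10.0.0.1 failed",
  10,
);
console.log(tokens);
```

With this heuristic the UUID and the IPv4 address are dropped while `request`, `from`, and `failed` remain. Requiring a digit is a judgment call: it protects English words made only of the letters a to f, at the cost of letting an all-letter hex string such as `deadbeef` through.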

🔗 related to: [Logs Overview] Enhanced logs component for solution UIs #190848

✔️ Acceptance criteria

The categorization_analyzer is configured with a char_filter that removes hexadecimal tokens.
The categorization_analyzer is configured with a limit filter that caps the token count at a reasonable maximum.
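One way both criteria might map onto a `categorize_text` aggregation request body is sketched below as a TypeScript object. The `pattern_replace` char filter and the `limit` token filter (with `max_token_count`) are standard Elasticsearch analysis components; the regular expression, the `ml_standard` tokenizer choice, the cap of 100 tokens, and the helper name `buildCategorizeTextAgg` are assumptions for illustration, not the values used in the linked pull request.

```typescript
interface PatternReplaceCharFilter {
  type: "pattern_replace";
  pattern: string;
  replacement: string;
}

interface LimitTokenFilter {
  type: "limit";
  max_token_count: number;
}

interface CategorizationAnalyzer {
  char_filter: PatternReplaceCharFilter[];
  tokenizer: string;
  filter: LimitTokenFilter[];
}

// Hypothetical helper: builds the aggregation part of a search request body.
function buildCategorizeTextAgg(field: string, maxTokenCount: number) {
  const categorization_analyzer: CategorizationAnalyzer = {
    char_filter: [
      {
        type: "pattern_replace",
        // Assumed pattern: word-bounded hex runs joined by -, :, or . that
        // contain at least one digit (UUIDs, IPv4 addresses, hex ids).
        // They are replaced with "" before tokenization.
        pattern: "\\b(?=[0-9a-fA-F.:-]*\\d)[0-9a-fA-F]+(?:[-:.][0-9a-fA-F]+)*\\b",
        replacement: "",
      },
    ],
    // Assumed tokenizer choice; "standard" is another option.
    tokenizer: "ml_standard",
    // Keep only the first `maxTokenCount` tokens of each message.
    filter: [{ type: "limit", max_token_count: maxTokenCount }],
  };

  return {
    categories: {
      categorize_text: { field, categorization_analyzer },
    },
  };
}

const aggs = buildCategorizeTextAgg("message", 100);
console.log(JSON.stringify(aggs, null, 2));
```

Because the char filter runs before the tokenizer, the identifiers never reach the token stream, so the tokenizer cannot split them into the small fragments described in the summary. The pattern only uses regex features shared by Java and JavaScript (`\b`, lookahead, character classes), so it can be checked locally before being sent to Elasticsearch.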


elasticmachine · 2024-10-22T14:40:51Z

Pinging @elastic/obs-ux-logs-team (Team:obs-ux-logs)

weltenwort added the Team:obs-ux-logs Observability Logs User Experience Team label Oct 22, 2024

weltenwort self-assigned this Oct 24, 2024

weltenwort linked a pull request Oct 25, 2024 that will close this issue

[Logs Overview] Improve analyzer by filtering unsuitable tokens #197868

Open
