[Logs Overview] Filter unsuitable tokens #197255

Open
weltenwort opened this issue Oct 22, 2024 · 1 comment · May be fixed by #197868
Labels
Team:obs-ux-logs Observability Logs User Experience Team

Comments

@weltenwort
Member

📓 Summary

Log messages commonly contain identifiers, such as UUIDs or addresses, that are inherently (pseudo)random and contain characters that would be interpreted as punctuation in prose. In many cases the analyzer splits these into several small, diverse tokens that overwhelm the distance metric of the categorization aggregation. To reduce the likelihood of that, we can filter out strings that consist only of hexadecimal characters before categorization and limit the number of tokens compared.
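To make the effect concrete, here is a minimal Python simulation of the two steps. It is not Elasticsearch's analyzer: the tokenizer (TOKEN_RE, which splits on anything non-alphanumeric), the hex pattern (HEX_RE, a word-bounded run of hex digits containing at least one decimal digit, so plain words like "face" survive), the helper name categorization_tokens, and the default cap of 10 tokens are all hypothetical choices made for illustration.

```python
import re

# Hypothetical tokenizer: split on every non-alphanumeric character, which is
# roughly how hyphens and colons inside IDs break them into several tokens.
TOKEN_RE = re.compile(r"[A-Za-z0-9]+")

# Hypothetical "hexadecimal-looking" pattern: a word-bounded run of hex digits
# that contains at least one decimal digit. Words made only of the letters
# a-f (e.g. "face") are kept; plain decimal numbers are removed as well.
HEX_RE = re.compile(r"\b[0-9a-fA-F]*[0-9][0-9a-fA-F]*\b")


def categorization_tokens(message: str, max_tokens: int = 10) -> list[str]:
    """Strip hex-looking strings, tokenize, then keep at most max_tokens."""
    # Step 1 (char-filter stage): remove hex runs from the raw text.
    filtered = HEX_RE.sub("", message)
    # Step 2 (tokenizer stage): split the remaining text into tokens.
    tokens = TOKEN_RE.findall(filtered)
    # Step 3 (limit stage): only the first max_tokens tokens are compared.
    return tokens[:max_tokens]


msg = "Request 550e8400-e29b-41d4-a716-446655440000 failed for user alice"
print(TOKEN_RE.findall(msg))          # 10 tokens, 5 of them UUID fragments
print(categorization_tokens(msg))     # ['Request', 'failed', 'for', 'user', 'alice']
print(categorization_tokens(msg, 3))  # ['Request', 'failed', 'for']
```

Without the filter, half of the tokens in this message come from a single UUID and would differ for every request; with it, messages of this shape reduce to the same five tokens.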

✔️ Acceptance criteria

  • The categorization_analyzer is configured with a char_filter that removes hexadecimal strings before they are tokenized.
  • The categorization_analyzer is configured with a limit token filter that caps the token count at a reasonable maximum.
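One way the two criteria could map onto the categorization_analyzer of a categorize_text aggregation is sketched below as a Python dict. This is a sketch under assumptions, not the configuration chosen for the fix: the field name "message", the ml_standard tokenizer, the regex, the helper name build_categorize_text_agg and the max_token_count of 100 are placeholders. The pattern_replace char filter and the limit token filter are standard Elasticsearch analysis components; the inline-definition style assumes categorization_analyzer accepts the same syntax as the _analyze API.

```python
import json

# Placeholder regex: word-bounded hex runs containing at least one digit.
# Java (Elasticsearch) and Python regex syntax agree for this pattern.
HEX_PATTERN = r"\b[0-9a-fA-F]*[0-9][0-9a-fA-F]*\b"


def build_categorize_text_agg(field: str = "message", max_token_count: int = 100) -> dict:
    """Hypothetical helper returning a categorize_text aggregation body."""
    return {
        "categories": {
            "categorize_text": {
                "field": field,
                "categorization_analyzer": {
                    # Runs on the raw text before tokenization.
                    "char_filter": [
                        {"type": "pattern_replace", "pattern": HEX_PATTERN, "replacement": ""}
                    ],
                    # Assumption: keep the ML-oriented standard tokenizer.
                    "tokenizer": "ml_standard",
                    # Cap the number of tokens that enter the comparison.
                    "filter": [{"type": "limit", "max_token_count": max_token_count}],
                },
            }
        }
    }


print(json.dumps({"size": 0, "aggs": build_categorize_text_agg()}, indent=2))
```

Placing the hex removal in a char filter rather than a token filter means a UUID is stripped as one string before the tokenizer can split it into fragments.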
weltenwort added the Team:obs-ux-logs Observability Logs User Experience Team label on Oct 22, 2024
@elasticmachine
Contributor

Pinging @elastic/obs-ux-logs-team (Team:obs-ux-logs)
