[Automatic Import] Reduce the number of categorization errors #198326

Open
ilyannn opened this issue Oct 30, 2024 · 0 comments
Labels
bug Fixes for quality problems that affect the customer experience Feature:AutomaticImport Team:Security-Scalability Team label for Security Integrations Scalability Team

Context

Integrations like Postgres are quite complex and sometimes the categorization process does not complete within the allowed 2 minutes.

I think we can focus on improving the categorization process to reduce the number of failed generations.

We currently ask the LLM to generate the ingest pipeline directly. This produces incorrect combinations of category + type, possibly because the ingest pipeline is non-local: the category and the type that end up on the same event are set by separate entries, so the LLM does not notice the mismatch:

[
  {
    "field": "event.category",
    "value": [
      "database"
    ]
  },
  {
    "field": "event.type",
    "value": [
      "info"
    ]
  },
  {
    "field": "event.type",
    "value": [
      "start"
    ],
    "if": "ctx.message?.contains('starting PostgreSQL')"
  },
...
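To illustrate why such a mismatch is hard to spot, here is a minimal sketch that checks a pipeline fragment like the one above. It rests on two assumptions that are not stated in this issue: that entries accumulate their values on the event (as append-style processors would), and that `ALLOWED_TYPES` is a small, illustrative subset of the allowed category and type combinations, which should be checked against the ECS documentation. The names `effective_pairs` and `invalid_pairs` are hypothetical.

```python
# Assumption: illustrative subset of allowed event.type values per event.category.
# Verify against the ECS documentation before relying on it.
ALLOWED_TYPES = {
    "database": {"access", "change", "error", "info"},
}

def effective_pairs(entries, active_conditions):
    """Collect the (category, type) pairs an event would end up with.

    Assumption: every entry whose condition holds appends its values,
    so all matching entries have to be considered together.
    """
    categories, types = [], []
    for entry in entries:
        condition = entry.get("if")
        if condition is not None and condition not in active_conditions:
            continue  # this entry does not apply to the event
        target = categories if entry["field"] == "event.category" else types
        target.extend(entry["value"])
    return [(c, t) for c in categories for t in types]

def invalid_pairs(pairs, allowed):
    """Return the pairs whose type is not allowed for their category."""
    return [(c, t) for c, t in pairs if t not in allowed.get(c, set())]

STARTING = "ctx.message?.contains('starting PostgreSQL')"
entries = [
    {"field": "event.category", "value": ["database"]},
    {"field": "event.type", "value": ["info"]},
    {"field": "event.type", "value": ["start"], "if": STARTING},
]

pairs = effective_pairs(entries, active_conditions={STARTING})
bad = invalid_pairs(pairs, ALLOWED_TYPES)
```

The check has to combine the unconditional category entry with a conditional type entry defined elsewhere in the list, which is the kind of non-local reasoning the LLM is currently expected to do.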

Suggestion

We can instead ask it to produce a list of conditions and pairs of category + type, so that those two values are located nearby:

- if: "ctx.message?.contains('starting PostgreSQL')"
- classify: database + start

In this form, an incorrect combination would be much easier for the LLM to notice.
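A minimal sketch of how this format could be handled, assuming the LLM output has already been parsed into rules: each rule carries its condition, category and type together, can be checked on its own, and is then expanded into entries in the format shown under Context. The `Rule` class, `check_rule`, `to_entries` and the `allowed` mapping are hypothetical names for illustration, not existing code.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Rule:
    """One classification rule: condition, category and type kept together."""
    category: str
    type: str
    condition: Optional[str] = None  # condition string; None means "always"

def check_rule(rule, allowed):
    """Return True if the rule's type is allowed for its category.

    `allowed` maps a category to its set of allowed types (assumed input).
    """
    return rule.type in allowed.get(rule.category, set())

def to_entries(rule):
    """Expand one rule into pipeline entries in the format shown under Context."""
    entries = [
        {"field": "event.category", "value": [rule.category]},
        {"field": "event.type", "value": [rule.type]},
    ]
    if rule.condition is not None:
        for entry in entries:
            entry["if"] = rule.condition
    return entries

rule = Rule("database", "start", "ctx.message?.contains('starting PostgreSQL')")
entries = to_entries(rule)
```

Because a rule is self-contained, validating it needs no knowledge of the other rules, and rejected rules can be reported back individually.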

@ilyannn ilyannn added Team:Security-Scalability Team label for Security Integrations Scalability Team Feature:AutomaticImport bug Fixes for quality problems that affect the customer experience labels Oct 30, 2024