Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Elastic Agent] Add support for parsers configuration for Hints Autodiscover #3102

Closed
Tracked by #3063
gizas opened this issue Jul 19, 2023 · 10 comments
Closed
Tracked by #3063
Assignees
Labels
Team:Cloudnative-Monitoring Label for the Cloud Native Monitoring team

Comments

@gizas
Copy link
Contributor

gizas commented Jul 19, 2023

Describe the enhancement:

We need hint support for json and multiline parsers from annotations that defined from the users

Describe a specific use case for the enhancement or feature:

Agent configuration:

      providers:
        - type: kubernetes
          node: ${NODE_NAME}
          hints.enabled: true
          hints.default_config:
            type: filestream
            prospector.scanner.symlinks: true
            id: filestream-kubernetes-pod-${data.kubernetes.container.id}
            take_over: true
            paths:
            - /var/log/containers/*-${data.kubernetes.container.id}.log
            parsers:
            - container: ~

See above that type: filestream

The user will define in the pod:

annotations:
        co.elastic.hints/json.add_error_key: "true"
        co.elastic.hints/json.expand_keys: "true"
        co.elastic.hints/json.ignore_decoding_error: "true"
        co.elastic.hints/json.keys_under_root: "true"
        co.elastic.hints/json.message_key: "message"

And those will produce the following block:

parsers:
   - ndjson: 
         ignore_decoding_error: "true"
         expand_keys: "true"
         keys_under_root: "true"
         message_key: "message"

What is the definition of done?

@gizas
Copy link
Contributor Author

gizas commented Aug 3, 2023

@cmacknz , @rdner (because you worked in filestream input in the past) I cc you here in order to help us investigate how we can update the parsers configuration from a given user's input.

In beats we had the co.elastic.logs/json see https://www.elastic.co/guide/en/beats/filebeat/current/configuration-autodiscover-hints.html#_co_elastic_logsjson and we could configure the json log input.

But now in agent and filestream input we would like to offer the same flexibility to our users and through co.elastic.hints/json to configure the respective ndjson parser of filstream block.

For processors (see eg. https://github.com/elastic/elastic-agent/pull/3107/files#diff-ed933ad9cf09eb9ca797c5c89b5e8991624cbe5301d88f996d9bd947f4e3ea16R455) we have the relative map exposed and we can configure/ update the new list. I am struggling to find a way to see how I can edit parsers config. Any help here really appreciated

@cmacknz
Copy link
Member

cmacknz commented Aug 3, 2023

This is more of an agent composable provider question than a filestream question.

The parsers section goes at the root of the input configuration just like processors, so you can probably just clone or reuse the code that adds processors to add parsers. It should work exactly the same way, the name and options are just different. You will have to modify the code you linked.

@gizas
Copy link
Contributor Author

gizas commented Aug 4, 2023

@cmacknz thanks for looking into this. I was having a look into this today and I see that:
Initially (going from bottom to up), you need to modify the AddOrUpdate function and this leads me then to transpiler . So I guess you mean to go up all this way :)

Wondering if there is another way to do so? But still can not find something

@gizas
Copy link
Contributor Author

gizas commented Aug 4, 2023

@cmacknz see my f7503fb initial effort. At least elastic image builts, but parsers are not passed at the moment to composable provider. If you can spot anything missing let me know.

@gizas
Copy link
Contributor Author

gizas commented Aug 22, 2023

@cmacknz one step at a time, I fixed some errors and now at least the functionality now is not broken, autodiscovery works, but the parsers not added :)

  1. The autodisovery reads the parsers that I provide with following annotations:
annotations:
        co.elastic.hints/parsers.ndjson.add_error_key: 'false'
        co.elastic.hints/parsers.ndjson.message_key: "pas"
  1. Then comm.AddOrUpdate is called with correct data:
    f7503fb#diff-ed933ad9cf09eb9ca797c5c89b5e8991624cbe5301d88f996d9bd947f4e3ea16L504

Screenshot 2023-08-22 at 3 07 04 PM

  1. This leads to /internal/pkg/compsable/controller.go is called and the dynamicProvider object is passed:
    https://github.com/elastic/elastic-agent/blob/processor_parsers/internal/pkg/composable/controller.go#L397

  2. This is the pod that is emmited finally: https://github.com/elastic/elastic-agent/blob/processor_parsers/internal/pkg/composable/providers/kubernetes/pod.go#L247
    Screenshot 2023-08-22 at 5 26 28 PM

  3. Then this leads to https://github.com/elastic/elastic-agent-autodiscover/blob/main/kubernetes/watcher.go but from this point on I am loosing track.

Summary is that the processors are being added from autodiscovery but not parsers.
Full output for reference

- data_stream.namespace: default
  id: hints-filestream-container-logs-62d546c0055ce651842f8ec7b3911439555a66aaf6b238f87e765917a90cef66-kubernetes-e36973fb-debd-4252-a402-f7b185ce286e.nginx
  name: hints-filestream-container-logs
  original_id: hints-filestream-container-logs-62d546c0055ce651842f8ec7b3911439555a66aaf6b238f87e765917a90cef66
  processors:
  - add_fields:
      fields:
        id: 62d546c0055ce651842f8ec7b3911439555a66aaf6b238f87e765917a90cef66
        image:
          name: nginx
        runtime: containerd
      target: container
  - add_fields:
      fields:
        container:
          name: nginx
        ....
  - add_fields:
      fields:
        cluster:
          name: kind
          url: kind-control-plane:6443
      target: orchestrator
  - add_fields:
      fields:
        name: myproject
      target: project
  - rename:
      fail_on_error: "false"
      fields:
        "0":
          from: message
        "1":
          to: pasole
  streams:
  - data_stream:
      dataset: kubernetes.container_logs
      type: logs
    parsers:
    - container:
        format: auto
        stream: all
    paths:
    - /var/log/containers/*62d546c0055ce651842f8ec7b3911439555a66aaf6b238f87e765917a90cef66.log
    prospector:
      scanner:
        symlinks: true
  type: filestream
  use_output: default
outputs:
  default:
    allow_older_versions: true
    hosts:
    - https://elasticsearch:9200
    password: changeme
    ssl:
      verification_mode: none
    type: elasticsearch
    username: elastic
providers:
  kubernetes:
    hints:
      default_container_logs: false
      enabled: true
    node: kind-control-plane
    scope: node

cc @ChrsMark for reference

@cmacknz
Copy link
Member

cmacknz commented Aug 22, 2023

I haven't spent much time in this part of the agent code unfortunately, so I don't see anything obvious.

@ChrsMark
Copy link
Member

Hey folks, coming late to this discussion. A couple of thoughts:

  1. Do we have any examples where parsers are used in other than the filstream input? I'm trying to understand [Elastic Agent] Add support for parsers configuration for Hints Autodiscover #3102 (comment) mostly, and to my knowledge parsers are natively supported in filestream input. Do I miss sth here that parsers are supported in other inputs as well?

ref: filestream inputs
Based on this I see it as a different case compared to the processors, and hence it doesn't make sense to pass parsers for all kind of inputs (metrics etc) maybe?

  1. In general, I'm not sure if we really want to change the DynamicProviderComm interface at f7503fb#diff-e84202e025c0b38310aa13257a34c1c4304473500c894e45eabbfe8567254fd9R25 and additionally pass the parsers information there.

For processors (in hints) it was different since the processors were already part of the DynamicProviderComm interface and hence we could take advantage of it. That helped to tackle #735. However, I would avoid extending DynamicProviderComm with additional variables because this would take us down an unpredictable path.

In this, I would suggest taking a step back and discuss what we want to achieve here. If I'm not mistaken we just want to support parsers' configurations through hints. I also see how the parsers' case is similar to the processor's one since it hits the same issue: #735

So how about just supporting parsers only in hints templates that are based on the filestream input? This means that parsers would be defined as placeholders within the templates like all other hints and would be populated in the hintsMapping.
I know it's a compromise but I'm not sure how we could achieve this when we hit #735.

Happy to brainstorm more on this if needed.

@gizas
Copy link
Contributor Author

gizas commented Aug 24, 2023

Hey, I wanted to give it a try to see if it can work tbh. Until now I have not seen any other example of parsers (@cmacknz can correct me on this)

I have tested the decode_json_fields processor with the filestream and hints autodicovery. I have added a doc section here elastic/ingest-docs@2c72477

So I guess for now we can park the parsers support with hints until #735 is prioritised. @cmacknz do you have any plans for this?

I think @ChrsMark the #3161 is getting bigger priority now in order to allign autodiscovery with beats.

Happy to brainstorm more on this if needed.

I will try to have a look next weeks as well but any ideas are more than welcome

@cmacknz
Copy link
Member

cmacknz commented Aug 24, 2023

#735 isn't on our roadmap at all right now.

Filestream is also the only input that uses parsers today AFAIK.

@MichaelKatsoulis
Copy link
Contributor

This issue will be closed for now as json decoding can be performed with decode_json_fields processor.
So support for parsers is not mandatory. It can be reviewed when/if #735 gets prioritised.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Team:Cloudnative-Monitoring Label for the Cloud Native Monitoring team
Projects
None yet
Development

No branches or pull requests

4 participants