
Fluentd stops sending logs when log_format is set to 'text' #85

Open
danhale-git opened this issue Jun 5, 2023 · 2 comments

danhale-git commented Jun 5, 2023

Edit: found the issue. I had failed to set log_key after the record key changed from the default message to a non-default value.
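For anyone hitting the same symptom, the fix is a one-line config change (the field name log below is illustrative; it must match whatever key actually holds the message in your records):

```
<match **>
  @type sumologic
  log_format text
  # log_key defaults to 'message'; if the record field is named anything
  # else, it must be set explicitly or text output sends nothing.
  log_key log
</match>
```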

I'm going to leave this open because I think there are some improvements which could have clarified this issue. Unfortunately I have neither the time nor the Ruby experience to open a PR right now.

  • output.rb (and perhaps the plugin as a whole) seems to lack trace logs. These would have been helpful in confirming which parts of the write() function were executing, and in narrowing down where the issue occurs.
  • The log_key value did not exist in the records and I had configured log_format=text, which means no logs should ever be sent. I think this is an appropriate situation in which to emit a log, perhaps at warn level, informing the user that their configuration is useless for the current chunk.
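To illustrate the failure mode described above, here is a hypothetical sketch (not the plugin's actual code; extract_messages is an invented name) of how a text-format output can silently drop every record whose configured log_key field is missing:

```ruby
# Hypothetical sketch of text-format message extraction. When the record
# has no field matching log_key, the record simply vanishes from the
# chunk -- this is the point where a warning would have surfaced the
# misconfiguration instead of failing silently.
def extract_messages(records, log_key)
  records.filter_map do |record|
    value = record[log_key]
    if value.nil?
      warn "record has no '#{log_key}' field; dropping it"
      next
    end
    value
  end
end

records = [{ "log" => "hello" }, { "message" => "world" }]
puts extract_messages(records, "message").inspect  # => ["world"]
```

With log_key left at its default of message while the actual field is named log, the result is an empty batch and nothing ever reaches the backend.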

I appreciate my suggestions may not be valid for this code. Feel free to close this, as my issue was caused by misconfiguration and has been resolved.


I'm using the kube-logging operator: https://kube-logging.dev/docs/configuration/plugins/outputs/sumologic/
This issue arose as we updated the operator from version 3.17 to 4.2.0.
There are various sumologic-related changes in that release, visible by searching the page of this very large diff:
kube-logging/logging-operator@release-3.17...4.2.0

However, I've come here because the behaviour I'm getting from fluentd after a single change to the Sumologic output configuration seems unexpected. I hope to get advice on how to move forward with this issue, as it looks like a silent failure.

Feel free to close this issue or advise accordingly; I have not confirmed a bug, although it does seem like there might be one.

The Sumologic output works as expected with log_format: json; however, if I set log_format: text, fluentd stops sending logs.

<source>
  @type forward
  @id main_forward
  bind 0.0.0.0
  port 24240
</source>
<match **>
  @type label_router
  @id main
  metrics true
  <route>
    @label @c1157f02c8c13fd3ea66f8419567c357
    metrics_labels {"id":"flow:mynamespace:my-sumo-flow"}
    <match>
      labels my-sumo-label:enabled
      namespaces mynamespace
      negate false
    </match>
  </route>
  ... ...
</match>
<label @c1157f02c8c13fd3ea66f8419567c357>
  <match kubernetes.**>
    @type tag_normaliser
    @id flow:mynamespace:my-sumo-flow:0
    format ${namespace_name}.${labels.environment}.${pod_name}.${container_name}
  </match>
  <match **>
    @type sumologic
    @id flow:mynamespace:my-sumo-flow:output:mynamespace:my-sumo-output
    endpoint COLLECTOR_URL_REMOVED
    log_format json
    source_name my-sumo-source
    <buffer tag,time>
      @type file
      chunk_limit_size 32m
      flush_interval 60s
      flush_mode interval
      path /buffers/flow:mynamespace:my-sumo-flow:output:mynamespace:my-sumo-output.*.buffer
      retry_forever true
      timekey 10m
      timekey_wait 1m
      total_limit_size 1024m
    </buffer>
  </match>
</label>
... ...

With log_format: json I see fluentd logging sends, and I see logs in Sumologic.

$ tail -f -n 100000 /fluentd/log/out | grep "mynamespace:my-sumo"
2023-06-05 14:53:43 +0000 [debug]: #0 [flow:mynamespace:my-sumo-flow:output:mynamespace:my-sumo-output] Sending 2; logs records with source category '', source host '', source name 'my-sumo-source', chunk #5fd630dfe92e8520c65b0b2aedb3524c, try 0, batch 0
2023-06-05 14:58:21 +0000 [debug]: #0 [flow:mynamespace:my-sumo-flow:output:mynamespace:my-sumo-output] Created new chunk chunk_id="5fd6322428e95cb3b7fbd07ad24428b0" metadata=#<struct Fluent::Plugin::Buffer::Metadata timekey=1685976600, tag="mynamespace.unknown.test-logger.container-name", variables=nil, seq=0>
2023-06-05 14:59:22 +0000 [debug]: #0 [flow:mynamespace:my-sumo-flow:output:mynamespace:my-sumo-output] Sending 13; logs records with source category '', source host '', source name 'my-sumo-source', chunk #5fd6322428e95cb3b7fbd07ad24428b0, try 0, batch 0
2023-06-05 14:59:26 +0000 [debug]: #0 [flow:mynamespace:my-sumo-flow:output:mynamespace:my-sumo-output] Created new chunk chunk_id="5fd632622621c3c2585f88717d2c406e" metadata=#<struct Fluent::Plugin::Buffer::Metadata timekey=1685976600, tag="mynamespace.unknown.test-logger.container-name", variables=nil, seq=0>

With log_format: text I no longer see fluentd logging sends, and I no longer see logs in Sumologic.

$ tail -f -n 100000 /fluentd/log/out | grep "mynamespace:my-sumo"
2023-06-05 14:41:11 +0000 [debug]: #0 [flow:mynamespace:my-sumo-flow:output:mynamespace:my-sumo-output] Created new chunk chunk_id="5fd62e4de0a29e3c6c8d40ce47a99521" metadata=#<struct Fluent::Plugin::Buffer::Metadata timekey=1685976000, tag="mynamespace.unknown.test-logger.container-name", variables=nil, seq=0>
2023-06-05 14:42:16 +0000 [debug]: #0 [flow:mynamespace:my-sumo-flow:output:mynamespace:my-sumo-output] Created new chunk chunk_id="5fd62e8bdd2eb0db408da82be8213a70" metadata=#<struct Fluent::Plugin::Buffer::Metadata timekey=1685976000, tag="mynamespace.unknown.test-logger.container-name", variables=nil, seq=0>
2023-06-05 14:43:21 +0000 [debug]: #0 [flow:mynamespace:my-sumo-flow:output:mynamespace:my-sumo-output] Created new chunk chunk_id="5fd62ec9da55d6d9601586c71c835446" metadata=#<struct Fluent::Plugin::Buffer::Metadata timekey=1685976000, tag="mynamespace.unknown.test-logger.container-name", variables=nil, seq=0>
2023-06-05 14:50:00 +0000 [debug]: #0 [flow:mynamespace:my-sumo-flow:output:mynamespace:my-sumo-output] Created new chunk chunk_id="5fd63046447c860381363c4d438ee78d" metadata=#<struct Fluent::Plugin::Buffer::Metadata timekey=1685976600, tag="mynamespace.unknown.container-namer-6fd6b8b55c-gmdsf.container-namer", variables=nil, seq=0>
2023-06-05 14:50:51 +0000 [debug]: #0 [flow:mynamespace:my-sumo-flow:output:mynamespace:my-sumo-output] Created new chunk chunk_id="5fd6307703ab3dd92bbfb4c8b3a7dfc4" metadata=#<struct Fluent::Plugin::Buffer::Metadata timekey=1685976600, tag="mynamespace.unknown.test-logger.container-name", variables=nil, seq=0>

The only change happening in the configuration is the value of log_format.

I do see buffers under /buffers.

The logging operator deploys three fluentd containers; I am monitoring and inspecting all of them at once when troubleshooting.

I am using the fluentd debug container and everything seems to be functioning as expected except this one puzzling issue. Any advice is very much appreciated.


danhale-git commented Jun 6, 2023

With log level set to trace:

log_format: json

2023-06-07 06:05:01 +0000 [trace]: #0 [... ...] enqueueing all chunks in buffer instance=1960
2023-06-07 06:05:01 +0000 [trace]: #0 [... ...] enqueueing all chunks in buffer instance=1960
2023-06-07 06:05:06 +0000 [trace]: #0 [... ...] enqueueing all chunks in buffer instance=1960
2023-06-07 06:05:06 +0000 [trace]: #0 [... ...] enqueueing all chunks in buffer instance=1960
2023-06-07 06:05:07 +0000 [trace]: #0 [... ...] writing events into buffer instance=1960 metadata_size=1
2023-06-07 06:05:07 +0000 [trace]: #0 [... ...] chunk /buffers/... ....b5fd83e76014f1e19b59dfa9d48cb2ffc.buffer size_added: 3137 new_size: 21959
2023-06-07 06:05:12 +0000 [trace]: #0 [... ...] enqueueing all chunks in buffer instance=1960
2023-06-07 06:05:12 +0000 [trace]: #0 [... ...] enqueueing chunk instance=1960 metadata=#<struct Fluent::Plugin::Buffer::Metadata timekey=1686117600, tag="emeacompliancemanagement.unknown.test-logger.container-name", variables=nil, seq=0>
2023-06-07 06:05:12 +0000 [trace]: #0 [... ...] enqueueing all chunks in buffer instance=1960
2023-06-07 06:05:12 +0000 [trace]: #0 [... ...] dequeueing a chunk instance=1960
2023-06-07 06:05:12 +0000 [trace]: #0 [... ...] chunk dequeued instance=1960 metadata=#<struct Fluent::Plugin::Buffer::Metadata timekey=1686117600, tag="emeacompliancemanagement.unknown.test-logger.container-name", variables=nil, seq=0>
2023-06-07 06:05:12 +0000 [trace]: #0 [... ...] trying flush for a chunk chunk="5fd83e76014f1e19b59dfa9d48cb2ffc"
2023-06-07 06:05:12 +0000 [trace]: #0 [... ...] adding write count instance=3180
2023-06-07 06:05:12 +0000 [trace]: #0 [... ...] executing sync write chunk="5fd83e76014f1e19b59dfa9d48cb2ffc"
2023-06-07 06:05:12 +0000 [debug]: #0 [... ...] Sending 7; logs records with source category '', source host '', source name 'my-source-category, chunk #5fd83e76014f1e19b59dfa9d48cb2ffc, try 0, batch 0
2023-06-07 06:05:12 +0000 [trace]: #0 [... ...] write operation done, committing chunk="5fd83e76014f1e19b59dfa9d48cb2ffc"
2023-06-07 06:05:12 +0000 [trace]: #0 [... ...] committing write operation to a chunk chunk="5fd83e76014f1e19b59dfa9d48cb2ffc" delayed=false
2023-06-07 06:05:12 +0000 [trace]: #0 [... ...] purging a chunk instance=1960 chunk_id="5fd83e76014f1e19b59dfa9d48cb2ffc" metadata=#<struct Fluent::Plugin::Buffer::Metadata timekey=1686117600, tag="emeacompliancemanagement.unknown.test-logger.container-name", variables=nil, seq=0>
2023-06-07 06:05:12 +0000 [trace]: #0 [... ...] chunk purged instance=1960 chunk_id="5fd83e76014f1e19b59dfa9d48cb2ffc" metadata=#<struct Fluent::Plugin::Buffer::Metadata timekey=1686117600, tag="emeacompliancemanagement.unknown.test-logger.container-name", variables=nil, seq=0>
2023-06-07 06:05:12 +0000 [trace]: #0 [... ...] done to commit a chunk chunk="5fd83e76014f1e19b59dfa9d48cb2ffc"
2023-06-07 06:05:12 +0000 [trace]: #0 [... ...] writing events into buffer instance=1960 metadata_size=1
2023-06-07 06:05:12 +0000 [debug]: #0 [... ...] Created new chunk chunk_id="5fd83eb3fe6c1581711a208220c1df91" metadata=#<struct Fluent::Plugin::Buffer::Metadata timekey=1686117600, tag="emeacompliancemanagement.unknown.test-logger.container-name", variables=nil, seq=0>
2023-06-07 06:05:12 +0000 [trace]: #0 [... ...] chunk /buffers/... ....b5fd83eb3fe6c1581711a208220c1df91.buffer size_added: 3137 new_size: 3137
2023-06-07 06:05:17 +0000 [trace]: #0 [... ...] enqueueing all chunks in buffer instance=1960
2023-06-07 06:05:17 +0000 [trace]: #0 [... ...] enqueueing all chunks in buffer instance=1960
2023-06-07 06:05:22 +0000 [trace]: #0 [... ...] enqueueing all chunks in buffer instance=1960
2023-06-07 06:05:22 +0000 [trace]: #0 [... ...] enqueueing all chunks in buffer instance=1960
2023-06-07 06:05:28 +0000 [trace]: #0 [... ...] enqueueing all chunks in buffer instance=1960
2023-06-07 06:05:28 +0000 [trace]: #0 [... ...] enqueueing all chunks in buffer instance=1960

log_format: text

2023-06-07 05:49:09 +0000 [trace]: #0 [... ...] enqueueing all chunks in buffer instance=1960
2023-06-07 05:49:14 +0000 [trace]: #0 [... ...] enqueueing all chunks in buffer instance=1960
2023-06-07 05:49:14 +0000 [trace]: #0 [... ...] enqueueing all chunks in buffer instance=1960
2023-06-07 05:49:20 +0000 [trace]: #0 [... ...] enqueueing all chunks in buffer instance=1960
2023-06-07 05:49:20 +0000 [trace]: #0 [... ...] enqueueing all chunks in buffer instance=1960
2023-06-07 05:49:25 +0000 [trace]: #0 [... ...] enqueueing all chunks in buffer instance=1960
2023-06-07 05:49:25 +0000 [trace]: #0 [... ...] enqueueing chunk instance=1960 metadata=#<struct Fluent::Plugin::Buffer::Metadata timekey=1686116400, tag="emeacompliancemanagement.unknown.test-logger.container-name", variables=nil, seq=0>
2023-06-07 05:49:25 +0000 [trace]: #0 [... ...] enqueueing all chunks in buffer instance=1960
2023-06-07 05:49:26 +0000 [trace]: #0 [... ...] dequeueing a chunk instance=1960
2023-06-07 05:49:26 +0000 [trace]: #0 [... ...] chunk dequeued instance=1960 metadata=#<struct Fluent::Plugin::Buffer::Metadata timekey=1686116400, tag="emeacompliancemanagement.unknown.test-logger.container-name", variables=nil, seq=0>
2023-06-07 05:49:26 +0000 [trace]: #0 [... ...] trying flush for a chunk chunk="5fd83aefd47052657cf608a961bbfcd8"
2023-06-07 05:49:26 +0000 [trace]: #0 [... ...] adding write count instance=4920
2023-06-07 05:49:26 +0000 [trace]: #0 [... ...] executing sync write chunk="5fd83aefd47052657cf608a961bbfcd8"
2023-06-07 05:49:26 +0000 [trace]: #0 [... ...] write operation done, committing chunk="5fd83aefd47052657cf608a961bbfcd8"
2023-06-07 05:49:26 +0000 [trace]: #0 [... ...] committing write operation to a chunk chunk="5fd83aefd47052657cf608a961bbfcd8" delayed=false
2023-06-07 05:49:26 +0000 [trace]: #0 [... ...] purging a chunk instance=1960 chunk_id="5fd83aefd47052657cf608a961bbfcd8" metadata=#<struct Fluent::Plugin::Buffer::Metadata timekey=1686116400, tag="emeacompliancemanagement.unknown.test-logger.container-name", variables=nil, seq=0>
2023-06-07 05:49:26 +0000 [trace]: #0 [... ...] chunk purged instance=1960 chunk_id="5fd83aefd47052657cf608a961bbfcd8" metadata=#<struct Fluent::Plugin::Buffer::Metadata timekey=1686116400, tag="emeacompliancemanagement.unknown.test-logger.container-name", variables=nil, seq=0>
2023-06-07 05:49:26 +0000 [trace]: #0 [... ...] done to commit a chunk chunk="5fd83aefd47052657cf608a961bbfcd8"
2023-06-07 05:49:30 +0000 [trace]: #0 [... ...] enqueueing all chunks in buffer instance=1960
2023-06-07 05:49:30 +0000 [trace]: #0 [... ...] enqueueing all chunks in buffer instance=1960
2023-06-07 05:49:36 +0000 [trace]: #0 [... ...] enqueueing all chunks in buffer instance=1960
2023-06-07 05:49:36 +0000 [trace]: #0 [... ...] enqueueing all chunks in buffer instance=1960
2023-06-07 05:49:41 +0000 [trace]: #0 [... ...] enqueueing all chunks in buffer instance=1960
2023-06-07 05:49:41 +0000 [trace]: #0 [... ...] enqueueing all chunks in buffer instance=1960
2023-06-07 05:49:47 +0000 [trace]: #0 [... ...] enqueueing all chunks in buffer instance=1960


sumo-drosiek commented Jul 12, 2023

@danhale-git Thank you for the issue. I prepared a PR which addresses the non-existent log key. I decided to add a warning message, as raising an exception may lead to re-ingesting logs in the current architecture.
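The warn-versus-raise trade-off can be sketched as follows (names such as write_chunk and deliver are illustrative, not the plugin's real code): in Fluentd, an exception thrown from a buffered output's write causes the whole chunk to be retried, so records already delivered from that chunk would be sent again.

```ruby
# Illustrative sketch: warning and skipping a bad record lets the chunk
# commit exactly once; raising instead would retry the whole chunk and
# re-deliver every record that had already been sent before the failure.
SENT = []

def deliver(message)
  SENT << message
end

def write_chunk(records, log_key)
  records.each do |record|
    message = record[log_key]
    if message.nil?
      # Warn and skip: only the malformed record is lost.
      warn "undefined log_key '#{log_key}'; record dropped"
      next
      # Raising here instead would mark the chunk for retry, duplicating
      # every record delivered above this point.
    end
    deliver(message)
  end
end

write_chunk([{ "message" => "a" }, { "log" => "b" }, { "message" => "c" }], "message")
puts SENT.inspect  # => ["a", "c"]
```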

#86
