multiline: introduce static cri parser backend, providing 9mb/s performance gain over current regex cri parser #9418

ryanohnemus · 2024-09-24T16:58:13Z

As part of #9399, I have been looking into performance improvements in fluent-bit, specifically for apps in k8s that create many logs within a short amount of time (i.e. logging >15mb/s).

Test setup:
Hardware: GCP n2-standard-8 w/ ssd bootdisk
With config: (see full config in example config section)

  inputs:
    - name: tail
      path: /app/cache/containers/*.log
      tag: kube.*
      skip_long_lines: on
      refresh_interval: 1
      buffer_chunk_size: 250K
      buffer_max_size: 250K
      threaded: on
      rotate_wait: 300
      storage.type: filesystem
  outputs:
    - name: "null"
      match: "*"
      workers: 3

With no multiline.parser: cri (or any other parser) configured, I am able to achieve 49.224Mb/s throughput.

Adding the standard cri parser, i.e. the current regex version in fluent-bit 3.1.8:

multiline.parser: cri added to the tail input above, throughput is 18.571Mb/s

Using the cri parser from this PR:

multiline.parser: cri added to the tail input above, throughput is 27.572Mb/s, a ~9mb/s gain from not using a regex parser
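
To put the three measurements in proportion, here is a small arithmetic sketch in Python. It only uses the figures reported above; the variable names are illustrative. The static parser comes out roughly 48% faster than the regex one, and reaches roughly 56% of the no-parser throughput versus roughly 38% for the regex parser.

```python
# Throughput figures exactly as reported above (units kept as written in the PR).
baseline = 49.224    # tail input with no multiline parser configured
regex_cri = 18.571   # current regex-based cri parser (fluent-bit 3.1.8)
static_cri = 27.572  # cri parser from this PR

# Absolute and relative gain of the new parser over the regex parser.
gain = static_cri - regex_cri
speedup_pct = gain / regex_cri * 100

# How much of the no-parser throughput each parser retains.
share_regex_pct = regex_cri / baseline * 100
share_static_pct = static_cri / baseline * 100

print(f"absolute gain:            {gain:.3f}")
print(f"relative gain over regex: {speedup_pct:.1f}%")
print(f"regex share of baseline:  {share_regex_pct:.1f}%")
print(f"static share of baseline: {share_static_pct:.1f}%")
```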

Improving the multiline parsing speed should also considerably cut down on issues from log tail rotations (log rotations missed when fluent-bit is backed up), since parsing is much more performant.
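
For readers unfamiliar with the idea, the following is a minimal Python sketch of the difference between regex-based and static parsing of a CRI log line (`<time> <stream> <P|F> <log>`). It is not the code from this PR, which lives in fluent-bit's C sources. The regex is an assumed approximation of the existing cri parser's pattern, and the function names `parse_cri_regex` and `parse_cri_static` are hypothetical.

```python
import re

# Assumed approximation of the regex-style cri pattern, in Python syntax.
CRI_REGEX = re.compile(
    r"^(?P<time>.+?) (?P<stream>stdout|stderr) (?P<_p>F|P) (?P<log>.*)$"
)


def parse_cri_regex(line):
    """Parse a CRI line with the regular expression above."""
    m = CRI_REGEX.match(line)
    return m.groupdict() if m else None


def parse_cri_static(line):
    """Parse a CRI line with plain substring searches, no regex engine."""
    # The timestamp is everything up to the first space.
    time_end = line.find(" ")
    if time_end <= 0:
        return None

    # The stream name is the next space-delimited token.
    stream_end = line.find(" ", time_end + 1)
    if stream_end < 0:
        return None
    stream = line[time_end + 1:stream_end]
    if stream not in ("stdout", "stderr"):
        return None

    # After the stream we need a one-character tag and a separating space.
    tag = line[stream_end + 1:stream_end + 2]
    if tag not in ("P", "F"):
        return None
    if line[stream_end + 2:stream_end + 3] != " ":
        return None

    return {
        "time": line[:time_end],
        "stream": stream,
        "_p": tag,
        "log": line[stream_end + 3:],
    }


full = "2024-09-24T16:58:13.123456789Z stdout F hello world"
partial = "2024-09-24T16:58:13.123456789Z stderr P first half of a long"

print(parse_cri_static(full))
print(parse_cri_static(partial) == parse_cri_regex(partial))
```

One difference in this sketch: the static version assumes the timestamp contains no spaces, while the lazy `.+?` in the regex would tolerate them. For well-formed CRI lines the two return the same fields.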

Enter [N/A] in the box, if an item is not applicable to your change.

Testing
Before we can approve your change; please submit the following in a comment:

[X] Example configuration file for the change
Using the k8s_perf_test from #9399 (Performance Testing of Fluent-bit with several filters shows log processing falling < 5mb/s) with the following service config:

service:
  flush: 0.5
  log_level: info
  daemon: off
  parsers_file: parsers.conf
  http_server: on
  http_listen: 0.0.0.0
  http_port: 2030
  storage.metrics: on
  storage.path: /app/cache/flb_storage
  storage.backlog.mem_limit: 512M
  storage.delete_irrecoverable_chunks: on
  storage.total_limit_size: 25G
pipeline:
  inputs:
    - name: tail
      path: /app/cache/containers/*.log
      tag: kube.*
      skip_long_lines: on
      refresh_interval: 1
      buffer_chunk_size: 250K
      buffer_max_size: 250K
      threaded: on
      multiline.parser: cri
      rotate_wait: 300
      storage.type: filesystem
  outputs:
    - name: "null"
      match: "*"
      workers: 3

Debug log output from testing the change

Attached Valgrind output that shows no leaks or memory corruption was found

valgrind --leak-check=full ./bin/flb-it-multiline
...

SUCCESS: All unit tests have passed.
==45752== 
==45752== HEAP SUMMARY:
==45752==     in use at exit: 0 bytes in 0 blocks
==45752==   total heap usage: 22,203 allocs, 22,203 frees, 7,605,275 bytes allocated
==45752== 
==45752== All heap blocks were freed -- no leaks are possible
==45752== 
==45752== Use --track-origins=yes to see where uninitialised values come from
==45752== For lists of detected and suppressed errors, rerun with: -s
==45752== ERROR SUMMARY: 32 errors from 2 contexts (suppressed: 0 from 0)

If this is a change to packaging of containers or native binaries then please confirm it works for all targets.

[N/A] Run local packaging test showing all targets (including any new ones) build.
[N/A] Set ok-package-test label to test for all targets (requires maintainer to do).

Documentation

[N/A] Documentation required for this feature
no new documentation, parser is meant to be backward compatible

Signed-off-by: ryanohnemus <[email protected]>

multiline: cri parser backend, do not use regex

7ab84ab

Signed-off-by: ryanohnemus <[email protected]>

ryanohnemus requested review from edsiper, leonardo-albertovich, fujimotos and koleini as code owners September 24, 2024 16:58

github-actions bot added the docs-required label Sep 24, 2024

ryanohnemus mentioned this pull request Sep 24, 2024

Performance Testing of Fluent-bit with several filters shows log processing falling < 5mb/s #9399

Open

ryanohnemus force-pushed the cri_parser_backend branch from e14075d to 24261f4 Compare September 25, 2024 12:21

multiline: cri parser, ensure minimum line length after finding time

d953463

Signed-off-by: ryanohnemus <[email protected]>

ryanohnemus force-pushed the cri_parser_backend branch from 24261f4 to d953463 Compare September 25, 2024 15:55
