Multiple Multiline Filter Definitions Not Supported #5235

Open
PettitWesley opened this issue Apr 4, 2022 · 10 comments
Labels
community-feedback long-term Long term issues (exempted by stale bots) waiting-for-user Waiting for more information, tests or requested changes

Comments

@PettitWesley
Contributor

PettitWesley commented Apr 4, 2022

Bug Report

If you put two multiline filter definitions in your conf and they both match the same logs this leads to a problem:

[FILTER]
     name                  multiline
     match                 *
     multiline.key_content log
     multiline.parser      go

[FILTER]
     name                  multiline
     match                 *
     multiline.key_content log
     multiline.parser      multiline-regex-test

This leads to error messages like:

[2022/02/09 07:28:03] [error] [multiline] expected MAP type in first line state buffer

And can cause high memory usage and even cause Fluent Bit to crash.

Why does this happen?

This is because the multiline filter uses an emitter input instance to re-emit completed records at the start of the Fluent Bit log pipeline.

In the multiline design #4309 I tried to prevent cycles by having the filter recognize its own in_emitter instance and not try to parse records from its own emitter. Unfortunately, this only solves the problem for a single filter instance. Two filters can lead to a cycle.

So let's go through an example to see why this happens:

  1. Some input ingests Chunk A, which contains a multiline log.
  2. ML_FILTER_1 gets Chunk A, concatenates the records, and emits them as Chunk B with its in_emitter.
  3. ML_FILTER_1 recognizes that Chunk B came from its own emitter, so it passes it on unmodified.
  4. ML_FILTER_2 gets Chunk B, processes the records (assume they match at least one parser), and then emits them as Chunk C with its in_emitter.
  5. ML_FILTER_1 gets Chunk C, which did not come from its own emitter, concatenates the records (these records already passed through that filter and thus match), and emits them as Chunk D with its in_emitter. At this point, we are back at step 2, and we have reached an infinite loop.

Workaround 1: If you need only one parser applied to each log statement

The workaround is to have only a single filter definition; remember that you can use multiple parsers in a single definition. The Fluent Bit multiline filter can only apply a single multiline parser to each log record; it will try each parser in the comma-delimited list in order and apply the first one that matches the log (i.e. the first parser whose start_state matches the log).

This limitation means that each log record can have at most 2 multiline parsers successfully applied to it. The first applied parser can be defined with the tail multiline settings, and the second applied parser can be specified in a multiline filter definition.

So my example from above can become a single filter definition like so:

[FILTER]
     name                  multiline
     match                 *
     multiline.key_content log
     multiline.parser      go, multiline-regex-test
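
For the two-parser case mentioned above (one parser applied by tail, a second applied by the filter), a minimal sketch might look like the following. The path and tag are placeholders, and the choice of the built-in docker, cri, and go parsers is only an assumption for illustration; substitute whatever fits your setup.

```
# Sketch only: path and tag are hypothetical placeholders.
[INPUT]
     name              tail
     path              /var/log/containers/*.log
     tag               kube.*
     # First applied parser: reassembles lines split by the container runtime.
     multiline.parser  docker, cri

[FILTER]
     name                  multiline
     match                 kube.*
     multiline.key_content log
     # Second applied parser: joins language-level multilines (e.g. Go stack traces).
     multiline.parser      go
```

There is still only one multiline filter definition here, so the re-emit cycle described earlier should not come into play.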

Workaround 2: If logs can be differentiated by log tag

Another option, if your logs carry different tags, is to give each filter a Match pattern that matches a different set of tags, so that no record is ever processed by both filters.
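
As a rough sketch of this workaround, assuming two applications whose logs are ingested under separate tags (the paths and the go.app / other.app tags below are made up for illustration):

```
# Sketch only: paths and tags are hypothetical placeholders.
[INPUT]
     name              tail
     path              /var/log/go-app/*.log
     tag               go.app

[INPUT]
     name              tail
     path              /var/log/other-app/*.log
     tag               other.app

# Each filter matches a disjoint set of tags.
[FILTER]
     name                  multiline
     match                 go.*
     multiline.key_content log
     multiline.parser      go

[FILTER]
     name                  multiline
     match                 other.*
     multiline.key_content log
     multiline.parser      multiline-regex-test
```

Because the emitter re-emits records under their original tag, records concatenated by the first filter should never match the second filter's pattern, and vice versa.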

@PettitWesley PettitWesley changed the title Multiline Multiline Filter Definitions Not Supported Multiple Multiline Filter Definitions Not Supported Apr 4, 2022
@PettitWesley
Contributor Author

@edsiper I am not sure of the best way to fix this. I have two questions:

  1. Is there ever a valid use case for having multiple multiline filter definitions? Or would you always want to do what I show above and use all parsers in one filter definition?
  2. Should we just solve this by adding an option to change the tag when a record is re-emitted (more like rewrite_tag), so that there is no cycle because the concatenated records will have a different tag? I think users won't be satisfied with this because they will want their multilines to have the same tag as other non-multiline records from the same app.

@edsiper
Member

edsiper commented Apr 5, 2022

I can think of receiving docker logs through Forward when you don't know if the format is Docker or CRI-O. Besides that, I'm not sure what users would be trying to accomplish.

Note that multiline core already supports multiple formats, so we can avoid the user configuring multiple independent filters.

Maybe we need the users to elaborate more on their use cases in this ticket

@PettitWesley
Contributor Author

@edsiper Cool, I will direct users to this issue to post about their needs and we can determine if we need to support multiple filters. For the time being, I will assume it's not urgent and will submit a doc PR to clearly note this in the docs.

@lecaros lecaros added waiting-for-user Waiting for more information, tests or requested changes community-feedback and removed status: waiting-for-triage labels Apr 11, 2022
@alexku7

alexku7 commented May 30, 2022

Hello @PettitWesley

Sometimes, for simplicity, it's very convenient to declare multiple multiline filters instead of playing with tagging the containers in the tail input. Although, I agree that it could be a logically incorrect solution.

Anyway, I would like to ask: does specifying multiple parsers in the same filter, as you mentioned in the workaround, provide the same functionality?
I mean, is the following identical in functionality and effect:

[FILTER]
     name                  multiline
     match                 *
     multiline.key_content log
     multiline.parser      go, multiline-regex-test

vs two separated multiline filters?

@drbugfinder-work
Contributor

Maybe related to #5524 (comment)

@PettitWesley
Contributor Author

@alexku7 Yea, specifying multiple parsers should have the same effect as you'd want from multiple filter definitions... if it doesn't then please let us know.

@github-actions
Contributor

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days. Maintainers can add the exempt-stale label.

@github-actions github-actions bot added the Stale label Sep 26, 2022
@PettitWesley PettitWesley added long-term Long term issues (exempted by stale bots) and removed Stale labels Sep 26, 2022
@gvozdetsky

gvozdetsky commented Jun 13, 2024

Just to clarify: are multiple multiline parsers allowed in the [INPUT] section, like multiline.parser docker, cri?

How should I understand "The two options separated by a comma means multi-format: try docker and cri multiline formats"?

@gvozdetsky

Ah, I misunderstood at first! I added a pull request to correct the documentation: fluent/fluent-bit-docs#1392

@vorezal

vorezal commented Aug 20, 2024

Would a use case be potentially handling both a partial_message case as well as a language specific multi-line parser case? #4309 mentions two filter instances being required to handle both cases, but this issue seems to indicate doing so would cause an infinite loop.


7 participants