-
Notifications
You must be signed in to change notification settings - Fork 59
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(datadog grok): enable the DOTALL mode by default #1022
Conversation
5561067
to
b43bd3a
Compare
b43bd3a
to
b5814ad
Compare
dd341c9
to
b5814ad
Compare
b5814ad
to
d019941
Compare
59c01bc
to
72a217f
Compare
changelog.d/1022.breaking.md
Outdated
@@ -0,0 +1,2 @@ | |||
The `parse_groks` VRL function has the multiline mode enabled by default. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure whether it makes sense to introduce an optional parameter for users to control this, keep the existing behaviour or make a breaking change for this function? 🤔 Datadog's grok implementation enables the DOTALL mode by default(see the csv filter for example), which seems like a reasonable behaviour. Users can always use (?s)
and (?-s)
modifiers to control this behaviour.
Also please note that the parse_grok
function does not use the datadog/grok
lib, so in this case they would behave differently unless we change it too..
Any thoughts @jszwedko @bruceg?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have as much context on vrl
but my 2cents - agreed that making this the default behavior feels reasonable, and I think it would also make sense to eventually update parse_grok
to have parity. If I'm looking at the vrl docs I would assume they share this base behavior.
src/datadog/grok/parse_grok_rules.rs
Outdated
let mut pattern = String::new(); | ||
// In Oniguruma the (?m) modifier is used to enable the DOTALL mode(dot includes newlines), | ||
// as opposed to the (?s) modifier in other regex flavors. | ||
// \A, \z - parses from the beginning to the end of string, not line(until \n) | ||
pattern.push_str(r"\A"); | ||
pattern.push_str(r"(?m)\A"); // (?m) enables the DOTALL mode by default | ||
pattern.push_str(&context.regex); | ||
pattern.push_str(r"\z"); | ||
|
||
// our regex engine(onig) uses (?m) mode modifier instead of (?s) to make the dot match all characters | ||
pattern = pattern.replace("(?s)", "(?m)").replace("(?-s)", "(?-m)"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: This could be done all in one go:
let pattern = [
// In Oniguruma the (?m) modifier is used to enable the DOTALL mode(dot includes newlines),
// as opposed to the (?s) modifier in other regex flavors.
// \A, \z - parses from the beginning to the end of string, not line(until \n)
r"(?m)\A", // (?m) enables the DOTALL mode by default
&context.regex.replace("(?s)", "(?m)").replace("(?-s)", "(?-m)"),
r"\z",
].concat();
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea, updated.
Co-authored-by: Bruce Guenter <[email protected]>
Enables the DOTALL mode(dot includes \n) for the Datadog grok lib by default.