Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

loki.secretfilter: utilize fuzz testing for complex input #2630

Open
wants to merge 12 commits into
base: main
Choose a base branch
from

Conversation

kelnage
Copy link

@kelnage kelnage commented Feb 6, 2025

PR Description

Utilizing go's built-in fuzzing framework, test three major parts of the component that accept complex data (the log line, the component config, and the Gitleaks config), to help provide evidence that they operate correctly in many different configurations.

Which issue(s) this PR fixes

Notes to the Reviewer

The fuzzing identified three issues with the component that have been fixed as part of these commits. I have included the testdata the fuzzing framework generated, but I was wondering whether or not you felt they were useful to include or if I should simply build unit tests to validate their findings instead.

PR Checklist

  • CHANGELOG.md updated
  • Tests updated

Using three different configurations, fuzz test the processEntry
method, seeded with varied log formats.
Rather than fuzzing select configs individually, combine into a single
fuzz test that generates inputs for most of the configurations and tests
them all together.
Fuzz test the component config and executing it against some sample
log lines. The goal here is not to fuzz test Alloy's config parsing, nor
go's regex parsing, but focusing on ensuring a valid config doesn't
cause the component to crash when it is being used.
Test how the component handles the variety of Gitleaks configs that can
be provided to the component. Note this does not test the toml parser
itself, which is out of scope for this testing.
When a Gitleaks toml file contains either an empty regex or a regex that
matches the empty string, the component would out-of-memory when
attempting to redact. This commit excludes both of these cases.
If a rule regex could match an empty string, replacing it could quickly
lead to significant memory usage. Since an empty string cannot
meaningfully be a secret, add a check to validate this and skip it.

Found via go fuzzing of the Gitleaks config, sample included.
@kelnage kelnage added the enhancement New feature or request label Feb 6, 2025
@kelnage kelnage requested a review from a team February 6, 2025 09:25
@kelnage kelnage self-assigned this Feb 6, 2025
@kelnage kelnage requested a review from a team as a code owner February 6, 2025 09:25
Copy link
Member

@mostafa mostafa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Well done! 👏

@@ -0,0 +1,4 @@
go test fuzz v1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure if these are needed, and would be happy to hear other reviewers' opinions. I guess including them as test cases are enough.

Comment on lines +707 to +708
// ignore parsing errors, as we aren't fuzz testing the Alloy config parser
return
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even if we're not fuzz testing the config parser, wouldn't we still want to know if this fails? Presumably it should succeed so that we can test other code? And if it fails silently, it could mean the fuzz tests didn't really run?

I guess that testConfigs contains some invalid configurations and we want to ignore them, but wouldn't it be cleaner to only have valid configs as input and to not have config as an input to f.Fuzz?

}

entry := loki.Entry{Labels: model.LabelSet{}, Entry: logproto.Entry{Timestamp: time.Now(), Line: log}}
c.processEntry(entry)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we not also check the output on ch1?

Comment on lines +2 to +3
string("\n\t\tforward_to = []\n\t\tredact_with = \"<ALLOY-REDACTED-SECR<ALLOY-ET:$SECRET_NAME>\"\n\t")
string("\n\t\ttitle = \"gitleaks custom config\"\n\n\t\t[[rules]]\n\t\tid = \"my-fake-secret\"\n\t\tdescription = \"Identified a fake secret\"\n\t\tregex = '''(?i)\\b(fakeSecret\\d\", \"faiption ='|\\\"|\\n|\\r||;]|$)'''\t\n\t\t[allowlist]\n\t\tregexes = [\"abc\\\\d{3}\", \"fakeSecret[9]{5}\"]\n\t")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm new to fuzz testing - what sorts of variations of these configs is Go actually going to come up with? E.g. if it just puts in some random malformed string as a config, how do we know we should reject it? And even for secrets, how can we make sure in our test that we filter the secrets we want to filter?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants