-
Notifications
You must be signed in to change notification settings - Fork 17
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Propagate through calls more selectively to avoid false positives #292
Propagate through calls more selectively to avoid false positives #292
Conversation
Specifically, avoid propagating in the absence of knowledge about a function's behavior. As a mitigation, summaries for a large number of standard library functions are added and used.
766799d
to
485f3c2
Compare
485f3c2
to
43e007c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is good. I'm very glad to hear that restricting the propagation set leaves us with no false-positives in Kubernetes.
This is a significant departure from our existing model. While it might seem like I'm needling with some of my comments, it's really more that I'm trying to make sure we're placing our feet carefully. That doesn't always carry through in text, so I thought it bore explicitly stating. Overall, I'm very happy with this PR.
I've read the core code changes, but not the tests yet. I want to take a slow and careful look and the change in our testing contract. Some of the comments I make in the code are marked with [Testing]
to reflect that the answer to those concerns either are already or can be resolved by adequate testing, and I will flesh out those comments early next week. Most of them are more TODO reminders for myself mid-review than anything else.
With the introduction of the two new files, the propagation
package is at roughly 1,300 lines. That's a bit up there for my taste. I do note that summaries.go
methods are only consumed by stdlib.go
, and that only a few stdlib.go
methods are only consumed externally by propagate.go
. These feel like a good sub-package to me.
Hmmm, then I noticed the cycle of stdlib.go::taintFromSummary -> taint.referrers
... If there's a clean lift, it still might be preferable to disentangle some of this rather than let propagation
grow into a monolith. But if not, then it's fine as is, especially given that half of those lines are the big summaries file.
I'm surprised that the "update documentation" is (N/A) for this PR. At a glance, though, I see that we don't explicitly document how taint propagates, instead hand-waving away "taint propagates automatically." We really should make explicit the promises we are, and specifically aren't, making. Then again, we should've been explicit in those promises to date and we haven't been, so I don't know that that's really an issue with this PR as much as we should get to that issue in our backlog.
I'll get back to this hopefully-Monday-no-later-than-Tuesday.
Gods smile upon you for the grueling task of those summaries.
internal/pkg/levee/testdata/src/levee_analysistest/example/tests/callorder/colocation.go
Show resolved
Hide resolved
internal/pkg/levee/testdata/src/levee_analysistest/example/tests/propagation/functions.go
Show resolved
Hide resolved
internal/pkg/levee/testdata/src/levee_analysistest/example/tests/stdlib/funcs.go
Show resolved
Hide resolved
internal/pkg/levee/testdata/src/levee_analysistest/example/tests/stdlib/funcs.go
Show resolved
Hide resolved
Thank you for the thorough review! Here are my replies to your high-level thoughts:
|
- Add comments - Change UnqualifiedName back to a types.Var param instead of types.Type - Use named variables instead of explicit bits for ifTainted
- Remove trailing " {" after function signatures in comments - Remove leftover TODO - Return early when a function has a static summary to show intent that propagation doesn't occur twice
Also in this change: rewrote `UnqualifiedName` to use existing functionality from the `types` package.
I ended up breaking off a "summary" package which defines the summary data type and provides a way for the propagation logic to get a hold of an appropriate summary for a given ssa call instruction. I think the code is significantly easier to understand with this separation. I've made the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really like the refactor. Splitting out sumamries/api
feels good, and I'd hacked something less elegant than For
into some local testing. Looks good.
internal/pkg/propagation/summary/testdata/src/summary_analysistest/tests/test.go
Show resolved
Hide resolved
- Reword comments in stdlib.go - Make want comments in summary tests match exact strings
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good.
Just so it doesn't get loss in the now very long Conversation
, I had one more non-blocking comment here.
I've hacked together a before-and-after sort of comparison on PurelyApplied:propagation-comparisons
to get a back-of-the-envelope for the sort of findings this change would have us no longer report. Spot-checking against kubernetes, it does appear to be predominantly the value, err := f(source)
pattern that we would now be missing, but I'm okay with that given how much this improves our signal-to-noise. We should consider explicitly drawing attention to "you should make sure you trust your errors" when we update documentation surrounding the scope of taint propagation.
Specifically, avoid propagating in the absence of knowledge about a function's behavior. As a mitigation, summaries for a large number of standard library functions are added and used, which should give reasonable coverage in the intraprocedural case.
Context
The impetus for this PR came from issues involving our assumption that, when the function we are analyzing calls another function with a tainted argument, we should assume that the other arguments and the return value are all tainted. This assumption is unworkable in practice, as it leads to frequent false positives, essentially turning the analyzer into a "was a function called with a tainted value" detector. In such cases, the burden of analysis is shifted onto a human, who has to analyze the entire call tree below the "offending" call.
In future work, we will invest in developing an actual interprocedural analysis, probably based on some kind of pointer analysis.
With the changes in this PR, there are no longer any false positives on
kubernetes/kubernetes
.Process
I tried to gather standard library functions that I think could reasonably be involved in a propagation. I did not look at all functions. More summaries can easily be added later if desired. Also,
Packages I covered:
Tips for reviewers
This PR is meant as a single atomic operation that changes the way we propagate through calls. As such, it contains the removal of some tests which are no longer relevant/appropriate. It also contains the introduction of tests for new code.
The core changes are made in
propagation/propagation.go
and the newpropagation/stdlib.go
file. The bulk of the lines added come frompropagation/summaries.go
, which contains the function summaries (these are just data, not code).