Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Propagate through calls more selectively to avoid false positives #292

Conversation

mlevesquedion
Copy link
Contributor

@mlevesquedion mlevesquedion commented Mar 16, 2021

Specifically, avoid propagating in the absence of knowledge about a function's behavior. As a mitigation, summaries for a large number of standard library functions are added and used, which should give reasonable coverage in the intraprocedural case.

Context

The impetus for this PR came from issues involving our assumption that, when the function we are analyzing calls another function with a tainted argument, we should assume that the other arguments and the return value are all tainted. This assumption is unworkable in practice, as it leads to frequent false positives, essentially turning the analyzer into a "was a function called with a tainted value" detector. In such cases, the burden of analysis is shifted onto a human, who has to analyze the entire call tree below the "offending" call.

In future work, we will invest in developing an actual interprocedural analysis, probably based on some kind of pointer analysis.

With the changes in this PR, there are no longer any false positives on kubernetes/kubernetes.

Process

I tried to gather standard library functions that I think could reasonably be involved in a propagation. I did not look at all functions. More summaries can easily be added later if desired. Also,

Packages I covered:

fmt
errors
strings
bytes
io
io/ioutil
bufio
context
strconv
encoding/json
encoding/base64
sync
text
log
path
path/filepath

Tips for reviewers

This PR is meant as a single atomic operation that changes the way we propagate through calls. As such, it contains the removal of some tests which are no longer relevant/appropriate. It also contains the introduction of tests for new code.

The core changes are made in propagation/propagation.go and the new propagation/stdlib.go file. The bulk of the lines added come from propagation/summaries.go, which contains the function summaries (these are just data, not code).

  • Running against a large codebase such as Kubernetes does not error out. (See DEVELOPING.md for instructions on how to do that.)
  • (N/A) [ ] Appropriate changes to README are included in PR

Specifically, avoid propagating in the absence of knowledge about a
function's behavior. As a mitigation, summaries for a large number
of standard library functions are added and used.
@mlevesquedion mlevesquedion force-pushed the propagate-through-calls-more-selectively branch 3 times, most recently from 766799d to 485f3c2 Compare March 16, 2021 14:30
@mlevesquedion mlevesquedion force-pushed the propagate-through-calls-more-selectively branch from 485f3c2 to 43e007c Compare March 16, 2021 14:33
Copy link
Collaborator

@PurelyApplied PurelyApplied left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is good. I'm very glad to hear that restricting the propagation set leaves us with no false-positives in Kubernetes.

This is a significant departure from our existing model. While it might seem like I'm needling with some of my comments, it's really more that I'm trying to make sure we're placing our feet carefully. That doesn't always carry through in text, so I thought it bore explicitly stating. Overall, I'm very happy with this PR.

I've read the core code changes, but not the tests yet. I want to take a slow and careful look and the change in our testing contract. Some of the comments I make in the code are marked with [Testing] to reflect that the answer to those concerns either are already or can be resolved by adequate testing, and I will flesh out those comments early next week. Most of them are more TODO reminders for myself mid-review than anything else.


With the introduction of the two new files, the propagation package is at roughly 1,300 lines. That's a bit up there for my taste. I do note that summaries.go methods are only consumed by stdlib.go, and that only a few stdlib.go methods are only consumed externally by propagate.go. These feel like a good sub-package to me.

Hmmm, then I noticed the cycle of stdlib.go::taintFromSummary -> taint.referrers... If there's a clean lift, it still might be preferable to disentangle some of this rather than let propagation grow into a monolith. But if not, then it's fine as is, especially given that half of those lines are the big summaries file.


I'm surprised that the "update documentation" is (N/A) for this PR. At a glance, though, I see that we don't explicitly document how taint propagates, instead hand-waving away "taint propagates automatically." We really should make explicit the promises we are, and specifically aren't, making. Then again, we should've been explicit in those promises to date and we haven't been, so I don't know that that's really an issue with this PR as much as we should get to that issue in our backlog.


I'll get back to this hopefully-Monday-no-later-than-Tuesday.

Gods smile upon you for the grueling task of those summaries.

internal/pkg/utils/utils.go Outdated Show resolved Hide resolved
internal/pkg/propagation/stdlib.go Outdated Show resolved Hide resolved
internal/pkg/propagation/stdlib.go Outdated Show resolved Hide resolved
internal/pkg/propagation/propagation.go Outdated Show resolved Hide resolved
internal/pkg/propagation/summaries.go Outdated Show resolved Hide resolved
internal/pkg/propagation/summaries.go Outdated Show resolved Hide resolved
internal/pkg/propagation/summaries.go Outdated Show resolved Hide resolved
internal/pkg/propagation/summaries.go Outdated Show resolved Hide resolved
internal/pkg/propagation/summaries.go Outdated Show resolved Hide resolved
internal/pkg/propagation/summaries.go Outdated Show resolved Hide resolved
@mlevesquedion
Copy link
Contributor Author

Thank you for the thorough review!

Here are my replies to your high-level thoughts:

  • propagation package too big? I think it's okay not to split for now.
    • As you mention, most of the lines are from summaries.
    • Logic/purpose of code in stdlib.go feels to me like it's very close to the core propagation logic
    • I may change my mind on this after I've added tests for some of the functions in stdlib.go.
  • updated documentation (N/A)
    • Agree that we should set expectations in the docs.
    • I think this can go in an immediately subsequent PR. High level, I'm thinking: analysis is currently intraprocedural, although the analyzer handles some standard library functions, e.g. fmt.Sprintf and any implementation of io.Write. Interprocedural propagation is in the works and expected to land sometime in 2021.

Michaël Lévesque-Dion added 4 commits March 29, 2021 18:14
- Add comments
- Change UnqualifiedName back to a types.Var param instead of
types.Type
- Use named variables instead of explicit bits for ifTainted
- Remove trailing " {" after function signatures in comments
- Remove leftover TODO
- Return early when a function has a static summary to show intent that
propagation doesn't occur twice
Also in this change: rewrote `UnqualifiedName` to use
existing functionality from the `types` package.
@mlevesquedion
Copy link
Contributor Author

mlevesquedion commented Mar 31, 2021

I ended up breaking off a "summary" package which defines the summary data type and provides a way for the propagation logic to get a hold of an appropriate summary for a given ssa call instruction. I think the code is significantly easier to understand with this separation.

I've made the funcName and funcNameWithoutReceiver functions more specific and added tests for them, as well as the sigTypeString function. These tests allowed me to find issues in util.UnqualifiedName (e.g. map[mypackage.foo]string became foo]string and *mypackage.foo became foo). As it turns out, there's a types.TypeString function which allows the "qualification" of names to be customized so I fixed the bugs by using that.

Copy link
Collaborator

@PurelyApplied PurelyApplied left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like the refactor. Splitting out sumamries/api feels good, and I'd hacked something less elegant than For into some local testing. Looks good.

- Reword comments in stdlib.go
- Make want comments in summary tests match exact strings
Copy link
Collaborator

@PurelyApplied PurelyApplied left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

Just so it doesn't get loss in the now very long Conversation, I had one more non-blocking comment here.

I've hacked together a before-and-after sort of comparison on PurelyApplied:propagation-comparisons to get a back-of-the-envelope for the sort of findings this change would have us no longer report. Spot-checking against kubernetes, it does appear to be predominantly the value, err := f(source) pattern that we would now be missing, but I'm okay with that given how much this improves our signal-to-noise. We should consider explicitly drawing attention to "you should make sure you trust your errors" when we update documentation surrounding the scope of taint propagation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants