Propagate through calls more selectively to avoid false positives #292

mlevesquedion · 2021-03-16T14:17:55Z

Specifically, avoid propagating in the absence of knowledge about a function's behavior. As a mitigation, summaries for a large number of standard library functions are added and used, which should give reasonable coverage in the intraprocedural case.

Context

The impetus for this PR came from issues involving our assumption that, when the function we are analyzing calls another function with a tainted argument, we should assume that the other arguments and the return value are all tainted. This assumption is unworkable in practice, as it leads to frequent false positives, essentially turning the analyzer into a "was a function called with a tainted value" detector. In such cases, the burden of analysis is shifted onto a human, who has to analyze the entire call tree below the "offending" call.

In future work, we will invest in developing an actual interprocedural analysis, probably based on some kind of pointer analysis.

With the changes in this PR, there are no longer any false positives on kubernetes/kubernetes.

Process

I tried to gather standard library functions that I think could reasonably be involved in a propagation. I did not look at all functions. More summaries can easily be added later if desired. Also,

Packages I covered:

fmt
errors
strings
bytes
io
io/ioutil
bufio
context
strconv
encoding/json
encoding/base64
sync
text
log
path
path/filepath

Tips for reviewers

This PR is meant as a single atomic operation that changes the way we propagate through calls. As such, it contains the removal of some tests which are no longer relevant/appropriate. It also contains the introduction of tests for new code.

The core changes are made in propagation/propagation.go and the new propagation/stdlib.go file. The bulk of the lines added come from propagation/summaries.go, which contains the function summaries (these are just data, not code).

Running against a large codebase such as Kubernetes does not error out. (See DEVELOPING.md for instructions on how to do that.)
(N/A) [ ] Appropriate changes to README are included in PR

Specifically, avoid propagating in the absence of knowledge about a function's behavior. As a mitigation, summaries for a large number of standard library functions are added and used.

PurelyApplied

This is good. I'm very glad to hear that restricting the propagation set leaves us with no false-positives in Kubernetes.

This is a significant departure from our existing model. While it might seem like I'm needling with some of my comments, it's really more that I'm trying to make sure we're placing our feet carefully. That doesn't always carry through in text, so I thought it bore explicitly stating. Overall, I'm very happy with this PR.

I've read the core code changes, but not the tests yet. I want to take a slow and careful look and the change in our testing contract. Some of the comments I make in the code are marked with [Testing] to reflect that the answer to those concerns either are already or can be resolved by adequate testing, and I will flesh out those comments early next week. Most of them are more TODO reminders for myself mid-review than anything else.

With the introduction of the two new files, the propagation package is at roughly 1,300 lines. That's a bit up there for my taste. I do note that summaries.go methods are only consumed by stdlib.go, and that only a few stdlib.go methods are only consumed externally by propagate.go. These feel like a good sub-package to me.

Hmmm, then I noticed the cycle of stdlib.go::taintFromSummary -> taint.referrers... If there's a clean lift, it still might be preferable to disentangle some of this rather than let propagation grow into a monolith. But if not, then it's fine as is, especially given that half of those lines are the big summaries file.

I'm surprised that the "update documentation" is (N/A) for this PR. At a glance, though, I see that we don't explicitly document how taint propagates, instead hand-waving away "taint propagates automatically." We really should make explicit the promises we are, and specifically aren't, making. Then again, we should've been explicit in those promises to date and we haven't been, so I don't know that that's really an issue with this PR as much as we should get to that issue in our backlog.

I'll get back to this hopefully-Monday-no-later-than-Tuesday.

Gods smile upon you for the grueling task of those summaries.

internal/pkg/utils/utils.go

internal/pkg/propagation/stdlib.go

internal/pkg/propagation/propagation.go

internal/pkg/propagation/summaries.go

internal/pkg/levee/testdata/src/levee_analysistest/example/tests/callorder/colocation.go

internal/pkg/levee/testdata/src/levee_analysistest/example/tests/propagation/functions.go

internal/pkg/levee/testdata/src/levee_analysistest/example/tests/stdlib/funcs.go

internal/pkg/propagation/summaries.go

mlevesquedion · 2021-03-29T22:13:53Z

Thank you for the thorough review!

Here are my replies to your high-level thoughts:

propagation package too big? I think it's okay not to split for now.
- As you mention, most of the lines are from summaries.
- Logic/purpose of code in stdlib.go feels to me like it's very close to the core propagation logic
- I may change my mind on this after I've added tests for some of the functions in stdlib.go.
updated documentation (N/A)
- Agree that we should set expectations in the docs.
- I think this can go in an immediately subsequent PR. High level, I'm thinking: analysis is currently intraprocedural, although the analyzer handles some standard library functions, e.g. fmt.Sprintf and any implementation of io.Write. Interprocedural propagation is in the works and expected to land sometime in 2021.

- Add comments - Change UnqualifiedName back to a types.Var param instead of types.Type - Use named variables instead of explicit bits for ifTainted

- Remove trailing " {" after function signatures in comments - Remove leftover TODO - Return early when a function has a static summary to show intent that propagation doesn't occur twice

Also in this change: rewrote `UnqualifiedName` to use existing functionality from the `types` package.

mlevesquedion · 2021-03-31T14:22:37Z

I ended up breaking off a "summary" package which defines the summary data type and provides a way for the propagation logic to get a hold of an appropriate summary for a given ssa call instruction. I think the code is significantly easier to understand with this separation.

I've made the funcName and funcNameWithoutReceiver functions more specific and added tests for them, as well as the sigTypeString function. These tests allowed me to find issues in util.UnqualifiedName (e.g. map[mypackage.foo]string became foo]string and *mypackage.foo became foo). As it turns out, there's a types.TypeString function which allows the "qualification" of names to be customized so I fixed the bugs by using that.

PurelyApplied

I really like the refactor. Splitting out sumamries/api feels good, and I'd hacked something less elegant than For into some local testing. Looks good.

internal/pkg/propagation/stdlib.go

internal/pkg/propagation/summary/testdata/src/summary_analysistest/tests/test.go

- Reword comments in stdlib.go - Make want comments in summary tests match exact strings

PurelyApplied

Looks good.

Just so it doesn't get loss in the now very long Conversation, I had one more non-blocking comment here.

I've hacked together a before-and-after sort of comparison on PurelyApplied:propagation-comparisons to get a back-of-the-envelope for the sort of findings this change would have us no longer report. Spot-checking against kubernetes, it does appear to be predominantly the value, err := f(source) pattern that we would now be missing, but I'm okay with that given how much this improves our signal-to-noise. We should consider explicitly drawing attention to "you should make sure you trust your errors" when we update documentation surrounding the scope of taint propagation.

Propagate through calls more selectively to avoid false positives

4e2d581

Specifically, avoid propagating in the absence of knowledge about a function's behavior. As a mitigation, summaries for a large number of standard library functions are added and used.

mlevesquedion force-pushed the propagate-through-calls-more-selectively branch 3 times, most recently from 766799d to 485f3c2 Compare March 16, 2021 14:30

Merge branch 'master' into propagate-through-calls-more-selectively

43e007c

mlevesquedion force-pushed the propagate-through-calls-more-selectively branch from 485f3c2 to 43e007c Compare March 16, 2021 14:33

mlevesquedion requested review from vinayakankugoyal and benhxy March 16, 2021 14:35

Clarify documentation for summaries.

0a67b99

PurelyApplied requested changes Mar 26, 2021

View reviewed changes

PurelyApplied requested changes Mar 29, 2021

View reviewed changes

internal/pkg/propagation/summaries.go Outdated Show resolved Hide resolved

Michaël Lévesque-Dion added 4 commits March 29, 2021 18:14

Address review comments.

b652708

- Add comments - Change UnqualifiedName back to a types.Var param instead of types.Type - Use named variables instead of explicit bits for ifTainted

Address review comments.

611ec22

- Remove trailing " {" after function signatures in comments - Remove leftover TODO - Return early when a function has a static summary to show intent that propagation doesn't occur twice

Extract and test "summary" package.

7734144

Also in this change: rewrote `UnqualifiedName` to use existing functionality from the `types` package.

Add missing license header.

445843c

Fix typo.

2b531b7

PurelyApplied requested changes Mar 31, 2021

View reviewed changes

internal/pkg/propagation/stdlib.go Outdated Show resolved Hide resolved

internal/pkg/propagation/summary/testdata/src/summary_analysistest/tests/test.go Show resolved Hide resolved

Address review comments.

fe993a8

- Reword comments in stdlib.go - Make want comments in summary tests match exact strings

PurelyApplied approved these changes Apr 2, 2021

View reviewed changes

This was referenced Apr 6, 2021

Determine how/whether we should explicitly enumerate functions that don't propagate taint #293

Open

Update documentation - promises and goals #93

Open

mlevesquedion merged commit 0808815 into google:master Apr 6, 2021

mlevesquedion deleted the propagate-through-calls-more-selectively branch April 6, 2021 12:45

mlevesquedion mentioned this pull request Apr 7, 2021

Add code coverage job. #296

Closed

mlevesquedion mentioned this pull request Apr 16, 2021

Propagate through go calls more selectively. #305

Merged

1 task

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Propagate through calls more selectively to avoid false positives #292

Propagate through calls more selectively to avoid false positives #292

mlevesquedion commented Mar 16, 2021 •

edited

Loading

PurelyApplied left a comment

mlevesquedion commented Mar 29, 2021

mlevesquedion commented Mar 31, 2021 •

edited

Loading

PurelyApplied left a comment

PurelyApplied left a comment

Propagate through calls more selectively to avoid false positives #292

Propagate through calls more selectively to avoid false positives #292

Conversation

mlevesquedion commented Mar 16, 2021 • edited Loading

Context

Process

Tips for reviewers

PurelyApplied left a comment

Choose a reason for hiding this comment

mlevesquedion commented Mar 29, 2021

mlevesquedion commented Mar 31, 2021 • edited Loading

PurelyApplied left a comment

Choose a reason for hiding this comment

PurelyApplied left a comment

Choose a reason for hiding this comment

mlevesquedion commented Mar 16, 2021 •

edited

Loading

mlevesquedion commented Mar 31, 2021 •

edited

Loading