-
Notifications
You must be signed in to change notification settings - Fork 17
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Explicitly handle each node type #178
Explicitly handle each node type #178
Conversation
First off, 🎉. Thank you for taking the time to look at each of the
Just as a point of order, I only see 41 implemented and in the docs - 41 Nodes (30 Values, 35 Instructions, many both). But that's still a heap load.
I suspected this would happen, which is why I was pushing on it. So double-thanks. High level comments: Overall, I like it. Good comments on the various switch case groups. For consistency, I'd probably add a one-liner even to the more obvious ones, like I think this is pushing against our long-standing issue of It might help to conceptualize the directional flow of taint better if we had a layer of separation between
For the expression
We're probably on the same page, but your comment seems focused on the second case. Or is it not a 👍 overall. Glad to see this improvement. |
Thanks for pushing on this, and thanks for your thoughts!
Oops. I was looking at the entire list in
A closer look at some/all of the instructions would be very welcome. This stuff can be tricky, so I think some duplication of effort could be good. For the specific ones that you are mentioning here, here are some of my thoughts:
(Maybe I should add these extra thoughts to the switches? Maybe each instruction should get its own
I think that would be difficult to do. Sometimes we need to propagate in ways that don't really make sense from a code execution point of view, so I'm not sure we would be able to realize this idea of traversing from
Agreed, those are my same thoughts. We should never propagate to anything but the |
Ahh, yes. I thought it was one that was both a
Similarly, I thought
For both |
That would have made sense, that was also my expectation before looking at the docs.
Done. |
…y' into switch-on-every-node-type
I'm not sure how much more work here needs to be done before this graduates from a I don't feel like this PR needs to be help up by decisions about how to handle various instructions. Just getting this switch in place would allow easier iteration for correcting, e.g., how we handle It's up to you, but I'd be happy with an "in-place, no observable behavior change" refactor PR that allows easy iteration on specific instructions in future PRs. |
Yep, I agree! I have the same sentiment and I was about to write a comment expressing it. I plan to review the switch to make sure that cases where things are a bit ambiguous don't lose any propagation for now, and create an issue for each of those to investigate them further. I'll round this out and send it for review as soon as it's ready. |
return | ||
} | ||
} | ||
|
||
mirCopy := map[*ssa.BasicBlock]int{} | ||
for m, i := range maxInstrReached { | ||
mirCopy[m] = i | ||
} | ||
|
||
if instr, ok := n.(ssa.Instruction); ok { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm doing a second if
that has the same head as the previous one because I wanted to detangle two things:
- should we keep traversing?
- update the max instruction in block, if applicable
These should really be replaced by calls to dedicated functions. I'm happy to do it in this PR, but I could also do it in a future PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went ahead and extracted the first if (as well as other related instructions) into its own function. I think this reads much better.
This PR is ready for review. Please read the updated description for a current summary of what this PR represents. |
} | ||
// booleans can't meaningfully be tainted | ||
if isBoolean(n) { | ||
func (s *Source) dfs(n ssa.Node, maxInstrReached map[*ssa.BasicBlock]int, lastBlockVisited *ssa.BasicBlock, isReferrer bool) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function needed to be modified because the check for a call that is a referrer used to be done in visitReferrers
. Since Call
is the only special case, it feels cleaner to do it here. Other changes fell out of that change, because this function became unwieldy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm happy with how this is shaking out. 🎉
Out of scope for this PR, but it is an overhaul...
source.RefersTo
is unused and should be pruned.
s.sanitizers = append(s.sanitizers, &sanitizer.Sanitizer{Call: v}) | ||
} | ||
// booleans can't meaningfully be tainted | ||
if isBoolean(n) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
string
notwithstanding, do any of the basic types meaningfully carry taint? Certainly string
as a convertable type to []byte
or []rune
could carry taint, but does an individual byte
or rune
meaningfully carry taint?
If we're excluding basic types, we should consider excluding all numeric and pointer types.
Conversely, if memory serves, this came from the special handling surrounding val, ok := SomeExtract
. If that's now / going to be being properly handled in the big instruction switch, then we shouldn't need to handle that here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a bit pathological, but if each byte
/rune
from a tainted []byte
or []rune
were to be sinked individually, avoiding bytes
and runes
would miss that. (e.g. something like for _, r := range "secret" { fmt.Printf(r) }
; again, this is kind of pathological). A similar thing could occur with some numeric types, e.g. in theory a cryptographic key could be sinked byte-by-byte.
we should consider excluding ... pointer types.
I think that could cause us to miss a lot of things. We don't know what a sink is going to do with a pointer argument, e.g. dereference it.
Conversely, if memory serves, ...
You are correct, but we also wanted to avoid things like fmt.Println(isSource(Source{}))
, so I think avoiding all booleans is justified.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that could cause us to miss a lot of things. We don't know what a sink is going to do with a pointer argument, e.g. dereference it.
I was imprecise. I meant within the context of basic types - specifically, the types.BasicKind
s, which fall into:
- Invalid
- Strings
- Numeric types, including boolean
- Unsafe pointers
- Untyped types
- The
Byte
andRune
alias forUint8
andInt32
respectively.
So when I said "pointers," I should have said "unsafe pointers." I feel like unsafe
, like reflect
, is out of scope for contextual analysis for us.
So of the above, I would consider taint as able to propagate to strings (typed and untyped), and optionally runes (typed and untypted) and bytes.
You could make a similar pathological argument that someone could coerce a string -> []rune -> []int32 -> []int64
and back, but that feels like the sort of thing that is hard to avoid criticism in code review. \shrug
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, this is another thing that falls into "could easily be its own PR." I was really just stirring conversation while it was front of mind, but we should shift this conversation to in Issue to not block this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. Issue here: #191
} | ||
|
||
func (s *Source) visitReferrers(n ssa.Node, maxInstrReached map[*ssa.BasicBlock]int, lastBlockVisited *ssa.BasicBlock) { | ||
referrers := s.referrersToVisit(n, maxInstrReached) | ||
func (s *Source) shouldNotVisit(n ssa.Node, maxInstrReached map[*ssa.BasicBlock]int, lastBlockVisited *ssa.BasicBlock, isReferrer bool) bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Style nit: Consider writing for the positive rather than the negative, i.e. shouldVisit
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general I would agree, but in this case the function is only used in the negative case, so if it were named shouldVisit
we would need to write !shouldVisit
in the calling code, which is just as readable as shouldNotVisit
, but feels a bit backwards to me: we're asking if we should visit, but we're actually interested in knowing if we shouldn't visit, so we immediately flip the result of the call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For me, it's about reading the function while you're in it. While I'm looking at this function, I don't necessarily know all of its use-cases. I also don't know that consumption won't change to invert this into a double negative. At least, that's where I'm coming from, philosophically.
Then again, Naming is Hard:tm: and it's a bikeshed. It's fine as it is.
if recv := t.Call.Signature().Recv(); recv != nil && s.config.IsSourceType(utils.DecomposeType(utils.Dereference(recv.Type()))) { | ||
return | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does this not also exclude, e.g., wrapper.Source.GetSecret()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question. This is actually one of the more significant architectural flaws in the codebase right now, IMO.
The reason is that GetSecret
will be identified as a fieldpropagator
, which will be found in levee
when checking for Calls
, which will start a new traversal with the result of GetSecret
as a source. I think this is unintuitive to say the least. I had a PR for using fieldpropagator
in source
a long time ago, #77, but I think I ended up closing it because I wanted the fieldpropagators
to be consumed by sourcetype
or something of the sort.
As a side note, currently the propagation logic in fieldpropagator
is not nearly as developed as that in source
, and in fact I believe that logic should probably be shared: #130.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gotcha. And yeah, you know opinion on the pipeline of analyzers here. If these were unified in levee
instead of propagation happening all in source
... but that's a conversation for a future PR.
s.visitOperands(n, maxInstrReached, lastBlockVisited) | ||
|
||
// These nodes are both Instructions and Values, and currently have no special restrictions. | ||
case *ssa.Field, *ssa.MakeInterface, *ssa.Select, *ssa.TypeAssert: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should *ssa.Field
have the same special handling that ssa.FieldAddr
has?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think almost certainly, yes. See #188. TLDR: since we don't have a good suite of tests for Field
, I think for now it would be safer to keep the existing behavior of traversing to everything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good to me.
s.visitOperands(n, maxInstrReached, lastBlockVisited) | ||
|
||
// These nodes are both Instructions and Values, and currently have no special restrictions. | ||
case *ssa.Field, *ssa.MakeInterface, *ssa.Select, *ssa.TypeAssert: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Related to an above comment, but I might expect the special handling of not propagating through the CommaOK
value of a type assert to be handled here - at least if we weren't otherwise claiming that a boolean can't be tainted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could do that by checking if the TypeAssert
is CommaOk
, and if so, avoiding the 1st Referrer
. I think it's simpler to rely on the code for avoiding booleans, though.
Looking into this, it looks like the CommaOk
case for type asserts was not covered, so I added a line to an existing test:
func TestSourcePointerAssertedFromParameterEfaceCommaOk(e interface{}) {
s, ok := e.(*core.Source)
core.Sink(s) // want "a source has reached a sink"
core.Sink(ok) <-- this is the line I added, no report should be produced here
}
// Only the Map itself can be tainted by an Update. | ||
// The Key can't be tainted. | ||
// The Value can propagate taint to the Map, but not receive it. | ||
// MapUpdate has no referrers, it is only an Instruction, not a Value. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Certainly out of scope for this PR, possibly out of scope for this analyzer, but do we need to consider where keys are tainted? E.g.
func TestFoo() {
s := core.Source{Data: "password1234"}
m := map[core.Source]string{s: s.Data}
for src, str := range m {
core.Sink(src) // want "a source has reached a sink"
core.Sink(str) // want "a source has reached a sink"
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm. With this case in mind, I think in general we may want to consider anything coming out of a tainted map to be tainted. The current tests don't do that, e.g.
func TestRangeOverMapWithSourceAsKey() {
m := map[core.Source]string{core.Source{Data: "password1234"}: "don't sink me"}
for src, str := range m {
core.Sink(src) // want "a source has reached a sink"
core.Sink(str)
}
}
In the above cases, the fix would be to propagate through the Next
instruction. In the current PR, we do not propagate through those instructions at all; on master, we do. I think for now it would be best to add that case you mentioned and to keep propagating through Next
instructions. WDYT?
(For MapUpdate
though I'm confident we don't want to traverse to anything other than the Map
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(I'm going to use sources in keys and strings in values in this comment. The same logic applies in reverse.)
I think if we propagate through Next
, we might end up tainting keys categorically where we shouldn't. It could be worth a shot and seeing how it plays out though.
But the more I think about it, the more I think this might be out of scope. It feels like we're needing to decide categorically whether or not both keys and values are tainted if a map contains a source type value. Otherwise, we would have to make a distinction between the map "safely" holding its values versus having been tainted by receiving a tainted value as a key.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SG. I've added it to #188.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At this point, I think all of my concerns are small and modular enough to be their own future PRs. Thanks again for looking into this.
This PR implements explicit handling of each node type. The idea is to make taint propagation a deliberate choice, instead of the current "just traverse to everything, except if X, Y or Z". Indeed, in most cases, propagating everywhere doesn't make much sense.
I think the resulting code is easier to understand. The epic
switch
looks intimidating, but for every node type, where to traverse is specified explicitly, in a single place. Also,visitReferrers
actually means "visit the referrers" now (same forvisitOperands
).Performing this work has made me discover some issues/potentially missing test cases:
BinOp
needs tests, e.g. to make sure we aren't propagating to the other (non-tainted) operand : Add missing BinOp test #179Phi
needs tests, e.g. to make sure we aren't propagating to the other (non-tainted) operand : Add missing Phi test #180MapUpdate
needs tests to make sure we are propagating only to theMap
, not to theKey
or theVal
. Add missing map tests #181The above issues have all been demonstrated via test cases, which are fixed by this PR.
For cases where the correct propagation is ambiguous, I have opened an issue here: #188. These will need further investigation. In the meantime, this PR maintains the current behavior of "traverse everywhere" for these cases.