Explicitly handle each node type #178

mlevesquedion · 2020-11-10T18:27:20Z

This PR implements explicit handling of each node type. The idea is to make taint propagation a deliberate choice, instead of the current "just traverse to everything, except if X, Y or Z". Indeed, in most cases, propagating everywhere doesn't make much sense.

I think the resulting code is easier to understand. The epic switch looks intimidating, but for every node type, where to traverse is specified explicitly, in a single place. Also, visitReferrers actually means "visit the referrers" now (same for visitOperands).
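As a rough illustration of the "explicit switch" idea (this is not the code in source.go), here is a minimal sketch. It uses hypothetical stand-in types rather than the real golang.org/x/tools/go/ssa node types, and the `plan` function and its return values are assumptions made for the example. Each type gets one case that states where traversal goes, and an unknown type fails loudly instead of being traversed by default.

```go
package main

import "fmt"

// Node is a hypothetical stand-in for ssa.Node.
type Node interface{ node() }

// Simplified stand-ins for three ssa node types.
type BinOp struct{ X, Y string }
type MapUpdate struct{ Map, Key, Val string }
type Return struct{}

func (BinOp) node()     {}
func (MapUpdate) node() {}
func (Return) node()    {}

// plan reports, for one node, whether to visit its referrers and which
// operands (if any) to visit. Every decision lives in a single place.
func plan(n Node) (referrers bool, operands []string, err error) {
	switch t := n.(type) {
	case BinOp:
		// The result is tainted, but the other operand is not.
		return true, nil, nil
	case MapUpdate:
		// Only the Map can become tainted; a MapUpdate has no referrers.
		return false, []string{t.Map}, nil
	case Return:
		// Nothing to visit.
		return false, nil, nil
	default:
		// An unknown type is an error, not a silent "visit everything".
		return false, nil, fmt.Errorf("unhandled node type %T", n)
	}
}

func main() {
	for _, n := range []Node{BinOp{"a", "b"}, MapUpdate{"m", "k", "v"}, Return{}} {
		refs, ops, err := plan(n)
		fmt.Printf("%T: referrers=%v operands=%v err=%v\n", n, refs, ops, err)
	}
}
```

The default branch is the design point: adding a new node type without deciding how taint flows through it becomes a visible failure.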

Performing this work has made me discover some issues/potentially missing test cases:

BinOp needs tests, e.g. to make sure we aren't propagating to the other (non-tainted) operand: Add missing BinOp test #179
Phi needs tests, e.g. to make sure we aren't propagating to the other (non-tainted) operand: Add missing Phi test #180
MapUpdate needs tests to make sure we are propagating only to the Map, not to the Key or the Val: Add missing map tests #181
Tests for Index/IndexAddr: Add tests for range to not propagate to non-source types. #187

The above issues have all been demonstrated via test cases, which are fixed by this PR.

For cases where the correct propagation is ambiguous, I have opened an issue here: #188. These will need further investigation. In the meantime, this PR maintains the current behavior of "traverse everywhere" for these cases.

Tests pass
Running against a large codebase such as Kubernetes does not error out.
(N/A) [ ] Appropriate changes to README are included in PR

PurelyApplied · 2020-11-10T21:25:25Z

First off, 🎉. Thank you for taking the time to look at each of the Node types.

out of 42 node types

Just as a point of order, I only see 41 implemented and in the docs - 41 Nodes (30 Values, 35 Instructions, many both). But that's still a heap load.

Performing this work has made me discover some issues/potentially missing test cases:

I suspected this would happen, which is why I was pushing on it. So double-thanks.

High level comments:

Overall, I like it. Good comments on the various switch case groups. For consistency, I'd probably add a one-liner even to the more obvious ones, like Call or FieldAddr. I might want to take a closer look into some of the instructions being skipped, like Builtin, FreeVar and maybe Return, but that might be lower-level than you're looking for right now.

I think this is pushing against our long-standing issue of source analyzer doing the actual propagation logic. This seems pretty well modular, so it probably is orthogonal to that effort.

It might help to conceptualize the directional flow of taint better if we had a layer of separation between Value and Instruction in our propagation. We are (and have long been) traversing from Node to Node, but conceptually, taint progresses from Value to Value along edges defined by Instructions. It might read better if we started with a Value, got its Referrers(), and determined which neighboring Values taint might flow to using a switch like this. But that also feels like a refactor that is orthogonal to this work and could come in a future PR, if at all.
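To make the Value-to-Value idea a bit more concrete, here is a minimal sketch under heavy simplification. `Instr`, `neighbors` and `propagate` are hypothetical names, values are plain strings, and nothing here uses the real ssa package. The walk goes Value, then Instruction, then Value, and only the instruction kind decides which neighboring Values receive taint.

```go
package main

import (
	"fmt"
	"sort"
)

// Instr is a hypothetical, simplified instruction: it reads Operands and,
// for some kinds, defines Result. It is not the real ssa.Instruction.
type Instr struct {
	Kind     string
	Operands []string
	Result   string
}

// neighbors returns the Values that taint may reach through in when one of
// its operands is tainted.
func neighbors(in Instr) []string {
	switch in.Kind {
	case "BinOp":
		// The result is derived from the operands, so it becomes tainted.
		// The other operand is deliberately not returned.
		return []string{in.Result}
	default:
		// e.g. "Return": no other Value is affected.
		return nil
	}
}

// propagate walks Value -> Instruction -> Value, starting at source.
// referrers maps a Value name to the instructions that use it.
func propagate(source string, referrers map[string][]Instr) []string {
	tainted := map[string]bool{source: true}
	queue := []string{source}
	for len(queue) > 0 {
		v := queue[0]
		queue = queue[1:]
		for _, in := range referrers[v] {
			for _, w := range neighbors(in) {
				if !tainted[w] {
					tainted[w] = true
					queue = append(queue, w)
				}
			}
		}
	}
	out := make([]string, 0, len(tainted))
	for v := range tainted {
		out = append(out, v)
	}
	sort.Strings(out)
	return out
}

func main() {
	add := Instr{Kind: "BinOp", Operands: []string{"s", "x"}, Result: "t0"}
	ret := Instr{Kind: "Return", Operands: []string{"t0"}}
	referrers := map[string][]Instr{
		"s":  {add},
		"x":  {add},
		"t0": {ret},
	}
	fmt.Println(propagate("s", referrers)) // prints: [s t0]
}
```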

MapUpdate needs tests to make sure we aren't propagating to the key, and that the key isn't tainting the map.

For the expression m[k] = v, I would expect:

If k is tainted, m becomes tainted. (v remains untainted.)
If v is tainted, m becomes tainted. (k remains untainted.)

We're probably on the same page, but your comment seems focused on the second case. Or is it not a MapUpdate instruction if k is not already in m?
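The expectation above can be sketched as a tiny model, assuming taint is tracked as a set of names. `taintAfterMapUpdate` is a hypothetical helper for illustration only, not part of the analyzer.

```go
package main

import "fmt"

// taintAfterMapUpdate models m[k] = v on a set of tainted names and returns
// the updated set. Taint flows into the map from either k or v, and never
// from one of k or v to the other.
func taintAfterMapUpdate(tainted map[string]bool, m, k, v string) map[string]bool {
	out := make(map[string]bool, len(tainted)+1)
	for name, isTainted := range tainted {
		out[name] = isTainted
	}
	if tainted[k] || tainted[v] {
		out[m] = true
	}
	return out
}

func main() {
	// Tainted key: the map becomes tainted, the value does not.
	fmt.Println(taintAfterMapUpdate(map[string]bool{"k": true}, "m", "k", "v"))
	// Tainted value: the map becomes tainted, the key does not.
	fmt.Println(taintAfterMapUpdate(map[string]bool{"v": true}, "m", "k", "v"))
}
```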

👍 overall. Glad to see this improvement.

mlevesquedion · 2020-11-10T21:44:30Z

Thanks for pushing on this, and thanks for your thoughts!

... 41 nodes ...

Oops. I was looking at the entire list in ssa/doc.go, which includes some non-nodes (actually it looks like I miscounted those as well). There are indeed 41, according to vim-go's :GoImplements.

... look into some of the instructions being skipped, like Builtin, FreeVar and maybe Return, but that might be lower-level than you're looking for right now.

A closer look at some/all of the instructions would be very welcome. This stuff can be tricky, so I think some duplication of effort could be good.

For the specific ones that you are mentioning here, here are some of my thoughts:

Builtin is just a Value, conceptually it is a reference to one of the Builtin functions, so it can't be tainted. If there is a Call to a Builtin, we will propagate the taint via the Call. (I am fairly confident of this, but additional tests would not be unwelcome.)
FreeVars are handled by sourcesFromClosure.
Return can't have Referrers, and it is not possible for a value being returned to taint another value via the Return instruction.

(Maybe I should add these extra thoughts to the switches? Maybe each instruction should get its own case + comment? That might get unwieldy, though.)

It might help to conceptualize...

I think that would be difficult to do. Sometimes we need to propagate in ways that don't really make sense from a code execution point of view, so I'm not sure we would be able to realize this idea of traversing from Value to Value via Instructions.

For the expression m[k] = v, I would expect: ...

Agreed, those are my thoughts as well. We should never propagate to anything but the Map.

PurelyApplied · 2020-11-10T21:53:36Z

Builtin is just a Value, conceptually it is a reference to one of the Builtin functions, so it can't be tainted. If there is a Call to a Builtin, we will propagate the taint via the Call. (I am fairly confident of this, but additional tests would not be unwelcome.)

Ahh, yes. I thought it was one that was both a Value and an Instruction, and as an Instruction, I would expect, say, copy to propagate taint. My mistake.

Return can't have Referrers, and it is not possible for a value being returned to taint another value via the Return instruction.

Similarly, I thought Return had a Value, but it looks like it's just the "We're done now" Instruction. The underlying Return.Value should already be tainted if the return value should be tainted.

FreeVars are handled by sourcesFromClosure.

For both FreeVars and Return.Values, it was more a comment about completion than consumption. If we're explicitly relying here on how things are handled elsewhere regarding FreeVars, we might want to at least add that comment in this switch. But yeah, not necessary in the current implementation.

mlevesquedion · 2020-11-10T22:10:46Z

Ahh, yes. I thought it was one that was both a Value and an Instruction, and as an Instruction, I would expect, say, copy to propagate taint. My mistake.

That would have made sense, that was also my expectation before looking at the docs.

If we're explicitly relying here on how things are handled elsewhere regarding FreeVars, we might want to at least add that comment in this switch.

Done.


PurelyApplied · 2020-11-12T21:37:26Z

I'm not sure how much more work here needs to be done before this graduates from a Draft to an "official" PR.

I don't feel like this PR needs to be held up by decisions about how to handle various instructions. Just getting this switch in place would allow easier iteration for correcting, e.g., how we handle BinOp or Range instructions.

It's up to you, but I'd be happy with an "in-place, no observable behavior change" refactor PR that allows easy iteration on specific instructions in future PRs.

mlevesquedion · 2020-11-12T22:02:42Z

Yep, I agree! I have the same sentiment and I was about to write a comment expressing it.

I plan to review the switch to make sure that cases where things are a bit ambiguous don't lose any propagation for now, and create an issue for each of those to investigate them further.

I'll round this out and send it for review as soon as it's ready.

mlevesquedion · 2020-11-12T22:21:33Z

internal/pkg/source/source.go

+			return
+		}
+	}
+
 	mirCopy := map[*ssa.BasicBlock]int{}
 	for m, i := range maxInstrReached {
 		mirCopy[m] = i
 	}

 	if instr, ok := n.(ssa.Instruction); ok {


I'm doing a second if that has the same head as the previous one because I wanted to detangle two things:

should we keep traversing?

update the max instruction in block, if applicable

These should really be replaced by calls to dedicated functions. I'm happy to do it in this PR, but I could also do it in a future PR.

I went ahead and extracted the first if (as well as other related instructions) into its own function. I think this reads much better.

mlevesquedion · 2020-11-12T22:36:26Z

This PR is ready for review. Please read the updated description for a current summary of what this PR represents.

mlevesquedion · 2020-11-12T22:43:15Z

internal/pkg/source/source.go

-	}
-	// booleans can't meaningfully be tainted
-	if isBoolean(n) {
+func (s *Source) dfs(n ssa.Node, maxInstrReached map[*ssa.BasicBlock]int, lastBlockVisited *ssa.BasicBlock, isReferrer bool) {


This function needed to be modified because the check for a call that is a referrer used to be done in visitReferrers. Since Call is the only special case, it feels cleaner to do it here. Other changes fell out of that change, because this function became unwieldy.

PurelyApplied

I'm happy with how this is shaking out. 🎉

Out of scope for this PR, but it is an overhaul...

source.RefersTo is unused and should be pruned.

PurelyApplied · 2020-11-12T23:37:02Z

internal/pkg/source/source.go

-				s.sanitizers = append(s.sanitizers, &sanitizer.Sanitizer{Call: v})
-			}
+	// booleans can't meaningfully be tainted
+	if isBoolean(n) {


string notwithstanding, do any of the basic types meaningfully carry taint? Certainly string, as a type convertible to []byte or []rune, could carry taint, but does an individual byte or rune meaningfully carry taint?

If we're excluding basic types, we should consider excluding all numeric and pointer types.

Conversely, if memory serves, this came from the special handling surrounding val, ok := SomeExtract. If that is now handled properly in the big instruction switch (or is going to be), then we shouldn't need to handle it here.

This is a bit pathological, but if each byte/rune from a tainted []byte or []rune were to be sinked individually, avoiding bytes and runes would miss that. (e.g. something like for _, r := range "secret" { fmt.Printf(r) }; again, this is kind of pathological). A similar thing could occur with some numeric types, e.g. in theory a cryptographic key could be sinked byte-by-byte.

we should consider excluding ... pointer types.

I think that could cause us to miss a lot of things. We don't know what a sink is going to do with a pointer argument, e.g. dereference it.

Conversely, if memory serves, ...

You are correct, but we also wanted to avoid things like fmt.Println(isSource(Source{})), so I think avoiding all booleans is justified.

I think that could cause us to miss a lot of things. We don't know what a sink is going to do with a pointer argument, e.g. dereference it.

I was imprecise. I meant within the context of basic types - specifically, the types.BasicKinds, which fall into:

Invalid

Strings

Numeric types, including boolean

Unsafe pointers

Untyped types

The Byte and Rune aliases for Uint8 and Int32, respectively.

So when I said "pointers," I should have said "unsafe pointers." I feel like unsafe, like reflect, is out of scope for contextual analysis for us.

So of the above, I would consider taint as able to propagate to strings (typed and untyped), and optionally runes (typed and untyped) and bytes.

You could make a similar pathological argument that someone could coerce a string -> []rune -> []int32 -> []int64 and back, but that feels like the sort of thing that would be hard to get past code review without criticism. \shrug

Also, this is another thing that falls into "could easily be its own PR." I was really just stirring conversation while it was front of mind, but we should shift this conversation to an Issue to not block this PR.

Agreed. Issue here: #191

PurelyApplied · 2020-11-12T23:37:55Z

internal/pkg/source/source.go

 }

-func (s *Source) visitReferrers(n ssa.Node, maxInstrReached map[*ssa.BasicBlock]int, lastBlockVisited *ssa.BasicBlock) {
-	referrers := s.referrersToVisit(n, maxInstrReached)
+func (s *Source) shouldNotVisit(n ssa.Node, maxInstrReached map[*ssa.BasicBlock]int, lastBlockVisited *ssa.BasicBlock, isReferrer bool) bool {


Style nit: Consider writing for the positive rather than the negative, i.e. shouldVisit.

In general I would agree, but in this case the function is only used in the negative case, so if it were named shouldVisit we would need to write !shouldVisit in the calling code, which is just as readable as shouldNotVisit, but feels a bit backwards to me: we're asking if we should visit, but we're actually interested in knowing if we shouldn't visit, so we immediately flip the result of the call.

For me, it's about reading the function while you're in it. While I'm looking at this function, I don't necessarily know all of its use-cases. I also don't know that consumption won't change to invert this into a double negative. At least, that's where I'm coming from, philosophically.

Then again, Naming is Hard:tm: and it's a bikeshed. It's fine as it is.

PurelyApplied · 2020-11-12T23:51:52Z

internal/pkg/source/source.go

+		if recv := t.Call.Signature().Recv(); recv != nil && s.config.IsSourceType(utils.DecomposeType(utils.Dereference(recv.Type()))) {
+			return
+		}


Why does this not also exclude, e.g., wrapper.Source.GetSecret()?

Good question. This is actually one of the more significant architectural flaws in the codebase right now, IMO.

The reason is that GetSecret will be identified as a fieldpropagator, which will be found in levee when checking for Calls, which will start a new traversal with the result of GetSecret as a source. I think this is unintuitive to say the least. I had a PR for using fieldpropagator in source a long time ago, #77, but I think I ended up closing it because I wanted the fieldpropagators to be consumed by sourcetype or something of the sort.

As a side note, currently the propagation logic in fieldpropagator is not nearly as developed as that in source, and in fact I believe that logic should probably be shared: #130.

Gotcha. And yeah, you know my opinion on the pipeline of analyzers here. If these were unified in levee instead of propagation happening all in source... but that's a conversation for a future PR.

PurelyApplied · 2020-11-13T00:00:09Z

internal/pkg/source/source.go

+		s.visitOperands(n, maxInstrReached, lastBlockVisited)
+
+	// These nodes are both Instructions and Values, and currently have no special restrictions.
+	case *ssa.Field, *ssa.MakeInterface, *ssa.Select, *ssa.TypeAssert:


Should *ssa.Field have the same special handling that ssa.FieldAddr has?

I think almost certainly, yes. See #188. TLDR: since we don't have a good suite of tests for Field, I think for now it would be safer to keep the existing behavior of traversing to everything.

Sounds good to me.

PurelyApplied · 2020-11-13T00:05:59Z

internal/pkg/source/source.go

+		s.visitOperands(n, maxInstrReached, lastBlockVisited)
+
+	// These nodes are both Instructions and Values, and currently have no special restrictions.
+	case *ssa.Field, *ssa.MakeInterface, *ssa.Select, *ssa.TypeAssert:


Related to an above comment, but I might expect the special handling of not propagating through the CommaOK value of a type assert to be handled here - at least if we weren't otherwise claiming that a boolean can't be tainted.

We could do that by checking if the TypeAssert is CommaOk, and if so, avoiding the 1st Referrer. I think it's simpler to rely on the code for avoiding booleans, though.

Looking into this, it looks like the CommaOk case for type asserts was not covered, so I added a line to an existing test:

func TestSourcePointerAssertedFromParameterEfaceCommaOk(e interface{}) {
	s, ok := e.(*core.Source)
	core.Sink(s) // want "a source has reached a sink"
	core.Sink(ok) // this is the line I added; no report should be produced here
}

PurelyApplied · 2020-11-13T00:26:21Z

internal/pkg/source/source.go

+	// Only the Map itself can be tainted by an Update.
+	// The Key can't be tainted.
+	// The Value can propagate taint to the Map, but not receive it.
+	// MapUpdate has no referrers, it is only an Instruction, not a Value.


Certainly out of scope for this PR, possibly out of scope for this analyzer, but do we need to consider cases where keys are tainted? E.g.

func TestFoo() {
	s := core.Source{Data: "password1234"}
	m := map[core.Source]string{s: s.Data}
	for src, str := range m {
		core.Sink(src) // want "a source has reached a sink"
		core.Sink(str) // want "a source has reached a sink"
	}
}

Hmm. With this case in mind, I think in general we may want to consider anything coming out of a tainted map to be tainted. The current tests don't do that, e.g.

func TestRangeOverMapWithSourceAsKey() {
	m := map[core.Source]string{core.Source{Data: "password1234"}: "don't sink me"}
	for src, str := range m {
		core.Sink(src) // want "a source has reached a sink"
		core.Sink(str)
	}
}

In the above cases, the fix would be to propagate through the Next instruction. In the current PR, we do not propagate through those instructions at all; on master, we do. I think for now it would be best to add that case you mentioned and to keep propagating through Next instructions. WDYT?

(For MapUpdate though I'm confident we don't want to traverse to anything other than the Map).

(I'm going to use sources in keys and strings in values in this comment. The same logic applies in reverse.)

I think if we propagate through Next, we might end up tainting keys categorically where we shouldn't. It could be worth a shot to see how it plays out, though.

But the more I think about it, the more I think this might be out of scope. It feels like we're needing to decide categorically whether or not both keys and values are tainted if a map contains a source type value. Otherwise, we would have to make a distinction between the map "safely" holding its values versus having been tainted by receiving a tainted value as a key.

SG. I've added it to #188.

PurelyApplied

At this point, I think all of my concerns are small and modular enough to be their own future PRs. Thanks again for looking into this.

explicitly handle each node type

971e05a

mlevesquedion marked this pull request as draft November 10, 2020 18:27

Michaël Lévesque-Dion added 5 commits November 10, 2020 13:27

remove TODO (done)

963d0f0

simplify comment

f2ab7e6

improve naming

b61e801

break out block for clarity

cecaddf

add missing period

96058ff

mlevesquedion requested review from PurelyApplied and vinayakankugoyal November 10, 2020 20:58

reorganize code, extract TODOs

c442eec

mlevesquedion mentioned this pull request Nov 10, 2020

Add missing BinOp test #179

Merged

1 task

mlevesquedion mentioned this pull request Nov 10, 2020

Add missing Phi test #180

Merged

1 task

add explicit comment for freevars

05d03ac

mlevesquedion mentioned this pull request Nov 10, 2020

Add missing map tests #181

Merged

1 task

Michaël Lévesque-Dion added 2 commits November 12, 2020 09:38

Merge branch 'master' into switch-on-every-node-type

b0f8c80

remove fixed TODO

736fd8a

mlevesquedion mentioned this pull request Nov 12, 2020

Fix: Only traverse to reference args #185

Merged

2 tasks

Michaël Lévesque-Dion and others added 5 commits November 12, 2020 11:37

Merge branch 'master' into switch-on-every-node-type

31c6aa1

merge phi tests, confirm fix

ee7a9f7

Merge branch 'master' into switch-on-every-node-type

0782747

merge master + remove fixed todos

e73bea1

Add tests for range to not propagate to non-source value types.

0df923c

PurelyApplied mentioned this pull request Nov 12, 2020

Add tests for range to not propagate to non-source types. #187

Merged

Michaël Lévesque-Dion added 3 commits November 12, 2020 15:58

Merge remote-tracking branch 'purelyapplied/range-should-not-taint-ke…

95dea62

…y' into switch-on-every-node-type

merge range propagation cases

0d22ab8

handle range cases

781737d

Michaël Lévesque-Dion added 2 commits November 12, 2020 17:07

move field and freevar back to traverse-everything case

cfb17d7

merge master + fix conflicts

f242206

mlevesquedion commented Nov 12, 2020


Michaël Lévesque-Dion added 2 commits November 12, 2020 17:25

add/update comments

98b13d0

freevar is a value; move index + indexaddr to a better spot

da4053b

mlevesquedion mentioned this pull request Nov 12, 2020

Determine correct propagation behavior for ambiguous cases #188

Closed

8 tasks

mlevesquedion marked this pull request as ready for review November 12, 2020 22:35

Michaël Lévesque-Dion added 2 commits November 12, 2020 17:41

extract decision to not visit to own function

7357c93

add to preorder + marked right after deciding to visit

c45faae

mlevesquedion commented Nov 12, 2020


PurelyApplied reviewed Nov 13, 2020


do not want commaok to report

4ec379b

PurelyApplied approved these changes Nov 13, 2020


mlevesquedion mentioned this pull request Nov 13, 2020

Determine which Basic types can't meaningfully be tainted #191

Closed

mlevesquedion merged commit 449f50f into google:master Nov 13, 2020

mlevesquedion deleted the switch-on-every-node-type branch November 13, 2020 22:17

mlevesquedion mentioned this pull request Nov 13, 2020

Cleanup: Remove dead Source.RefersTo method #192

Merged

1 task

mlevesquedion changed the title ~~Explicitly handle each node type~~ Improvement: Explicitly handle each node type Feb 19, 2021

mlevesquedion changed the title ~~Improvement: Explicitly handle each node type~~ Explicitly handle each node type Feb 19, 2021
