Improvement: Handle select instructions #235

mlevesquedion · 2020-12-17T19:57:46Z

This PR is a part of #188 and constitutes a deep investigation into the ssa.Select instruction.

Our current approach to Select instructions is "visit every referrer and operand". This runs the risk of producing false positives:

func TestRecvFromTaintedAndNonTaintedChans(sources <-chan *core.Source, innocs <-chan *core.Innocuous) {
	select {
	case s := <-sources:
	case i := <-innocs:
		core.Sink(innocs) // a false positive report is produced here
		core.Sink(i) // a false positive report is produced here
	}
	core.Sink(innocs) // a false positive report is produced here
}

In this case, both sources and innocs are Operands of the select instruction. Since sources is tainted, visiting the Select instruction leads to innocs being tainted, which is incorrect.

Select instructions are a bit special, in that each of the cases is wrapped in a SelectState struct that captures the direction of the operation (send or recv), the channel involved, and if applicable, the value being sent. This means that, in a sense, there are instructions "hidden" within the Select instruction. These "hidden" instructions do not appear among the other instructions in a block, so we need special handling for Select instructions.

Running against a large codebase such as Kubernetes does not error out. (See DEVELOPING.md for instructions on how to do that.)
(N/A) [ ] Appropriate changes to README are included in PR

mlevesquedion · 2020-12-17T20:04:34Z

internal/pkg/levee/propagation/propagation.go

+func (prop *Propagation) visitSelect(sel *ssa.Select, maxInstrReached map[*ssa.BasicBlock]int, lastBlockVisited *ssa.BasicBlock) {
+	// The Send operations have to be processed before the Recv operations in case a channel
+	// is written to in one branch and read from in a different branch. This matters if the
+	// branches are reachable from each other, e.g. when the select is in a loop.


This behavior is covered by existing tests, see TestTaintedAndSinkedInDifferentBranches(InLoop).
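As a runtime illustration of the scenario the code comment describes (a channel written in one branch and read in another, with the select in a loop), here is a minimal sketch. It is not code from this PR: the function name `relay` and the capacity-1 buffer are assumptions chosen so that exactly one case is ready on each iteration, which makes the outcome deterministic.

```go
package main

import "fmt"

// relay runs a two-case select in a loop over a channel with capacity 1.
// On the first iteration only the send case is ready (the buffer is empty);
// on the second only the receive case is ready (the buffer is full).
func relay(secret string) []string {
	ch := make(chan string, 1)
	var received []string
	for i := 0; i < 2; i++ {
		select {
		case ch <- secret: // send branch: puts the value on the channel
		case v := <-ch: // recv branch: reads what the send branch wrote
			received = append(received, v)
		}
	}
	return received
}

func main() {
	fmt.Println(relay("tainted")) // [tainted]
}
```

The value sent in one branch is observed in the other branch on a later iteration, so an analysis that considers each case only once needs some way to account for this flow.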

PurelyApplied · 2020-12-22T19:19:56Z

internal/pkg/levee/propagation/propagation.go

+				// Select returns a tuple whose first 2 elements are irrelevant for our
+				// analysis. The other elements map 1:1 with each of the recv states.
+				// See the docs for ssa.Extract for more details.


Something felt a little off-kilter about this docstring explaining the recvStateCount that is used in the wrapping blocks. And as mentioned elsewhere, the separation of send and recv felt a bit strange to me...

I was poking at it as I was trying to understand this PR, and I wound up with this implementation of visitSelect. What do you think of the readability of extracting this counter as a bit of preprocessing, and combining the actual assessment of each case within a single loop?

func (prop *Propagation) visitSelect(sel *ssa.Select, maxInstrReached map[*ssa.BasicBlock]int, lastBlockVisited *ssa.BasicBlock) {
	// Comment about extracted tuple and why count starts at 2. Reference docs.
	// This could also be extracted as a separate method if that would further improve readability
	var recvInds = map[int]int{}
	count := 2
	for i, s := range sel.States {
		if s.Dir == types.RecvOnly {
			recvInds[i] = count
			count++
		}
	}

	for i, s := range sel.States {
		switch {
		// High-level comment about tainting chans
		case s.Dir == types.SendOnly && prop.marked[s.Send.(ssa.Node)]:
			prop.dfs(s.Chan.(ssa.Node), maxInstrReached, lastBlockVisited, false)

		// High-level comment about receiving tainted value via chans
		case s.Dir == types.RecvOnly && prop.marked[s.Chan.(ssa.Node)]:
			if sel.Referrers() == nil {
				continue
			}
			// Implementation comment about extraction referrer
			for _, r := range *sel.Referrers() {
				e, ok := r.(*ssa.Extract)
				if !ok || e.Index != recvInds[i] {
					continue
				}
				prop.dfs(e, maxInstrReached, lastBlockVisited, false)
			}
		}
	}
}

It puts the nil-safety check nearer its usage, too, so even if it ends up being unreachable in practice, your argument for keeping it there as safety is more apparent in the implementation.

Thoughts?

I do like having the two cases in the same loop + switch. My perception that the sends have to be visited before the recvs may have been caused by the earlier count++ bug that is now fixed.

I also like the idea of computing the indices in advance. I took a slightly different approach w.r.t. the implementation details. Starting the count at 2 doesn't make sense to me.

Thank you for your suggestions!

PurelyApplied · 2020-12-22T19:21:23Z

internal/pkg/levee/testdata/src/example.com/tests/select/tests.go

+	core.Sink(i2)   // TODO(#211) want "a source has reached a sink"
+	core.Sink(i3)   // TODO(#211) want "a source has reached a sink"
+	core.Sink(i4)   // TODO(#211) want "a source has reached a sink"


I may have misunderstood the intent, since i4 was previously emitting a diagnostic. Reading #211, the objective would be the current behavior, where i2, i3, and i4 are not considered tainted, since that would be outside the scope of this one select. So these shouldn't be TODOs, since they're currently behaving as desired.

Am I misunderstanding #211 here?

If the select were within a for loop, then they should emit. (Testing locally, indeed they do.) That test-case could be added here as well.

You're right, this is different from #211.

I've added the for loop test case.

PurelyApplied

LGTM.

PurelyApplied · 2020-12-22T21:11:48Z

internal/pkg/levee/propagation/propagation.go

+			recvCount++
+			extractIndex[ss] = recvCount + 1


The comment above feels like it's working to coerce 0-indexing into 1-indexing. Consider flipping these instructions, i.e.

extractIndex[ss] = recvCount + 2
recvCount++

and using 0-indexing throughout.

The comment could be massaged at that point to something more like what you had prior. "Select returns a tuple whose first two elements are unused for our analysis. The remainder of the tuple corresponds one-to-one to channels in the Recv state."

I considered doing that, but I'm not sure it's more readable. On the one hand, not having the "2" anywhere in the code feels confusing. On the other hand, writing ... = recvCount + 2 feels like it's incorrect because it's using an outdated value of recvCount.

That being said, I tried changing recvCount to recvIndex and I think that this naming change, combined with your proposal, is what makes the most sense.

Thank you for your valuable input!

Handle select instructions

80b1a32

mlevesquedion requested review from PurelyApplied, vinayakankugoyal and benhxy December 17, 2020 20:07

mlevesquedion mentioned this pull request Dec 17, 2020

Determine correct propagation behavior for ambiguous cases #188

Closed

8 tasks

PurelyApplied reviewed Dec 22, 2020

Michaël Lévesque-Dion added 2 commits December 22, 2020 11:02

address review comments + tag TODOs with issue

c150eb8

Merge branch 'master' into handle-select-instructions

04a277a

PurelyApplied reviewed Dec 22, 2020

address review comments

326f159

PurelyApplied approved these changes Dec 22, 2020

Michaël Lévesque-Dion added 2 commits December 22, 2020 16:13

merge master + fix conflict

28d6da7

clarify code + comment for mapping between extracts and recvs

d73ed05

mlevesquedion merged commit c40e404 into google:master Dec 22, 2020

mlevesquedion deleted the handle-select-instructions branch December 22, 2020 21:30

mlevesquedion changed the title ~~Handle select instructions~~ Improvement: Handle select instructions Feb 19, 2021
