Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improvement: Handle select instructions #235

Merged
merged 6 commits into from
Dec 22, 2020

Conversation

mlevesquedion
Copy link
Contributor

@mlevesquedion mlevesquedion commented Dec 17, 2020

This PR is a part of #188 and constitutes a deep investigation into the ssa.Select instruction.

Our current approach to Select instructions is "visit every referrer and operand". This runs the risk of producing false positives:

func TestRecvFromTaintedAndNonTaintedChans(sources <-chan *core.Source, innocs <-chan *core.Innocuous) {
	select {
	case s := <-sources:
	case i := <-innocs:
		core.Sink(innocs) // a false positive report is produced here
		core.Sink(i) // a false positive report is produced here
	}
	core.Sink(innocs) // a false positive report is produced here
}

In this case, both sources and innocs are Operands of the select instruction. Since sources is tainted, visiting the Select instruction leads to innocs being tainted, which is incorrect.

Select instructions are a bit special, in that each of the cases is wrapped in a SelectState struct that captures the direction of the operation (send or recv), the channel involved, and if applicable, the value being sent. This means that, in a sense, there are instructions "hidden" within the Select instruction. These "hidden" instructions do not appear among the other instructions in a block, so we need special handling for Select instructions.

  • Running against a large codebase such as Kubernetes does not error out. (See DEVELOPING.md for instructions on how to do that.)
  • (N/A) [ ] Appropriate changes to README are included in PR

func (prop *Propagation) visitSelect(sel *ssa.Select, maxInstrReached map[*ssa.BasicBlock]int, lastBlockVisited *ssa.BasicBlock) {
// The Send operations have to be processed before the Recv operations in case a channel
// is written to in one branch and read from in a different branch. This matters if the
// branches are reachable from each other, e.g. when the select is in a loop.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This behavior is covered by existing tests, see TestTaintedAndSinkedInDifferentBranches(InLoop).

Comment on lines 282 to 284
// Select returns a tuple whose first 2 elements are irrelevant for our
// analysis. The other elements map 1:1 with each of the recv states.
// See the docs for ssa.Extract for more details.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something felt a little off-kilter about this docstring explaining the recvStateCount that is used in the wrapping blocks. And as mentioned elsewhere, the separation of send and recv felt a bit strange to me...

I was poking at it as I was trying to understand this PR, and I wound up with this implementation of visitSelect. What do you think of the readability of extracting this counter as a bit of preprocessing, and combing the actual assessment of each case within a single loop?

func (prop *Propagation) visitSelect(sel *ssa.Select, maxInstrReached map[*ssa.BasicBlock]int, lastBlockVisited *ssa.BasicBlock) {
	// Comment about extracted tuple and why count starts at 2.  Reference docs.
	// This could also be extracted as a separate method if that would further improve readability
	var recvInds = map[int]int{}
	count := 2
	for i, s := range sel.States {
		if s.Dir == types.RecvOnly {
			recvInds[i] = count
			count++
		}
	}

	for i, s := range sel.States {
		switch {
		// High-level comment about tainting chans
		case s.Dir == types.SendOnly && prop.marked[s.Send.(ssa.Node)]:
			prop.dfs(s.Chan.(ssa.Node), maxInstrReached, lastBlockVisited, false)

		// High-level comment about recieving tainted value via chans
		case s.Dir == types.RecvOnly && prop.marked[s.Chan.(ssa.Node)]:
			if sel.Referrers() == nil {
				continue
			}
			// Implementation comment about extraction referrer
			for _, r := range *sel.Referrers() {
				e, ok := r.(*ssa.Extract)
				if !ok || e.Index != recvInds[i] {
					continue
				}
				prop.dfs(e, maxInstrReached, lastBlockVisited, false)
			}
		}
	}
}

It puts the nil-safety check nearer its usage, too, so even if it ends up being unreachable in practice, your argument for keeping it there as safety is more apparent in the implementation.

Thoughts?

Copy link
Contributor Author

@mlevesquedion mlevesquedion Dec 22, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do like having the two cases in the same loop + switch. My perception that the sends have to be visited before the recvs may have been caused by the earlier count++ bug that is now fixed.

I also like the idea of computing the indices in advance. I took a slightly different approach w.r.t. to the implementation details. Starting the count at 2 doesn't make sense to me.

Thank you for your suggestions!

Comment on lines 163 to 165
core.Sink(i2) // TODO(#211) want "a source has reached a sink"
core.Sink(i3) // TODO(#211) want "a source has reached a sink"
core.Sink(i4) // TODO(#211) want "a source has reached a sink"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I maybe misunderstood the intent, since i4 was previously emitting diagnostic. Reading #211, the objective would be this current behavior where i2, i3, i4 are not considered tainted, since that would be outside the scope of this once select. So these shouldn't be TODOs, since they're currently behaving as desired.

Am I misunderstanding #211 here?

If the select were within a for loop, then they should emit. (Testing locally, indeed they do.) That test-case could be added here as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, this is different from #211.

I've added the for loop test case.

internal/pkg/levee/propagation/propagation.go Outdated Show resolved Hide resolved
Copy link
Collaborator

@PurelyApplied PurelyApplied left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Comment on lines 266 to 267
recvCount++
extractIndex[ss] = recvCount + 1
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment above feels like it's working to coerce 0-indexing into 1-indexing. Consider flipping these instructions, i.e.

extractIndex[ss] = recvCount + 2
recvCount++

and using 0-indexing throughout.

The comment could be massaged at that point to something more like what you had prior. "Select returns a tuple whose first two elements are unused for our analysis. The remainder of the tuple corresponds one-to-one to channels in the Recv state."

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I considered doing that, but I'm not sure it's more readable. On the one hand, not having the "2" anywhere in the code feels confusing. On the other hand, writing ... = recvCount + 2 feels like it's incorrect because it's using an outdated value of recvCount.

That being said, I tried changing recvCount to recvIndex and I think that this naming change, combined with your proposal, is what makes the most sense.

Thank you for your valuable input!

@mlevesquedion mlevesquedion merged commit c40e404 into google:master Dec 22, 2020
@mlevesquedion mlevesquedion deleted the handle-select-instructions branch December 22, 2020 21:30
@mlevesquedion mlevesquedion changed the title Handle select instructions Improvement: Handle select instructions Feb 19, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants