-
Notifications
You must be signed in to change notification settings - Fork 17
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improvement: Handle select instructions #235
Improvement: Handle select instructions #235
Conversation
func (prop *Propagation) visitSelect(sel *ssa.Select, maxInstrReached map[*ssa.BasicBlock]int, lastBlockVisited *ssa.BasicBlock) { | ||
// The Send operations have to be processed before the Recv operations in case a channel | ||
// is written to in one branch and read from in a different branch. This matters if the | ||
// branches are reachable from each other, e.g. when the select is in a loop. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This behavior is covered by existing tests, see TestTaintedAndSinkedInDifferentBranches(InLoop)
.
// Select returns a tuple whose first 2 elements are irrelevant for our | ||
// analysis. The other elements map 1:1 with each of the recv states. | ||
// See the docs for ssa.Extract for more details. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something felt a little off-kilter about this docstring explaining the recvStateCount
that is used in the wrapping blocks. And as mentioned elsewhere, the separation of send and recv felt a bit strange to me...
I was poking at it as I was trying to understand this PR, and I wound up with this implementation of visitSelect
. What do you think of the readability of extracting this counter as a bit of preprocessing, and combing the actual assessment of each case
within a single loop?
func (prop *Propagation) visitSelect(sel *ssa.Select, maxInstrReached map[*ssa.BasicBlock]int, lastBlockVisited *ssa.BasicBlock) {
// Comment about extracted tuple and why count starts at 2. Reference docs.
// This could also be extracted as a separate method if that would further improve readability
var recvInds = map[int]int{}
count := 2
for i, s := range sel.States {
if s.Dir == types.RecvOnly {
recvInds[i] = count
count++
}
}
for i, s := range sel.States {
switch {
// High-level comment about tainting chans
case s.Dir == types.SendOnly && prop.marked[s.Send.(ssa.Node)]:
prop.dfs(s.Chan.(ssa.Node), maxInstrReached, lastBlockVisited, false)
// High-level comment about recieving tainted value via chans
case s.Dir == types.RecvOnly && prop.marked[s.Chan.(ssa.Node)]:
if sel.Referrers() == nil {
continue
}
// Implementation comment about extraction referrer
for _, r := range *sel.Referrers() {
e, ok := r.(*ssa.Extract)
if !ok || e.Index != recvInds[i] {
continue
}
prop.dfs(e, maxInstrReached, lastBlockVisited, false)
}
}
}
}
It puts the nil-safety check nearer its usage, too, so even if it ends up being unreachable in practice, your argument for keeping it there as safety is more apparent in the implementation.
Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do like having the two cases in the same loop + switch. My perception that the sends have to be visited before the recvs may have been caused by the earlier count++
bug that is now fixed.
I also like the idea of computing the indices in advance. I took a slightly different approach w.r.t. to the implementation details. Starting the count at 2 doesn't make sense to me.
Thank you for your suggestions!
core.Sink(i2) // TODO(#211) want "a source has reached a sink" | ||
core.Sink(i3) // TODO(#211) want "a source has reached a sink" | ||
core.Sink(i4) // TODO(#211) want "a source has reached a sink" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I maybe misunderstood the intent, since i4
was previously emitting diagnostic. Reading #211, the objective would be this current behavior where i2, i3, i4
are not considered tainted, since that would be outside the scope of this once select. So these shouldn't be TODO
s, since they're currently behaving as desired.
Am I misunderstanding #211 here?
If the select
were within a for
loop, then they should emit. (Testing locally, indeed they do.) That test-case could be added here as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, this is different from #211.
I've added the for
loop test case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
recvCount++ | ||
extractIndex[ss] = recvCount + 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment above feels like it's working to coerce 0-indexing into 1-indexing. Consider flipping these instructions, i.e.
extractIndex[ss] = recvCount + 2
recvCount++
and using 0-indexing throughout.
The comment could be massaged at that point to something more like what you had prior. "Select returns a tuple whose first two elements are unused for our analysis. The remainder of the tuple corresponds one-to-one to channels in the Recv state."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I considered doing that, but I'm not sure it's more readable. On the one hand, not having the "2" anywhere in the code feels confusing. On the other hand, writing ... = recvCount + 2
feels like it's incorrect because it's using an outdated value of recvCount
.
That being said, I tried changing recvCount
to recvIndex
and I think that this naming change, combined with your proposal, is what makes the most sense.
Thank you for your valuable input!
This PR is a part of #188 and constitutes a deep investigation into the
ssa.Select
instruction.Our current approach to
Select
instructions is "visit every referrer and operand". This runs the risk of producing false positives:In this case, both
sources
andinnocs
areOperands
of theselect
instruction. Sincesources
is tainted, visiting theSelect
instruction leads toinnocs
being tainted, which is incorrect.Select
instructions are a bit special, in that each of the cases is wrapped in aSelectState
struct that captures the direction of the operation (send or recv), the channel involved, and if applicable, the value being sent. This means that, in a sense, there are instructions "hidden" within theSelect
instruction. These "hidden" instructions do not appear among the other instructions in a block, so we need special handling forSelect
instructions.