IAssassinSetrtnInvoke
Felix S. Klock II edited this page Jul 28, 2013
·
1 revision
PnkFelix made an experimental peephole for optimizing the instruction sequence:
setrtn L
invoke n
.align 4
L:
to just
setrtn/invoke n
.align 4
L:
- When he first looked at it, he didn't think it would be possible to use the x86 call instruction to implement this, because the call instruction pushes the return address onto the stack.
  - (On Sparc, setrtn/invoke uses the delay slot in a clever way to get around this. Or at least maybe it does; the peephole seems to have been disabled...)
- However, Will pointed out to PnkFelix that one can get around this problem by introducing a level of indirection: don't call directly to the target, but call to a short instruction sequence that stores the return address and then jumps to the target.
- The problem here became "where do I put this short instruction sequence?"
PnkFelix decided to put the instruction sequence at the end of the bytevector for the code segment. We don't always put it there; only if we actually make a setrtn/invoke call during the assembly.
- However, the way things work out, there are cases where the non-peepholed version generates smaller code than the peepholed version. Namely, if we only have one occurrence of the above pattern, then the peepholed version occupies 8 bytes more than the non-peepholed version.
- Here are the actual equations for calculating the expected instruction size in bytes, where K is the number of occurrences of the above pattern:
  - nonpeep: K*52
  - peep: K*40+20
- So it might be worth revisiting this choice, perhaps finding some way to conditionally assemble the setrtn/invoke based on how many we expect to encounter in the code segment.
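The size trade-off above can be checked with a small sketch. It only restates the two equations from this page; the function and constant names are hypothetical, and reading the constant 20 as the one-time cost of the shared instruction sequence at the end of the code segment is an assumption, not something the assembler source is known to state.

```python
# Sizes in bytes, copied from the equations on this page.
NONPEEP_BYTES_PER_SITE = 52   # setrtn followed by invoke, per occurrence
PEEP_BYTES_PER_SITE = 40      # fused setrtn/invoke, per occurrence
PEEP_FIXED_BYTES = 20         # assumed: the shared sequence appended once


def nonpeep_size(k: int) -> int:
    """Expected size without the peephole, for k occurrences of the pattern."""
    return NONPEEP_BYTES_PER_SITE * k


def peep_size(k: int) -> int:
    """Expected size with the peephole, for k occurrences of the pattern.

    The fixed cost is only paid if at least one setrtn/invoke is assembled.
    """
    if k == 0:
        return 0
    return PEEP_BYTES_PER_SITE * k + PEEP_FIXED_BYTES


def should_use_peephole(k: int) -> bool:
    """Hypothetical policy: use the peephole only when it yields smaller code."""
    return k > 0 and peep_size(k) < nonpeep_size(k)


if __name__ == "__main__":
    for k in range(0, 5):
        print(k, nonpeep_size(k), peep_size(k), should_use_peephole(k))
```

Under these equations the peephole loses only when K is 1 and wins by 12 bytes per additional occurrence after that, so a conditional scheme would need to know (or estimate) K before emitting the first call site.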