
IAssassinSetrtnInvoke

Felix S. Klock II edited this page Jul 28, 2013 · 1 revision

PnkFelix made an experimental peephole for optimizing the instruction sequence:


setrtn L
invoke n
.align 4
L:

to just


setrtn/invoke n
.align 4
L:
  • When he first looked at it, he didn't think the x86 call instruction could be used to implement this, because call pushes the return address onto the stack.
    • (On SPARC, setrtn/invoke uses the delay slot in a clever way to get around this. Or at least maybe it does; the peephole seems to have been disabled...)
  • However, Will pointed out to PnkFelix that one can get around this problem by introducing a level of indirection: don't call directly to the target, but call to a short instruction sequence that stores the return address and then jumps to the target.
    • The problem here became "where do I put this short instruction sequence?"

PnkFelix decided to put the instruction sequence at the end of the bytevector for the code segment. It is not always put there: it is appended only if a setrtn/invoke call is actually emitted during assembly.
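
The "append only if used" policy described above can be sketched as a toy model. This is only an illustration, not the actual assembler: the class `ToySegment`, its method names, and the byte strings are all hypothetical placeholders, and no real x86 encoding is shown.

```python
class ToySegment:
    """Toy model of a code segment bytevector (hypothetical, not the real assembler)."""

    def __init__(self) -> None:
        self.code = bytearray()
        # Set once any setrtn/invoke call site has been emitted.
        self.stub_needed = False

    def emit(self, data: bytes) -> None:
        """Append ordinary instruction bytes."""
        self.code += data

    def emit_setrtn_invoke(self, call_site: bytes) -> None:
        """Append a fused call site and remember that the shared stub is required."""
        self.code += call_site
        self.stub_needed = True

    def finish(self, stub: bytes) -> bytes:
        """Append the shared stub at the end, but only if some call site uses it."""
        if self.stub_needed:
            self.code += stub
        return bytes(self.code)


# Placeholder bytes standing in for real machine code.
STUB = b"S" * 4
CALL = b"C" * 2

plain = ToySegment()
plain.emit(b"xx")
plain_bytes = plain.finish(STUB)

fused = ToySegment()
fused.emit_setrtn_invoke(CALL)
fused.emit_setrtn_invoke(CALL)
fused_bytes = fused.finish(STUB)
```

In this sketch a segment with no setrtn/invoke pays nothing for the stub, while a segment with several call sites shares a single copy at its end.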

  • However, the way things work out, there are cases where the non-peepholed version generates smaller code than the peepholed version. Namely, if the above pattern occurs only once, then the peepholed version occupies 8 bytes more than the non-peepholed version (60 bytes versus 52).
    • Here are the actual equations for calculating the expected code size in bytes, where K is the number of occurrences of the above pattern
      • nonpeep: K*52
      • peep: K*40+20
  • So it might be worth revisiting this choice, perhaps finding some way to conditionally assemble the setrtn/invoke based on how many we expect to encounter in the code segment.
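
The break-even point implied by the two equations above can be checked with a short calculation. This is a minimal sketch: the function names are hypothetical, the constants are taken directly from the equations, and it assumes (as described earlier) that nothing extra is emitted when the pattern never occurs.

```python
def nonpeep_size(k: int) -> int:
    """Bytes used by K occurrences of the unfused setrtn + invoke sequence."""
    return 52 * k


def peep_size(k: int) -> int:
    """Bytes used by K fused setrtn/invoke call sites plus the fixed 20-byte overhead."""
    if k == 0:
        # Assumption: the extra sequence is only appended when a call site exists.
        return 0
    return 40 * k + 20


def peephole_pays_off(k: int) -> bool:
    """True when the peepholed code is strictly smaller than the unfused code."""
    return peep_size(k) < nonpeep_size(k)


# Smallest K for which the peephole yields smaller code.
break_even = next(k for k in range(1, 100) if peephole_pays_off(k))
```

Each fused call site saves 12 bytes, so the fixed 20-byte cost is recovered once the pattern occurs at least twice in a code segment. A conditional scheme like the one suggested above could apply the peephole only when the expected K reaches that threshold.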