-
Notifications
You must be signed in to change notification settings - Fork 264
ARM Port
We can't easily use our x86 instrumentation:
cmp <memval>, 0xf1fdf1fd
We may have to do sthg like:
spill r0
spill r1
ldr r0, <memval>
movw 0xf1fd, r1
movt 0xf1fd, r1
cmp r0, r1
restore r0
restore r1
The expanded-immeds do allow:
cmp r0, 0xf100f100
Or
cmp r0, 0x00fd00fd
Or
cmp r0, 0xf1f1f1f1
For Thumb, anyway.
Probably it's worth changing the pattern to avoid extra spills and instrs.
What it looks like with 0xf1f1f1f1:
+22 m4 @0x5291e120 <label>
+22 m4 @0x5291db00 f8ca c084 str %r12 -> +0x00000084(%r10)[4byte]
+26 m4 @0x5291e088 f8d3 c004 ldr +0x04(%r3)[4byte] -> %r12
+30 m4 @0x5291e03c f1bc 3ff1 cmp %r12 $0xf1f1f1f1
+34 m4 @0x5291dfa4 e7fe b.ne @0x5291e0d4[4byte]
+36 m4 @0x5291df58 de00 udf $0x00000000
+38 m4 @0x5291e0d4 <label>
+38 L3 f843 1f04 str %r1 $0x00000004 %r3 -> +0x04(%r3)[4byte] %r3
With flags save:
+12 m4 @0x550b2408 f8ca 0084 str %r0 -> +0x00000084(%r10)[4byte]
+16 m4 @0x550b2920 f3ef 8000 mrs %cpsr -> %r0
+20 m4 @0x550b2454 f8ca 0080 str %r0 -> +0x00000080(%r10)[4byte]
+24 m4 @0x550b1e98 f8d1 00e4 ldr +0x000000e4(%r1)[4byte] -> %r0
+28 m4 @0x550b24a0 f1b0 3ff1 cmp %r0 $0xf1f1f1f1
+32 m4 @0x550b231c e7fe b.ne @0x550b26fc[4byte]
+34 m4 @0x550b22d0 de00 udf $0x00000000
+36 m4 @0x550b26fc <label>
+36 L3 f8c1 20e4 str.hi %r2 -> +0x000000e4(%r1)[4byte]
+40 m4 @0x550b1bd8 f8da 0080 ldr +0x00000080(%r10)[4byte] -> %r0
+44 m4 @0x550b252c f380 8c00 msr $0x0c %r0 -> %cpsr
+48 m4 @0x550b26a8 f8da 0084 ldr +0x00000084(%r10)[4byte] -> %r0
Our scratch reg must be r0-r7 for cbnz though.
And we'd have to add an IT block for sub (but cbnz cannot be inside it).
So maybe we should only do it when the flags are live? Thus adding more complexity to the fault identification code.
But what about ARM? ARM immediates in GPR instrs are just an 8-bit value rotated: no repeating. Even the SIMD and VFP immeds aren't much help, except maybe the cmode=1111 combined with cmode=1100? Subtract one and then the other?
We could use mvn if most bits are 1's: sthg like 0xfff1ffff, but we still need to spill a reg, and if we do that we may as well use movw,movt.
Faster than a spill, though not if we have a (2nd) dead reg.
So we'd do:
sub r0, 0xf1000000
sub r0, 0x00f10000
sub r0, 0x0000f100
sub r0, 0x000000f1
cmp r0, 0 (cbnz is Thumb-only)
jne skip
udf
skip:
We could use 0xf1fdf1fd here -- but maybe simplest to still limit to single-byte for consistency w/ Thumb?
Vs the movw,movt: 2 extra instrs if reg dead, same # and no mem access if live. Can we ask drreg whether dead or not? => add drreg_is_register_dead()
However, having 2 different versions complicates the fault handling.
Double-checking the compiler doesn't have some trick:
if (argc == 0xf1fdf1fd)
return 1;
=>
gcc thumb -O3:
8372: f24f 13fd movw r3, #61949 ; 0xf1fd
8376: f2cf 13fd movt r3, #61949 ; 0xf1fd
837a: 4298 cmp r0, r3
gcc arm -O3:
8374: e30f31fd movw r3, #61949 ; 0xf1fd
8378: e34f31fd movt r3, #61949 ; 0xf1fd
837c: e1500003 cmp r0, r3
Real example:
+4 m4 @0x4f8c5be8 e58a1084 str %r1 -> +0x00000084(%r10)[4byte]
+8 m4 @0x4f8c60a8 e10f1000 mrs %cpsr -> %r1
+12 m4 @0x4f8c5c34 e58a1080 str %r1 -> +0x00000080(%r10)[4byte]
+16 m4 @0x4f8c6134 e5901000 ldr (%r0)[4byte] -> %r1
+20 m4 @0x4f8c6180 e24114f1 sub %r1 $0xf1000000 -> %r1
+24 m4 @0x4f8c5ff4 e24118f1 sub %r1 $0x00f10000 -> %r1
+28 m4 @0x4f8c5b10 e2411cf1 sub %r1 $0x0000f100 -> %r1
+32 m4 @0x4f8c5f28 e24110f1 sub %r1 $0x000000f1 -> %r1
+36 m4 @0x4f8c5f68 e3510000 cmp %r1 $0x00000000
+40 m4 @0x4f8c5b50 1afffffe b.ne @0x4f8c60f4[4byte]
+44 m4 @0x4f8c6250 e7f000f0 udf $0x00000000
+48 m4 @0x4f8c60f4 e7f000f0 <label>
+48 L3 e5900000 ldr (%r0)[4byte] -> %r0
+52 m4 @0x4f8c6290 e59a1080 ldr +0x00000080(%r10)[4byte] -> %r1
+56 m4 @0x4f8c62dc e12cf001 msr $0x0c %r1 -> %cpsr
+60 m4 @0x4f8c6334 e59a1084 ldr +0x00000084(%r10)[4byte] -> %r1
Breaks DR's rules: would mess up decode_fragment.
Instead of inlining, could jump to separate gencode (need 15 forms one for each scratch reg) -- if already in cache maybe ok that it's not local.
If TLS in data cache and have L1 hit, may be as fast as movw,movt, and it's shorter code.
Very complex w/ interactions w/ r10 though
For 2 spills, is ldm or ldrd faster? Qin's initial tests showed no faster than separate ldr x2, so even though instr density is better, if it makes drreg really complex it's prob not worth doing.