PerformanceTodo
== Sparc Backend ==
This is pretty much Will's department, though we should of course treat the peepholes currently implemented on Sparc as TODOs for the other backends.
-
Did Lars consider using the segment registers in "special ways"?
- Are those even available to us?
- PnkFelix was specifically wondering if it might be worthwhile to put GLOBALS and/or R0 into a segment register.
- All of the segment registers are 16 bits, which poses a problem.
- Putting GLOBALS in a segment register still might work (and even be a good idea?), assuming we massage the interfacing with the C code accordingly.
- LarsHansen says:
- Avoid the segment registers until you have ascertained that accessing them has good performance on current processor models. My 386 manual shows protected-mode moves to the segment registers as being 4-9 times slower than moves to other registers. Reading from those registers is fast (again for the 386), so something that's basically constant, like GLOBALS, might work.
-
W.r.t. how to do setrtn/invoke and whether to layer it on top of call, JonathanKraut mentions: Another option in this situation, of course, is to just do a manual push and jump:
(push return_address) (jmp the_func) (align 4) (label return_address)
This way there are no nops to decode (probably a minor issue anyway, as these are "executed" at 4 per cycle, basically). More importantly, especially if you never use the (ret) instruction to invoke the_func's continuation, this doesn't mess with the x86's call-return cache, and so avoids possibly causing some expensive branch mispredictions.
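To make the call-return cache argument concrete, here is a toy model in Python. It is only a sketch under stated assumptions, not a description of any particular processor: the predictor is modelled as a fixed-depth stack that a hardware call pushes onto and a hardware ret pops from, and the depth of 16, the class `ReturnStackModel`, the function `simulate`, and the address labels are all made up for illustration. The scenario assumes some outer code (for example the C runtime) has entered compiled code with ordinary calls, the compiled code then performs several invokes whose continuations are entered by a jump rather than a ret, and finally the outer frames return with ret.

```python
class ReturnStackModel:
    """Toy model of a call-return predictor: a fixed-depth stack of addresses.

    Hypothetical illustration only; real predictors differ in depth and detail.
    """

    def __init__(self, depth=16):
        self.depth = depth
        self.stack = []
        self.mispredicts = 0

    def call(self, return_address):
        # A hardware `call` records its return address in the predictor.
        if len(self.stack) == self.depth:
            self.stack.pop(0)  # the oldest entry is lost when the stack is full
        self.stack.append(return_address)

    def ret(self, actual_target):
        # A hardware `ret` pops the predictor and compares with the real target.
        predicted = self.stack.pop() if self.stack else None
        if predicted != actual_target:
            self.mispredicts += 1

    def push_and_jmp(self, return_address):
        # A manual push + jmp is an ordinary store and jump in this model,
        # so the predictor is left untouched.
        pass


def simulate(use_call, scheme_invokes=8, outer_frames=3):
    """Count mispredicted `ret`s in the outer code after a run of invokes."""
    model = ReturnStackModel()
    outer = [f"outer_ret_{i}" for i in range(outer_frames)]
    for addr in outer:
        model.call(addr)  # outer code enters via ordinary calls

    for i in range(scheme_invokes):
        addr = f"scheme_ret_{i}"
        if use_call:
            model.call(addr)
        else:
            model.push_and_jmp(addr)
        # The continuation is entered by a jump, never by `ret`,
        # so nothing is popped from the predictor here.

    for addr in reversed(outer):
        model.ret(addr)  # outer frames unwind with real `ret`s
    return model.mispredicts


if __name__ == "__main__":
    print("invoke via call:      ", simulate(use_call=True))
    print("invoke via push + jmp:", simulate(use_call=False))
```

In this model, layering invoke on call leaves stale entries on the predictor whenever the continuation is not entered by ret, so the later rets in the outer code are predicted wrongly, while the push-and-jump variant leaves the predictor matched to the outer calls. Whether the effect is significant on real hardware would have to be measured.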
-
IntelRegistersTodo
-
IntelPeepholeTodo
-
Jonathan Kraut pointed me at the following references for optimization.
- http://www.agner.org/optimize/
- The instruction_tables.pdf link, which has the micro-op counts for every instruction, may be especially interesting.
-
IntelsOptimizationRules
What PowerPC backend?