Division and modulo by power of 2 don't get picked up on MWCC 1.3.2 and higher #276

dbalatoni13 · 2024-05-28T19:09:40Z

Using MWCC 1.3.2, these operations get compiled differently (also depending on the optimization level) and don't get recognized by m2c.
On -O0,p division by 2 looks like (on -O0 it uses addze instead)

rlwinm $x, $s, 1, 31, 31
add $y, $x, $s
srawi $d, $y, 1

Division by power of 2 looks like

srawi $x, $s, N
addze $d, $x

Modulo by 2 looks like

rlwinm $x, $s, 1, 31, 31
rlwinm $y, $s, 0, 31, 31
xor $z, $y, $x
subf $d, $x, $z

Modulo by power of 2 looks like

rlwinm $x, $s, N, 0, (31 - N)
rlwinm $y, $s, 1, 31, 31
subf $x, $y, $x
rlwinm $x, $x, (32 - N), 0, 31
add $d, $x, $s

Is this something we want to support? If yes, do we want to do it using an assembly pattern or on a higher level? Is there a way we can add different tests for different compiler versions?

The text was updated successfully, but these errors were encountered:

simonlindholm · 2024-05-28T20:08:32Z

Is this something we want to support?

Yes!

If yes, do we want to do it using an assembly pattern or on a higher level?

Depends on how complicated it would be to do it on a higher level; if it's easy then it could generalize better across compilers/compiler versions.

Assembly patterns have the downside of breaking when the instructions aren't next to each other. For some compilers this means they aren't very reliable (but they are easy to write and can handle branches which other pattern matching subsystems can't). IR patterns are better in this regard and look more or less the same.

Is there a way we can add different tests for different compiler versions?

Sure, just add the asm in some file in tests/end_to_end/, with a -flags.txt file that indicates ppc. They can go in the same directory (I think we have an existing division-by-power-of-two one?) with filenames denoting compiler.

simonlindholm · 2024-05-28T20:20:06Z

Happy to take the tests even without a fix, btw

dbalatoni13 · 2024-05-28T20:43:10Z

Depends on how complicated it would be to do it on a higher level; if it's easy then it could generalize better across compilers/compiler versions.

I already made a primitive fix with the cases I listed here. I will check them on all MWCC versions I have and look for patterns in the asm and the higher level interpretation.

Sure, just add the asm in some file in tests/end_to_end/, with a -flags.txt file that indicates ppc. They can go in the same directory (I think we have an existing division-by-power-of-two one?) with filenames denoting compiler.

Yes, we already have a test for it, I might extend it because division by 2 is different from power of 2 in some versions. I might pick a candidate for every version that generates a different asm.

dbalatoni13 · 2024-05-28T20:46:36Z

For low level replacements we have ASM and IR patterns, what do we have for higher level simplifications?

simonlindholm · 2024-05-28T21:36:30Z

For the higher level we have fold_divmod in evaluate.py. Maybe that's just for non-powers of two though, I don't remember. And it looks like it might be complex enough as it is and not worth bolting even more stuff onto :)

dbalatoni13 · 2024-05-29T00:26:29Z

I checked 255 different combinations of optimization levels and compiler versions for division by two. The problem only exists in version 1.3.2. When the output is not simplified, it's always ((s32) (((u32) *arg0 >> 0x1FU) + *arg0) >> 1) and (at least in simple cases) the asm always looks like what I wrote initially. Which replacement would make more sense in this case?

dbalatoni13 · 2024-05-29T02:26:19Z

I tried adding

if (
    expr.op == ">>"
    and isinstance(left_expr, BinaryOp)
    and left_expr.op == "+"
    and isinstance(left_expr.left, Expression)
):
    left_left_expr = early_unwrap_ints(left_expr.left)
    if (
        isinstance(left_left_expr, BinaryOp)
        and left_left_expr.op == ">>"
        and early_unwrap_ints(left_left_expr.right) == Literal(31)
        and early_unwrap_ints(right_expr) == Literal(1)
    ):
        return BinaryOp.int(left=left_expr.right, op="/", right=Literal(2))

to the divmod function and it seems to be working fine.
The assembly pattern also works welI in my testing.

I realized that I was wrong and dividing by a power of 2 that is not 2 actually works out of the box.

I'm looking forward to your opinion.

Another finding is that my mod power 2 pattern doesn't work for the values 8 and 16 for -O0 in the test gcc-division, those get turned into

temp_r4 = (u32) d >> 0x1FU;
temp_r3 = (d << 0x1D) - temp_r4;
((temp_r3 << 3) | (temp_r3 >> 0x1DU)) + temp_r4;

which is quite complicated and quite an edge case. I noticed multiple weird problems there that don't happen normally, like infering u32 instead of s32 when doing division by a power of two

simonlindholm · 2024-05-29T18:51:09Z

Adding that to divmod makes sense to me, since it looks like a simple enough pattern that I could see coming up with other compilers/arches.

I think you don't need the early_unwrap_ints(...) when comparing things to literals, literals should never have Casts around them. Also no need for and isinstance(left_expr.left, Expression), and in the return statement you may want to cast left_expr.right to a signed integer, e.g. via as_sintish. And maybe handle both a + b and b + a (i.e., do a,b = b,a if a is a Literal).

which is quite complicated and quite an edge case

Division pattern matching is so full of them!

I noticed multiple weird problems there that don't happen normally, like infering u32 instead of s32 when doing division by a power of two

Yes, some of the division stuff isn't working too well with ppc, it hasn't really received all the love it deserves.

simonlindholm · 2024-05-29T18:52:38Z

oh, and don't you need to check that left_left_expr.left matches left_expr.right?

dbalatoni13 · 2024-05-29T19:16:48Z

I will try to implement your suggestions.

oh, and don't you need to check that left_left_expr.left matches left_expr.right?

I wanted to do that but they didn't match because of the sign and I wasn't sure how to cast. Will implement that too.

Would you like me to later add the aforementioned mod 2 into divmod?
Mod 2 gets decompiled to:
((x & 1) ^ (x >> 31))) - (x >> 31)
Mod by power of 2 gets compiled to:
((((x << 30) - ((u32) x >> 31)) << 2) | (((x << 30) - ((u32) x >> 31)) >> 30)) + ((u32) x >> 31)

I don't know how much of a good idea it is to implement the latter one programatically instead of an asm pattern.

simonlindholm · 2024-05-29T19:49:04Z

Wow, that looks awful. An asm pattern sounds good there if it works, probably for both cases. early_unwrap_ints ought to get rid of the cast and make them compare equal.

dbalatoni13 · 2024-05-30T01:44:06Z

Unfortunately the registers used in asm seem to be jambled a lot in mod by power two, I can't find a pattern that matches all cases, meanwhile the final expression seems to be consistent.
These are the asm outputs (the order of registers probably doesn't really depend on the value itself, just other compiler factors):
Mod 4:

slwi r0, r3, 0x1e
srwi r3, r3, 0x1f
subf r0, r3, r0
rotlwi r0, r0, 2
add r3, r0, r3

Mod 8:

slwi r5, r3, 0x1d
srwi r4, r3, 0x1f
subf r3, r4, r5
rotlwi r0, r3, 3
add r3, r0, r4

Mod 16:

slwi r0, r5, 0x1c
srwi r4, r5, 0x1f
subf r3, r4, r0
rotlwi r0, r3, 4
add r3, r0, r4

Mod 32:

slwi r0, r3, 0x1b
srwi r3, r3, 0x1f
subf r0, r3, r0
rotlwi r0, r0, 5
add r3, r0, r3

These are the patterns I tried, but they are not universal:

pattern = make_pattern(
    "rlwinm $x, $s, N, 0, (31 - N)",
    "rlwinm $y, $s, 1, 31, 31",
    "subf $x, $y, $x",
    "rlwinm $x, $x, (32 - N), 0, 31",
    "add $d, $x, $s",
)

pattern = make_pattern(
    "rlwinm $x, $s, N, 0, (31 - N)",
    "rlwinm $y, $s, 1, 31, 31",
    "subf $d, $y, $x",
    "rlwinm $a, $d, (32 - N), 0, 31",
    "add $d, $a, $y",
)

dbalatoni13 · 2024-05-30T02:00:31Z

And maybe handle both a + b and b + a (i.e., do a,b = b,a if a is a Literal).

I unfortunately don't quite get what you mean by that.

simonlindholm · 2024-05-30T18:19:21Z

Oh, oops, that was just wrong of me, I was thinking of the pattern wrong. I guess you just have to test both variants.
E.g. for (lhs, rhs) in ((add.left, add.right), (add.right, add.left)): ...

These are the patterns I tried, but they are not universal:

Those look reasonable to me; what goes wrong with them?

dbalatoni13 · 2024-06-02T23:08:17Z

Those look reasonable to me; what goes wrong with them?

For example if you look at the d variable, it doesn't quiet match up for all cases

simonlindholm · 2024-06-03T20:39:49Z

Ah, I see, I looked too hastily clearly. The pattern matching system allows multiple register variables to get assigned the same actual register, so the last pattern should be fine as long as you different destination regs for the third and fifth instructions. (With some manual guards against some registers not being equal for fuller correctness -- this is also something that IR patterns don't need. If you want to try using an IR pattern, beware that they are matched backward, so you'd need a variable for 32 - N and then use that to rewrite the other math.)

dbalatoni13 changed the title ~~Division and modulo by power of 2 don't get picked up on -O0 and -O0,p~~ Division and modulo by power of 2 don't get picked up on MWCC 1.3.2 and higher May 28, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Division and modulo by power of 2 don't get picked up on MWCC 1.3.2 and higher #276

Division and modulo by power of 2 don't get picked up on MWCC 1.3.2 and higher #276

dbalatoni13 commented May 28, 2024 •

edited

Loading

simonlindholm commented May 28, 2024

simonlindholm commented May 28, 2024

dbalatoni13 commented May 28, 2024 •

edited

Loading

dbalatoni13 commented May 28, 2024

simonlindholm commented May 28, 2024

dbalatoni13 commented May 29, 2024 •

edited

Loading

dbalatoni13 commented May 29, 2024 •

edited

Loading

simonlindholm commented May 29, 2024

simonlindholm commented May 29, 2024

dbalatoni13 commented May 29, 2024 •

edited

Loading

simonlindholm commented May 29, 2024

dbalatoni13 commented May 30, 2024 •

edited

Loading

dbalatoni13 commented May 30, 2024

simonlindholm commented May 30, 2024

dbalatoni13 commented Jun 2, 2024

simonlindholm commented Jun 3, 2024

Division and modulo by power of 2 don't get picked up on MWCC 1.3.2 and higher #276

Division and modulo by power of 2 don't get picked up on MWCC 1.3.2 and higher #276

Comments

dbalatoni13 commented May 28, 2024 • edited Loading

simonlindholm commented May 28, 2024

simonlindholm commented May 28, 2024

dbalatoni13 commented May 28, 2024 • edited Loading

dbalatoni13 commented May 28, 2024

simonlindholm commented May 28, 2024

dbalatoni13 commented May 29, 2024 • edited Loading

dbalatoni13 commented May 29, 2024 • edited Loading

simonlindholm commented May 29, 2024

simonlindholm commented May 29, 2024

dbalatoni13 commented May 29, 2024 • edited Loading

simonlindholm commented May 29, 2024

dbalatoni13 commented May 30, 2024 • edited Loading

dbalatoni13 commented May 30, 2024

simonlindholm commented May 30, 2024

dbalatoni13 commented Jun 2, 2024

simonlindholm commented Jun 3, 2024

dbalatoni13 commented May 28, 2024 •

edited

Loading

dbalatoni13 commented May 28, 2024 •

edited

Loading

dbalatoni13 commented May 29, 2024 •

edited

Loading

dbalatoni13 commented May 29, 2024 •

edited

Loading

dbalatoni13 commented May 29, 2024 •

edited

Loading

dbalatoni13 commented May 30, 2024 •

edited

Loading