Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Division and modulo by power of 2 don't get picked up on MWCC 1.3.2 and higher #276

Open
dbalatoni13 opened this issue May 28, 2024 · 16 comments

Comments

@dbalatoni13
Copy link
Contributor

dbalatoni13 commented May 28, 2024

Using MWCC 1.3.2, these operations get compiled differently (also depending on the optimization level) and don't get recognized by m2c.
On -O0,p division by 2 looks like (on -O0 it uses addze instead)

rlwinm $x, $s, 1, 31, 31
add $y, $x, $s
srawi $d, $y, 1

Division by power of 2 looks like

srawi $x, $s, N
addze $d, $x

Modulo by 2 looks like

rlwinm $x, $s, 1, 31, 31
rlwinm $y, $s, 0, 31, 31
xor $z, $y, $x
subf $d, $x, $z

Modulo by power of 2 looks like

rlwinm $x, $s, N, 0, (31 - N)
rlwinm $y, $s, 1, 31, 31
subf $x, $y, $x
rlwinm $x, $x, (32 - N), 0, 31
add $d, $x, $s

Is this something we want to support? If yes, do we want to do it using an assembly pattern or on a higher level? Is there a way we can add different tests for different compiler versions?

@dbalatoni13 dbalatoni13 changed the title Division and modulo by power of 2 don't get picked up on -O0 and -O0,p Division and modulo by power of 2 don't get picked up on MWCC 1.3.2 and higher May 28, 2024
@simonlindholm
Copy link
Collaborator

Is this something we want to support?

Yes!

If yes, do we want to do it using an assembly pattern or on a higher level?

Depends on how complicated it would be to do it on a higher level; if it's easy then it could generalize better across compilers/compiler versions.

Assembly patterns have the downside of breaking when the instructions aren't next to each other. For some compilers this means they aren't very reliable (but they are easy to write and can handle branches which other pattern matching subsystems can't). IR patterns are better in this regard and look more or less the same.

Is there a way we can add different tests for different compiler versions?

Sure, just add the asm in some file in tests/end_to_end/, with a -flags.txt file that indicates ppc. They can go in the same directory (I think we have an existing division-by-power-of-two one?) with filenames denoting compiler.

@simonlindholm
Copy link
Collaborator

Happy to take the tests even without a fix, btw

@dbalatoni13
Copy link
Contributor Author

dbalatoni13 commented May 28, 2024

Depends on how complicated it would be to do it on a higher level; if it's easy then it could generalize better across compilers/compiler versions.

I already made a primitive fix with the cases I listed here. I will check them on all MWCC versions I have and look for patterns in the asm and the higher level interpretation.

Sure, just add the asm in some file in tests/end_to_end/, with a -flags.txt file that indicates ppc. They can go in the same directory (I think we have an existing division-by-power-of-two one?) with filenames denoting compiler.

Yes, we already have a test for it, I might extend it because division by 2 is different from power of 2 in some versions. I might pick a candidate for every version that generates a different asm.

@dbalatoni13
Copy link
Contributor Author

For low level replacements we have ASM and IR patterns, what do we have for higher level simplifications?

@simonlindholm
Copy link
Collaborator

For the higher level we have fold_divmod in evaluate.py. Maybe that's just for non-powers of two though, I don't remember. And it looks like it might be complex enough as it is and not worth bolting even more stuff onto :)

@dbalatoni13
Copy link
Contributor Author

dbalatoni13 commented May 29, 2024

I checked 255 different combinations of optimization levels and compiler versions for division by two. The problem only exists in version 1.3.2. When the output is not simplified, it's always ((s32) (((u32) *arg0 >> 0x1FU) + *arg0) >> 1) and (at least in simple cases) the asm always looks like what I wrote initially. Which replacement would make more sense in this case?

@dbalatoni13
Copy link
Contributor Author

dbalatoni13 commented May 29, 2024

I tried adding

if (
    expr.op == ">>"
    and isinstance(left_expr, BinaryOp)
    and left_expr.op == "+"
    and isinstance(left_expr.left, Expression)
):
    left_left_expr = early_unwrap_ints(left_expr.left)
    if (
        isinstance(left_left_expr, BinaryOp)
        and left_left_expr.op == ">>"
        and early_unwrap_ints(left_left_expr.right) == Literal(31)
        and early_unwrap_ints(right_expr) == Literal(1)
    ):
        return BinaryOp.int(left=left_expr.right, op="/", right=Literal(2))

to the divmod function and it seems to be working fine.
The assembly pattern also works welI in my testing.

I realized that I was wrong and dividing by a power of 2 that is not 2 actually works out of the box.

I'm looking forward to your opinion.

Another finding is that my mod power 2 pattern doesn't work for the values 8 and 16 for -O0 in the test gcc-division, those get turned into

temp_r4 = (u32) d >> 0x1FU;
temp_r3 = (d << 0x1D) - temp_r4;
((temp_r3 << 3) | (temp_r3 >> 0x1DU)) + temp_r4;

which is quite complicated and quite an edge case. I noticed multiple weird problems there that don't happen normally, like infering u32 instead of s32 when doing division by a power of two

@simonlindholm
Copy link
Collaborator

Adding that to divmod makes sense to me, since it looks like a simple enough pattern that I could see coming up with other compilers/arches.

I think you don't need the early_unwrap_ints(...) when comparing things to literals, literals should never have Casts around them. Also no need for and isinstance(left_expr.left, Expression), and in the return statement you may want to cast left_expr.right to a signed integer, e.g. via as_sintish. And maybe handle both a + b and b + a (i.e., do a,b = b,a if a is a Literal).

which is quite complicated and quite an edge case

Division pattern matching is so full of them!

I noticed multiple weird problems there that don't happen normally, like infering u32 instead of s32 when doing division by a power of two

Yes, some of the division stuff isn't working too well with ppc, it hasn't really received all the love it deserves.

@simonlindholm
Copy link
Collaborator

oh, and don't you need to check that left_left_expr.left matches left_expr.right?

@dbalatoni13
Copy link
Contributor Author

dbalatoni13 commented May 29, 2024

I will try to implement your suggestions.

oh, and don't you need to check that left_left_expr.left matches left_expr.right?

I wanted to do that but they didn't match because of the sign and I wasn't sure how to cast. Will implement that too.

Would you like me to later add the aforementioned mod 2 into divmod?
Mod 2 gets decompiled to:
((x & 1) ^ (x >> 31))) - (x >> 31)
Mod by power of 2 gets compiled to:
((((x << 30) - ((u32) x >> 31)) << 2) | (((x << 30) - ((u32) x >> 31)) >> 30)) + ((u32) x >> 31)

I don't know how much of a good idea it is to implement the latter one programatically instead of an asm pattern.

@simonlindholm
Copy link
Collaborator

Wow, that looks awful. An asm pattern sounds good there if it works, probably for both cases. early_unwrap_ints ought to get rid of the cast and make them compare equal.

@dbalatoni13
Copy link
Contributor Author

dbalatoni13 commented May 30, 2024

Unfortunately the registers used in asm seem to be jambled a lot in mod by power two, I can't find a pattern that matches all cases, meanwhile the final expression seems to be consistent.
These are the asm outputs (the order of registers probably doesn't really depend on the value itself, just other compiler factors):
Mod 4:

slwi r0, r3, 0x1e
srwi r3, r3, 0x1f
subf r0, r3, r0
rotlwi r0, r0, 2
add r3, r0, r3

Mod 8:

slwi r5, r3, 0x1d
srwi r4, r3, 0x1f
subf r3, r4, r5
rotlwi r0, r3, 3
add r3, r0, r4

Mod 16:

slwi r0, r5, 0x1c
srwi r4, r5, 0x1f
subf r3, r4, r0
rotlwi r0, r3, 4
add r3, r0, r4

Mod 32:

slwi r0, r3, 0x1b
srwi r3, r3, 0x1f
subf r0, r3, r0
rotlwi r0, r0, 5
add r3, r0, r3

These are the patterns I tried, but they are not universal:

pattern = make_pattern(
    "rlwinm $x, $s, N, 0, (31 - N)",
    "rlwinm $y, $s, 1, 31, 31",
    "subf $x, $y, $x",
    "rlwinm $x, $x, (32 - N), 0, 31",
    "add $d, $x, $s",
)

pattern = make_pattern(
    "rlwinm $x, $s, N, 0, (31 - N)",
    "rlwinm $y, $s, 1, 31, 31",
    "subf $d, $y, $x",
    "rlwinm $a, $d, (32 - N), 0, 31",
    "add $d, $a, $y",
)

@dbalatoni13
Copy link
Contributor Author

And maybe handle both a + b and b + a (i.e., do a,b = b,a if a is a Literal).

I unfortunately don't quite get what you mean by that.

@simonlindholm
Copy link
Collaborator

Oh, oops, that was just wrong of me, I was thinking of the pattern wrong. I guess you just have to test both variants.
E.g. for (lhs, rhs) in ((add.left, add.right), (add.right, add.left)): ...

These are the patterns I tried, but they are not universal:

Those look reasonable to me; what goes wrong with them?

@dbalatoni13
Copy link
Contributor Author

Those look reasonable to me; what goes wrong with them?

For example if you look at the d variable, it doesn't quiet match up for all cases

@simonlindholm
Copy link
Collaborator

Ah, I see, I looked too hastily clearly. The pattern matching system allows multiple register variables to get assigned the same actual register, so the last pattern should be fine as long as you different destination regs for the third and fifth instructions. (With some manual guards against some registers not being equal for fuller correctness -- this is also something that IR patterns don't need. If you want to try using an IR pattern, beware that they are matched backward, so you'd need a variable for 32 - N and then use that to rewrite the other math.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants