Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[[clang::optnone]] causes miscompiles #2156

Open
TellowKrinkle opened this issue May 26, 2023 · 2 comments
Open

[[clang::optnone]] causes miscompiles #2156

TellowKrinkle opened this issue May 26, 2023 · 2 comments
Labels
question Further progress depends on answer from issue creator.

Comments

@TellowKrinkle
Copy link

TellowKrinkle commented May 26, 2023

This is mostly an Apple compiler bug, but we might want to consider avoiding using things that both aren't in the MSL spec and change lots of otherwise unused (and therefore probably completely untested) options in the compiler

To reproduce:

The test will do a two-pass render, first with a depth-only pass and then with a compare zequal color pass that writes red to the output. It will then compare the two images and count the number of pixels that wrote depth but failed the zequal test. This should be zero, but on Apple GPUs, 2211 pixels will fail. You can go into main.swift and uncomment the commented lines to take a capture if you're curious (change the destination and add an output if you don't want to import this project into Xcode).

The shaders VS0 and VS1 are lightly modified from spirv-cross, with as much [[clang::optnone]] usage as possible removed while still preserving the bug.

I disassembled the compiled shaders to see why the bug happens, for anyone curious:

Shader disassembly and analysis

For anyone following along, apply this patch to the apple gpu disassembler and run python3 compiler_explorer.py OptNoneFail/VS0.metal --no-fast-math

Full shader decompilation

GitHub seems to have a comment length limit of 65536 characters, so have a text file instead

The two outlined functions are as follows:

; @ 0x-80, spvFAdd<float2>
   0: 2aad56a22500         fadd32           r11, r11, r13
   6: 2ab158c22500         fadd32           r12, r12, r14
   c: 1402                 ret              r1
; @ 0x-40, spvFAdd<float3>
   0: 2aad56c22500         fadd32           r11, r11, r14
   6: 2ab158e22500         fadd32           r12, r12, r15
   c: 2ab55a022600         fadd32           r13, r13, r16
  12: 1402                 ret              r1

Looking at the area around the two function calls (since we know that's where things are breaking), you can see this:

1706: 421000000000         push_exec        r0l, 2
170c: 10c074e8ffff         call             0x-80
1712: d21600000000         pop_exec         r0l.cache, 2
1718: 02be8e2229a401308015 fcmpsel          lt, r47h, r39.cache, r41.cache, u82l, 0
1722: 02bc8e2229a6513aa015 fcmpsel          lt, r47l, r39.cache, r41.cache, u83l, u82h
172c: 02908e2229a5613aa005 fcmpsel          lt, r4l, r39.cache, r41.cache, u82h, u83l
1736: 82944e222500413a2005 fcmpsel          lt, r5l.cache, r39, r41, 0, u82l
1740: e22d00000000         mov_imm          r11.cache, 0
1746: aa9dca000002         fadd32           r7.cache, r5l.discard, -0.0
174c: 621500000000         mov_imm          r5, 0
1752: 7e19d80a8000         mov              r6, r12.discard
1758: 62350000803f         mov_imm          r13, 1065353216
175e: 7e39920a8000         mov              r14, r9.cache
1764: 7e3d920a8000         mov              r15, r9.cache
176a: 7e41d20a8000         mov              r16, r9.discard
1770: 7e31ce0a8000         mov              r12, r7.discard
1776: 421000000000         push_exec        r0l, 2
177c: 10c044e8ffff         call             0x-40
1782: 521600000000         pop_exec         r0l, 2

The calling convention seems to put inputs and outputs starting at r11
Weirdly, offset 1740 seems to overwrite r11 (the first output of the first function call) without saving it first. Checking the shader, it is used:

r8.xy = spvONFAdd(r1.xy, -r8.xy);
// much later...
r0.z = dot(r8.xy, r8.xy);

r12 does get saved into r6, so if we search for the next use of that...

1c7a: 9a9d4aa22400         fmul32           r7.cache, r5, r5
1c80: 9aad4cc22400         fmul32           r11.cache, r6, r6
1c86: 2a99dce22d10         fadd32           r38, r14.discard, r15.discard
1c8c: aaadce622d00         fadd32           r11.cache, r7.discard, r11.discard

It seems the compiler thinks r8.x is in r5, which got zeroed in offset 174c
I guess something about the next function's input being zero confused it?

@HansKristian-Work
Copy link
Contributor

HansKristian-Work commented Jun 5, 2023

What is actually actionable here from SPIRV-Cross' end? This sounds like a bug Apple needs to fix. As far as I know, [[clang::noopt]] is the only way to implement NoContract and CTS will break in half if that is not used.

@HansKristian-Work HansKristian-Work added the question Further progress depends on answer from issue creator. label Jun 5, 2023
@HansKristian-Work
Copy link
Contributor

This is probably more of an @cdavis5e or @billhollings issue if anything.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question Further progress depends on answer from issue creator.
Projects
None yet
Development

No branches or pull requests

2 participants