Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FXC] saturate() on vectors is broken in FXC debug mode, and mojoshader doesn't work around it #10

Open
kg opened this issue May 28, 2019 · 12 comments
Labels
invalid This doesn't seem right

Comments

@kg
Copy link
Contributor

kg commented May 28, 2019

This might be an upstream bug.

    // FIXME: MojoShader workaround
    if (0) {
        value = saturate(value);
    } else {
        value = float3(saturate(value.r), saturate(value.g), saturate(value.b));
    }
@flibitijibibo
Copy link
Collaborator

I think MOD_SATURATE is the name you want to look up in the source. It’s some weird miscellaneous value modifier and it’s most likely only covering certain types of values.

@kg
Copy link
Contributor Author

kg commented May 29, 2019

The MOD_SATURATE implementation is correct, it seems like clamp() isn't being generated by the shader at all so I'm going to try and figure out why.

@kg
Copy link
Contributor Author

kg commented May 29, 2019

Problem with generated code became evident once I hand-inlined constants:

    ps_r1 = texture2D(ps_s0, ps_r1.xy);
    ps_r2 = ps_r1 + vec4(0);
    ps_r1.x = ((ps_r2.x >= 0.0) ? ps_r1.x : 0);
    ps_r1.y = ((ps_r2.y >= 0.0) ? ps_r1.y : 0);
    ps_r1.z = ((ps_r2.z >= 0.0) ? ps_r1.z : 0);
    ps_r1.w = ((ps_r2.w >= 0.0) ? ps_r1.w : 0);
    ps_r2 = ps_r1 + vec4(-1.0);
    ps_r1.x = ((ps_r2.z >= 0.0) ? 1.0 : ps_r1.z);
    ps_r1.y = ((ps_r2.x >= 0.0) ? 1.0 : ps_r1.x);
    ps_r1.z = ((ps_r2.y >= 0.0) ? 1.0 : ps_r1.y);
    ps_r1.w = ((ps_r2.w >= 0.0) ? 1.0 : ps_r1.w);

This problem could be hidden in the d3d9 bytecode in a way that doesn't show up through the D3D shader compiler, I'll have to see.

@kg
Copy link
Contributor Author

kg commented May 29, 2019

That weird zxyw is in the D3D9 bytecode, so maybe this is a fxc bug? I don't know how it works through the D3D api though.

    texld r1, r1, s0
    add r2, r1, -0.0
    cmp r1, r2, r1, 0.0
    add r2, r1, -1.0
    cmp r1, r2.zxyw, 1.0, r1.zxyw
    mov r1.xyz, r1
    mov r0.w, c4.x
    mov r2.x, c3.x
    mov r1.xyz, r1
    add r2.y, r2.x, -1.0
    mov r1.x, r1.x
    mul r2.z, r2.y, r1.x
    frc r2.w, r2.z
    mov r2.w, -r2.w
    add r3.x, r2.z, r2.w
    mov r3.y, -r2.z
    mov r3.z, -r3.y
    add r3.z, r2.z, r3.z
    cmp r3.z, r3.z, 0.0, 1.0
    add r2.w, r2.w, r2.w
    cmp r2.w, r2.w, 0.0, 1.0
    mul r2.w, r2.w, r3.z
    add r2.w, r2.w, r3.x

@flibitijibibo
Copy link
Collaborator

Basic test case seems fine with FXC 9.29.952.3111.

Source:

void PixelPoot(inout float4 color : COLOR0)
{
        color = saturate(color);
}

technique T
{
        pass P
        {
                PixelShader = compile ps_3_0 PixelPoot();
        }
}

ASM:

technique T
{
    pass P
    {
        //No embedded vertex shader
        pixelshader = 
            asm {
            //
            // Generated by Microsoft (R) HLSL Shader Compiler 9.29.952.3111
                ps_3_0
                dcl_color v0
                mov_sat oC0, v0
            
            // approximately 1 instruction slot used
            };
    }
}

glsl120:

EFFECT: poot.fxb
    PROFILE: glsl120


    TECHNIQUE #0 ('T'):
        PASS #0 ('P'):
            STATE 147:
                VALUE: (null) -> (null)
                    CLASS: OBJECT
                    TYPE: PIXELSHADER
                    ROWS/COLUMNS/ELEMENTS: 0, 0, 0
                    TOTAL VALUES: 1
                    PIXELSHADER VALUES:

    OBJECT #1: SHADER, technique 0, pass 0
        PROFILE: glsl
        SHADER TYPE: pixel
        VERSION: 3.0
        INSTRUCTION COUNT: 1
        MAIN FUNCTION: ShaderFunction1
        INPUTS: (none.)
        OUTPUTS:
            * (null) ("ps_oC0")
        CONSTANTS: (none.)
        UNIFORMS: (none.)
        SAMPLERS: (none.)
        SYMBOLS: (none.)
        OUTPUT:
            #version 120
            #define ps_v0 gl_Color
            #define ps_oC0 gl_FragColor
            
            void main()
            {
            	ps_oC0 = clamp(ps_v0, vec4(0.0), vec4(1.0));
            }

@kg
Copy link
Contributor Author

kg commented May 29, 2019

Yeah it only happens at some point in complexity for the shaders. It does show up in the fxc output at that point, which is nice - mojoshader isn't constructing it from thin air. It might be something specific to that swizzle, because it looks like it was added in a later pixel shader profile than the rest of the swizzles. I haven't figured out what causes it to get generated yet.

@kg
Copy link
Contributor Author

kg commented May 29, 2019

Forgot to mention, this is very sensitive to optimization level like most of the other bugs I'm finding. When I messed around with fxc, the codegen for this stuff was dramatically different between /Od and /O0 before you even got to the stuff the later levels of optimization do. This problem only occurs in /Od.

@flibitijibibo
Copy link
Collaborator

Same test with Od:

ASM:

technique T
{
    pass P
    {
        //No embedded vertex shader
        pixelshader = 
            asm {
            //
            // Generated by Microsoft (R) HLSL Shader Compiler 9.29.952.3111
                ps_3_0
                def c0, -0, 0, -1, 1
                dcl_color v0
                add r0, c0.x, v0
                cmp r0, r0, v0, c0.y
                add r1, r0, c0.z
                cmp oC0, r1, c0.w, r0
            
            // approximately 4 instruction slots used
            };
    }
}

glsl120:

EFFECT: poot.fxb
    PROFILE: glsl120


    TECHNIQUE #0 ('T'):
        PASS #0 ('P'):
            STATE 147:
                VALUE: (null) -> (null)
                    CLASS: OBJECT
                    TYPE: PIXELSHADER
                    ROWS/COLUMNS/ELEMENTS: 0, 0, 0
                    TOTAL VALUES: 1
                    PIXELSHADER VALUES:

    OBJECT #1: SHADER, technique 0, pass 0
        PROFILE: glsl
        SHADER TYPE: pixel
        VERSION: 3.0
        INSTRUCTION COUNT: 4
        MAIN FUNCTION: ShaderFunction1
        INPUTS: (none.)
        OUTPUTS:
            * (null) ("ps_oC0")
        CONSTANTS:
            * 0: float (-0.000000 0.000000 -1.000000 1.000000)
        UNIFORMS: (none.)
        SAMPLERS: (none.)
        SYMBOLS: (none.)
        OUTPUT:
            #version 120
            const vec4 ps_c0 = vec4(0.0, 0.0, -1.0, 1.0);
            vec4 ps_r0;
            vec4 ps_r1;
            #define ps_v0 gl_Color
            #define ps_oC0 gl_FragColor
            
            void main()
            {
            	ps_r0 = ps_c0.xxxx + ps_v0;
            	ps_r0.x = ((ps_r0.x >= 0.0) ? ps_v0.x : ps_c0.y);
            	ps_r0.y = ((ps_r0.y >= 0.0) ? ps_v0.y : ps_c0.y);
            	ps_r0.z = ((ps_r0.z >= 0.0) ? ps_v0.z : ps_c0.y);
            	ps_r0.w = ((ps_r0.w >= 0.0) ? ps_v0.w : ps_c0.y);
            	ps_r1 = ps_r0 + ps_c0.zzzz;
            	ps_oC0.x = ((ps_r1.x >= 0.0) ? ps_c0.w : ps_r0.x);
            	ps_oC0.y = ((ps_r1.y >= 0.0) ? ps_c0.w : ps_r0.y);
            	ps_oC0.z = ((ps_r1.z >= 0.0) ? ps_c0.w : ps_r0.z);
            	ps_oC0.w = ((ps_r1.w >= 0.0) ? ps_c0.w : ps_r0.w);
            }

As... awful as this is, it does appear to be accurate.

@kg
Copy link
Contributor Author

kg commented May 29, 2019

Does this help?

\Documents\Projects\Fracture\ext\fxc\fxc.exe /T ps_3_0 \Documents\Projects\FNA\clamptest.fx.txt
 /Od /Fo clamptest.fx.bin && testparse.exe glsl120 clamptest.fx.bin > output.glsl.txt

clamptest.fx.txt
output.glsl.txt

@kg
Copy link
Contributor Author

kg commented May 29, 2019

Minor corrections (bad at reducing test cases here):

    // bad
    value.rgb = saturate(value.rgb);

And then the suspect output from /Od:

        	ps_r3.x = ((ps_r3.x >= 0.0) ? ps_r1.x : ps_c3.z);
        	ps_r3.y = ((ps_r3.y >= 0.0) ? ps_r1.y : ps_c3.z);
        	ps_r3.z = ((ps_r3.z >= 0.0) ? ps_r1.z : ps_c3.z);
        	ps_r4.xyz = ps_r3.xyz + ps_c3.xxx;
        	ps_r3.x = ((ps_r4.z >= 0.0) ? ps_c3.w : ps_r3.z);
        	ps_r3.y = ((ps_r4.x >= 0.0) ? ps_c3.w : ps_r3.x);
        	ps_r3.z = ((ps_r4.y >= 0.0) ? ps_c3.w : ps_r3.y);

@kg
Copy link
Contributor Author

kg commented May 29, 2019

And then for reference, here's that zxyw I noticed before, which was added in a later version of PS (ps_2_0, I think), generated in /Od:

    cmp r3.xyz, r4.zxyw, c3.w, r3.zxyw

Here's the /O3 output in comparison:

    mov_sat r5.xyz, r5
        	ps_r5.xyz = clamp(ps_r5.xyz, vec3(0.0), vec3(1.0));

@flibitijibibo
Copy link
Collaborator

Still looks accurate to me, the only odd thing I'm seeing is this from fxc:

cmp r1.xy, r1.zwzw, v3.zwzw, r1

Weird that it would even bother with zwzw but I guess it's because they didn't want to do r1.xy.

@kg kg changed the title saturate() on vectors is broken saturate() on vectors is broken in FXC debug mode, and mojoshader doesn't work around it May 30, 2019
@flibitijibibo flibitijibibo changed the title saturate() on vectors is broken in FXC debug mode, and mojoshader doesn't work around it [FXC] saturate() on vectors is broken in FXC debug mode, and mojoshader doesn't work around it Jul 13, 2021
@flibitijibibo flibitijibibo added the invalid This doesn't seem right label Jul 13, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
invalid This doesn't seem right
Projects
None yet
Development

No branches or pull requests

2 participants