Add explanation for subgroup vs dynamically uniform. #2118

Merged
80 changes: 80 additions & 0 deletions chapters/shaders.adoc
@@ -2498,6 +2498,86 @@ In other shader stages, each invocation in a subgroup must: be in the same

Only <<limits-subgroup-supportedStages, shader stages that support subgroup
operations>> have defined subgroups.

[NOTE]
.Note
====
In shaders, there are two kinds of uniformity that are of primary interest to
applications: subgroup uniform and dynamically uniform.

While dynamically uniform appears to imply subgroup uniform,
it is not necessarily the case for shader stages without defined workgroups.

For shader stages with defined workgroups, this assumption holds.
If a value is dynamically uniform, it is by definition also uniform across the subgroup,
as this is a specific guarantee provided to stages with explicit workgroups.
This is important if writing code like:

[source,c]
~~~~
uniform texture2D Textures[];
uint dynamicallyUniformValue = gl_WorkGroupID.x;
vec4 value = texelFetch(Textures[dynamicallyUniformValue], coord, 0);

// subgroupUniformValue is guaranteed to be uniform across the subgroup.
// This value also happens to be dynamically uniform.
vec4 subgroupUniformValue = subgroupBroadcastFirst(dynamicallyUniformValue);
~~~~

In shader stages without defined workgroups, this gets complicated.
Due to scoping rules, there is no guarantee that a subgroup is a subset of the invocation scope,
which in turn defines the scope for dynamically uniform.
In graphics, the invocation scope is a single draw command.
In multi-draw indirect, there are multiple invocation scopes, one per code:DrawIndex.

[source,c]
~~~~
// Assume SubgroupSize = 8, where 3 draws are packed together.
// Two subgroups were generated.
uniform texture2D Textures[];

// DrawIndex builtin is dynamically uniform
uint dynamicallyUniformValue = gl_DrawID;
// | gl_DrawID = 0 | gl_DrawID = 1 | }
// Subgroup 0: { 0, 0, 0, 0, 1, 1, 1, 1 }
// | gl_DrawID = 2 | gl_DrawID = 1 | }
// Subgroup 1: { 2, 2, 2, 2, 1, 1, 1, 1 }

uint notActuallyDynamicallyUniformAnymore =

While everything in this PR seems technically correct, it's concerning because this part suggests that you can't use a peeling loop to remove the need for the nonuniform decoration, which I think existing content relies on. Maybe we should change VUID 06274 to say is not dynamically uniform or subgroup uniform? In the long term I'd prefer to get rid of the nonuniform decoration entirely (I think there are other, implementation-dependent cases where nonuniform is misleading/wrong), but I think that's a harder change to get through.

Is it realistic for an application to be able to know whether they're subgroup-uniform or not in this case? Doesn't that depend on the hardware/driver and how it might pack draws?

HansKristian-Work (Contributor Author), Jun 14, 2023:

> it's concerning because this part suggests that you can't use a peeling loop to remove the need for the nonuniform decoration, which I think existing content relies on.

Peeling loops to avoid nonuniformEXT are technically out of spec, since dynamically uniform is the requirement, not subgroup uniform. Not sure why it's specced like that, but it is what it is. Desktop content in the wild definitely relies on it because it works there.

I think a compiler could optimize away nonuniformEXT if the input is subgroup uniform and the hardware only cares about that kind of uniform for descriptor access.

subgroupBroadcastFirst(dynamicallyUniformValue);
// | gl_DrawID = 0 | gl_DrawID = 1 | }
// Subgroup 0: { 0, 0, 0, 0, 0, 0, 0, 0 }
// | gl_DrawID = 2 | gl_DrawID = 1 | }
// Subgroup 1: { 2, 2, 2, 2, 2, 2, 2, 2 }

// Bug. gl_DrawID = 1's invocation scope observes both index 0 and 2.
vec4 value = texelFetch(Textures[notActuallyDynamicallyUniformAnymore],
coord, 0);
~~~~
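
As a rough CPU-side model of the packing shown above (plain C, not shader code), the sketch below computes which indices each draw's invocations end up observing after the broadcast. The helpers `broadcast_first` and `observed_mask` are hypothetical names introduced only for this illustration, and all lanes are assumed to be active.

```c
#include <assert.h>

#define SUBGROUP_SIZE 8
#define NUM_SUBGROUPS 2
#define NUM_DRAWS     3

/* gl_DrawID per lane, copied from the packing in the example above. */
static const unsigned draw_id[NUM_SUBGROUPS][SUBGROUP_SIZE] = {
    { 0, 0, 0, 0, 1, 1, 1, 1 },
    { 2, 2, 2, 2, 1, 1, 1, 1 },
};

/* Model of subgroupBroadcastFirst with every lane active:
 * each lane receives the value held by lane 0. */
static void broadcast_first(const unsigned in[SUBGROUP_SIZE],
                            unsigned out[SUBGROUP_SIZE])
{
    for (int lane = 0; lane < SUBGROUP_SIZE; ++lane)
        out[lane] = in[0];
}

/* Returns a bitmask of the index values that invocations belonging to
 * `draw` observe after the broadcast (bit i set means index i was seen). */
static unsigned observed_mask(unsigned draw)
{
    unsigned mask = 0;

    assert(draw < NUM_DRAWS);
    for (int sg = 0; sg < NUM_SUBGROUPS; ++sg) {
        unsigned result[SUBGROUP_SIZE];
        broadcast_first(draw_id[sg], result);
        for (int lane = 0; lane < SUBGROUP_SIZE; ++lane) {
            if (draw_id[sg][lane] == draw)
                mask |= 1u << result[lane];
        }
    }
    return mask;
}
```

Under these assumptions `observed_mask(1)` has bits 0 and 2 set, so the invocations of draw 1 see two different indices, while `observed_mask(0)` and `observed_mask(2)` each have a single bit set.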

Another problematic scenario is when a shader attempts to help the compiler notice
that a value is subgroup uniform to potentially improve performance.

[source,c]
~~~~
layout(location = 0) flat in uint dynamicallyUniformIndex;
// Vertex shader might have emitted a value that depends only on gl_DrawID,
// making it dynamically uniform.
// Give knowledge to compiler that the flat input is dynamically uniform,
// as this is not a guarantee otherwise.

uint uniformIndex = subgroupBroadcastFirst(dynamicallyUniformIndex);
// Hazard: If different draw commands are packed into one subgroup, the uniformIndex is wrong.

DrawData d = UBO.perDrawData[uniformIndex];
~~~~
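
The peeling loop raised in the review discussion is one pattern that yields a subgroup-uniform index per iteration without handing any lane another draw's value. The following is a minimal CPU-side model in plain C, not shader code; `peel` is a hypothetical helper for this sketch. Note that, per the discussion, using such a loop to avoid `nonuniformEXT` is technically out of spec, because the requirement is dynamically uniform rather than subgroup uniform.

```c
#include <assert.h>
#include <stdbool.h>

#define SUBGROUP_SIZE 8

/* Model of a peeling loop over one subgroup.
 * index[lane]   : each lane's index, possibly differing between lanes.
 * fetched[lane] : receives the index the lane actually used.
 * Returns the number of loop iterations executed. */
static int peel(const unsigned index[SUBGROUP_SIZE],
                unsigned fetched[SUBGROUP_SIZE])
{
    bool done[SUBGROUP_SIZE] = { false };
    int iterations = 0;

    for (;;) {
        /* Model of subgroupBroadcastFirst over the still-active lanes:
         * pick the value held by the lowest lane that has not finished. */
        int first = -1;
        for (int lane = 0; lane < SUBGROUP_SIZE; ++lane) {
            if (!done[lane]) {
                first = lane;
                break;
            }
        }
        if (first < 0)
            break; /* every lane has finished */

        unsigned uniform_index = index[first];
        ++iterations;

        /* Only lanes whose own index matches do the work and leave the
         * loop, so the index is uniform across the lanes that use it. */
        for (int lane = 0; lane < SUBGROUP_SIZE; ++lane) {
            if (!done[lane] && index[lane] == uniform_index) {
                fetched[lane] = uniform_index;
                done[lane] = true;
            }
        }
    }
    return iterations;
}
```

In this model a subgroup holding `{ 2, 2, 2, 2, 1, 1, 1, 1 }` takes two iterations and every lane ends up with its own index, whereas a single `subgroupBroadcastFirst` would give all eight lanes the value 2.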

For implementations where subgroups are packed across draws, the implementation must
make sure to handle descriptor indexing correctly. From the specification's point of view,
a dynamically uniform index does not require code:NonUniform decoration,

Are you sure about this? My read of the spec says the opposite:

> it is mandatory for certain operands to be decorated as NonUniform if they are not guaranteed to be dynamically uniform.

and such an implementation will likely either promote descriptor indexing into code:NonUniform on its own,
or handle non-uniformity natively.
====
endif::VK_VERSION_1_1[]

