SDL3 GPU WebGPU Backend #12046
…gh all of the commits were the one I just rebased... Fixed everything back up.
…PU objects aren't being released via the bindings. Might be an actual bug with Emscripten's bindings specifically, need more info. Working on a solution for uniform functions in SDL3. WebGPU BindGroups make this specific approach tough to handle. Assume uniform struct is stored at group 0 binding 0, contents should be 1 buffer FOR NOW.
Improved logging for shader creation
…ere is no reason for them to mimic the Vulkan implementation. Added GPU API checklist. Next will be vertex and fragment uniform buffers. Updated checklist
… crashes, but nothing renders properly. Need to investigate further.
…ad of individual enums.
…a bunch of existing bugs with the backend. Still encountering a layerCount issue that I cannot verify. My debugger says the texture and texture view both have 4 layers, but the error says that the texture's array layer count is 1.
…allows views of 1 layer for color attachments...
…ctionality offered in WebGPU.
… pipelines. Now we create internal SDL pipelines and everything is handled nicely. 3D texture example still works.
…gate why the sampler isn't working in the Blit2DArray example.
…no longer needed outside of the frame. Minimizes heap resizing
… more static allocations now. Static allocations only occur on named object creation, and when dealing with PipelineLayouts. Planning on refactoring PipelineLayouts later.
… the emscripten keyboard event handlers when no hint was set.
… configure the surface. Elie Michel's surface configuration logic was added but the macros don't seem to want to work for me. I've added a temporary workaround since I am only testing Emscripten anyways.
Congrats on the awesome progress!
src/gpu/webgpu/SDL_gpu_webgpu.c
Outdated
    while (SDL_GetAtomicInt(&buffer->mappingComplete) != 1) {
        if (SDL_GetTicks() - startTime > TIMEOUT) {
            SDL_LogError(SDL_LOG_CATEGORY_GPU, "Failed to map buffer: timeout");
            return NULL;
        }

        SDL_Delay(1);
    }
This spin-wait is a huge red flag. Generally speaking browser async operations should not be implemented this way. I would be very concerned that this will break on certain targets since generally async stuff on the web is specified to not be observable until the event loop turns; if this happens to work it could break in the future and nobody would know what was going on.
At a minimum you should have a comment here that specifies why it's safe/appropriate to do this instead of doing something else (I don't know what else you'd do offhand) - i.e. 'here's the part of the WebGPU spec that says this is legal and the spin should complete quickly' or 'i tested this on and on and '.
Thankfully this appears to only apply to readback which makes it have less of an impact on the overall API; it might be that what you need to do is specify an async readback API extension to SDL_GPU and make that the only legal way to do readback on the WebGPU target.
Blocking the browser's main thread (for up to 1000ms in this case) is very bad. It causes all sorts of downstream problems.
I'll throw some comments in! I'll also have to add some preprocessor macros to ensure that SDL_Delay(1) calls are specific to Emscripten. This is done since browser backends for WebGPU don't give access to device ticking, so we have to yield back to the browser for a tiny amount of time for the backend to tick the device for us.
Here's a quote from Elie Michel:
"When our C++ code runs in a Web browser (after being compiled to WebAssembly through emscripten), there is no explicit way to tick/poll the WebGPU device. This is because the device is managed by the Web browser itself, which decides at what pace polling should happen. As a result:
The device never ticks in between two consecutive lines of our WebAssembly module, it can only tick when the execution flow leaves the module.
The device always ticks between two calls to our MainLoop() function, because if you remember the Emscripten section of the Opening a Window chapter, we leave the main loop management to the browser and only provide a callback to run at each frame.
Thanks to the second point, we do not need wgpuPollEvents to do anything when called at the beginning or end of our main loop (so we set yieldToWebBrowser to false).
However, if what we intend is really to wait until something happens (e.g., a callback gets invoked), the first point requires us to make sure we yield back the execution flow to the Web browser, so that it may tick its device from time to time. We do this thanks to emscripten_sleep function, at the cost of effectively sleeping during 100 ms (we’re in a case where we want to wait anyways).
Note that using emscripten_sleep requires the -SASYNCIFY link option to be passed to emscripten, like we added already."
specify an async readback API extension to SDL_GPU
We have an async readback API, it's the Download and QueryFence/WaitForFence functions. If the committee can't define their specification for this extremely common use case in a normal way like every single industry-standard API going back to D3D11 that is firmly their problem. I would rather force the webGPU backend to implement a hack to make it work our way than poison our API with something as stupid as an async buffer map call.
Okay, so because you're relying on asyncify being set (I missed this, sorry! my bad) the sleep is not a spinwait but is instead a yield-to-browser-event-loop. That's much better.
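To make the distinction concrete, here is a minimal sketch of a yield-based wait, assuming an Asyncify build. `yield_to_host` and `wait_for_flag` are hypothetical names, not functions from the PR, and the wait is bounded by a yield count rather than by `SDL_GetTicks()` purely to keep the sketch self-contained.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#ifdef __EMSCRIPTEN__
#include <emscripten.h> /* emscripten_sleep() needs the -sASYNCIFY link option */
#endif

/* Hypothetical helper: hand control back to the host for roughly `ms` milliseconds. */
static void yield_to_host(unsigned int ms)
{
#ifdef __EMSCRIPTEN__
    emscripten_sleep(ms); /* returns to the browser event loop, then resumes here */
#else
    (void)ms; /* a native build would tick the device itself instead */
#endif
}

/* Wait until *flag becomes 1, yielding between checks.
 * max_yields bounds the wait (an assumption of this sketch; the PR
 * compares SDL_GetTicks() against TIMEOUT instead). */
bool wait_for_flag(atomic_int *flag, unsigned int max_yields)
{
    assert(flag != NULL);
    for (unsigned int i = 0; atomic_load(flag) != 1; i++) {
        if (i >= max_yields) {
            return false; /* gave up: the callback never set the flag */
        }
        yield_to_host(1);
    }
    return true;
}
```

Without Asyncify the same loop would be a true spin-wait, because the callback that sets the flag could never run while the loop holds the thread.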
Just so I'm not only being grumpy in this thread, here's a quick sketch of how this could possibly work:
A "fence" in the webGPU backend could just be defined as a group of resources that are waiting on async map operations. Then implementing QueryFence would be as simple as checking buffer->mappingComplete for each of these resources. WaitForFence could be implemented with the spinwait. That might be enough for this to work.
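The idea above could be sketched roughly as follows. Everything except the `mappingComplete` field name is hypothetical (`PendingBuffer`, `MapFence`, the fixed capacity), so treat it as an illustration of the shape rather than the backend's actual types.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_FENCE_BUFFERS 8 /* hypothetical cap, chosen only for this sketch */

typedef struct {
    atomic_int mappingComplete; /* set to 1 by the async map callback */
} PendingBuffer;

/* A "fence" here is just the set of buffers with outstanding map requests. */
typedef struct {
    PendingBuffer *buffers[MAX_FENCE_BUFFERS];
    size_t count;
} MapFence;

/* Track one more buffer; fails when the fixed-size set is full. */
bool map_fence_add(MapFence *fence, PendingBuffer *buffer)
{
    assert(fence != NULL && buffer != NULL);
    if (fence->count == MAX_FENCE_BUFFERS) {
        return false;
    }
    fence->buffers[fence->count++] = buffer;
    return true;
}

/* QueryFence analogue: signaled only when every tracked map has completed. */
bool map_fence_query(const MapFence *fence)
{
    assert(fence != NULL);
    for (size_t i = 0; i < fence->count; i++) {
        if (atomic_load(&fence->buffers[i]->mappingComplete) != 1) {
            return false;
        }
    }
    return true;
}
```

A WaitForFence analogue would then loop on `map_fence_query`, yielding to the browser between checks.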
src/gpu/webgpu/SDL_gpu_webgpu.c
Outdated
    while (!renderer->device) {
        SDL_Delay(1);
    }
Is there a forward progress guarantee here? Please specify what provides the guarantee of forward progress. A naive reading of this suggests that it might never stop spinning since there's no timeout. It would be nice to at least see a timeout here and have it error out when the timeout expires.
It would be even better to not have this spin-wait. It's a red flag and doesn't seem like it should be necessary if everything is working correctly, it suggests that someone - not necessarily you, it could be the browser vendor or the user mode graphics driver - got something wrong.
Worst-case this spin wait could actually prevent forward progress if something important is waiting in the event loop queue.
Instead of checking the device pointer itself, I can add some bool that gets toggled by the RequestDeviceCallback.
If the status received by the callback is anything but successful, then we say that it failed which would then terminate the quoted infinite loop.
See: 11d8ef7
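A minimal sketch of the callback-toggled state described in this thread, assuming a three-way status instead of a bare bool so that failure also ends the wait. The types and the simplified callback signature are stand-ins, not the real WebGPU request-device callback, and the poll bound stands in for a timeout.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum {
    REQUEST_PENDING = 0,
    REQUEST_SUCCEEDED,
    REQUEST_FAILED
} RequestState;

typedef struct {
    atomic_int state; /* holds a RequestState */
    void *device;     /* stand-in for the device handle */
} DeviceRequest;

/* Stand-in for the request-device callback; `success` replaces the real status enum. */
void on_device_result(bool success, void *device, void *userdata)
{
    assert(userdata != NULL);
    DeviceRequest *request = (DeviceRequest *)userdata;
    request->device = success ? device : NULL;
    atomic_store(&request->state, success ? REQUEST_SUCCEEDED : REQUEST_FAILED);
}

/* Returns true once the request has left the pending state,
 * false if max_polls checks pass without an answer. */
bool device_request_settled(DeviceRequest *request, unsigned int max_polls)
{
    for (unsigned int i = 0; i < max_polls; i++) {
        if (atomic_load(&request->state) != REQUEST_PENDING) {
            return true;
        }
        /* the backend would yield to the browser here, e.g. SDL_Delay(1) */
    }
    return false;
}
```

The caller then inspects `state` to tell success from failure, so neither a failed request nor a silent browser can keep the loop spinning forever.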
Looks like there was a bad rebase because some of the enum entries in gpu.c have been randomly deleted, etc. The includes need to be cleaned up too.
I reckon it was in here: 850caed
I've left comments on all the obvious stuff I noticed for now.
I'll also note here that cycling hasn't been implemented for any resources.
    #ifdef __EMSCRIPTEN__
    SDL_SetHint(SDL_HINT_GPU_DRIVER, "webgpu");
    #endif
This isn't right, we shouldn't be depending on emscripten since webgpu can also have native implementations.
src/gpu/SDL_gpu.c
Outdated
    bool is_webgpu = SDL_strcasecmp(backend, "webgpu") == 0;

    // WebGPU uses ~0u for default layer_or_depth_plane, however this causes issues with other backends
    if (color_target_infos[i].layer_or_depth_plane == ~0u && !is_webgpu) {
We should be translating from SDL to WGPU, not the other way around. If the client passes in ~0u for the layer then that violates our spec.
Link: c1d8428
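One way to read "translate from SDL to WGPU" is that the sentinel is produced inside the backend and never accepted from the client. The sketch below assumes the value feeds a WebGPU color attachment's depth slice, which is only meaningful for 3D textures; `SKETCH_DEPTH_SLICE_UNDEFINED` and `sketch_depth_slice` are hypothetical names, and the sentinel value is taken from the quoted code comment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for WebGPU's "no depth slice" sentinel (~0u per the quoted code comment). */
#define SKETCH_DEPTH_SLICE_UNDEFINED UINT32_MAX

/* Backend-side translation of SDL's layer_or_depth_plane. */
uint32_t sketch_depth_slice(bool texture_is_3d, uint32_t layer_or_depth_plane)
{
    /* The client never passes the sentinel; doing so would violate SDL's spec. */
    assert(layer_or_depth_plane != SKETCH_DEPTH_SLICE_UNDEFINED);
    return texture_is_3d ? layer_or_depth_plane : SKETCH_DEPTH_SLICE_UNDEFINED;
}
```

With this shape, the shared validation in SDL_gpu.c needs no backend-specific exception.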
src/gpu/SDL_gpu.c
Outdated
    // Get hint to check for "webgpu"
    const char *backend = SDL_GetHint(SDL_HINT_GPU_DRIVER);
    bool is_webgpu = SDL_strcasecmp(backend, "webgpu") == 0;
We don't have to query hints to get the backend from gpu.c
src/gpu/SDL_sysgpu.h
Outdated
    @@ -18,6 +18,7 @@
      misrepresented as being the original software.
      3. This notice may not be removed or altered from any source distribution.
    */
    #include "../SDL_internal.h"
Incorrect #include
src/gpu/SDL_gpu.c
Outdated
    @@ -20,6 +20,7 @@
    */
    #include "SDL_internal.h"
    #include "SDL_sysgpu.h"
    #include <SDL3/SDL_gpu.h>
Incorrect #include
        .label = "SDL_GPU Command Encoder",
    };

    commandBuffer->commandEncoder = wgpuDeviceCreateCommandEncoder(renderer->device, &commandEncoderDesc);
Better to pool the command buffer structures than creating a new command encoder every frame.
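As an illustration of the pooling suggestion, a free-list pool for the wrapper structs might look like the sketch below. `PooledCommandBuffer`, `CommandBufferPool`, and the `pool_*` functions are hypothetical, and the sketch recycles only the struct; how the backend manages the encoder handle inside it is left open.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct PooledCommandBuffer {
    void *commandEncoder;             /* stand-in for the encoder handle */
    struct PooledCommandBuffer *next; /* free-list link, used only while pooled */
} PooledCommandBuffer;

typedef struct {
    PooledCommandBuffer *free_list;
} CommandBufferPool;

/* Reuse a released struct if one exists, otherwise allocate a new one. */
PooledCommandBuffer *pool_acquire(CommandBufferPool *pool)
{
    assert(pool != NULL);
    PooledCommandBuffer *cb = pool->free_list;
    if (cb != NULL) {
        pool->free_list = cb->next;
    } else {
        cb = malloc(sizeof(*cb));
        if (cb == NULL) {
            return NULL;
        }
    }
    cb->commandEncoder = NULL; /* reset per-use state */
    cb->next = NULL;
    return cb;
}

/* Return a struct to the pool once its work has been submitted. */
void pool_release(CommandBufferPool *pool, PooledCommandBuffer *cb)
{
    assert(pool != NULL && cb != NULL);
    cb->next = pool->free_list;
    pool->free_list = cb;
}

/* Free every pooled struct, e.g. at device destruction. */
void pool_destroy(CommandBufferPool *pool)
{
    assert(pool != NULL);
    while (pool->free_list != NULL) {
        PooledCommandBuffer *cb = pool->free_list;
        pool->free_list = cb->next;
        free(cb);
    }
}
```

After the first few frames the free list usually satisfies every acquire, so steady-state frames allocate nothing.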
src/gpu/webgpu/SDL_gpu_webgpu.c
Outdated
    int width, height;
    SDL_GetWindowSize(renderer->claimedWindows[0]->window, &width, &height);
    commandBuffer->currentViewport = (WebGPUViewport){ 0, 0, width, height, 0.0, 1.0 };
    commandBuffer->currentScissor = (WebGPURect){ 0, 0, width, height };
Why is this function touching windows? This should be done in BeginRenderPass.
Moved to BeginRenderPass. I'll link commit once it's up.
Link: 8d601ec
src/gpu/webgpu/SDL_gpu_webgpu.c
Outdated
    {
        // Just call Submit for WebGPU
        WebGPU_Submit(commandBuffer);
        // There are no fences in WebGPU, so we don't need to do anything here
Not having any kind of fence abstraction is going to break tons of applications.
It seems like there's some kind of pseudo-fence callback structure:
https://developer.mozilla.org/en-US/docs/Web/API/GPUQueue/onSubmittedWorkDone
Just adding stuff here as notes for myself when I return:
In the C API, the function is defined as: wgpuQueueOnSubmittedWorkDone(WGPUQueue queue, WGPUQueueWorkDoneCallback callback, void *userdata).
Alright, then this can probably be implemented by just having a Fence struct as the userdata and then marking it as finished in the callback.
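A minimal sketch of that approach, assuming the work-done callback receives a status and the `userdata` pointer. `SketchFence`, `SketchWorkDoneStatus`, and both functions are stand-ins invented for this example rather than backend or WebGPU definitions.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the status value the real callback would receive. */
typedef enum {
    SKETCH_WORK_DONE_SUCCESS = 0,
    SKETCH_WORK_DONE_ERROR
} SketchWorkDoneStatus;

typedef struct {
    atomic_int complete; /* 0 while work is in flight, 1 once the callback ran */
} SketchFence;

/* Would be registered per submission, with the fence passed as userdata. */
void sketch_on_work_done(SketchWorkDoneStatus status, void *userdata)
{
    assert(userdata != NULL);
    SketchFence *fence = (SketchFence *)userdata;
    (void)status; /* a real backend would log or propagate errors */
    atomic_store(&fence->complete, 1);
}

/* QueryFence analogue: non-blocking check of the flag. */
bool sketch_query_fence(SketchFence *fence)
{
    assert(fence != NULL);
    return atomic_load(&fence->complete) == 1;
}
```

Submit would allocate a fence, pass it as `userdata` when registering the callback, and hand it back to the caller for later queries or waits.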
    // Slightly altered, though with permission by Elie Michel:
    // @ https://github.com/eliemichel/sdl3webgpu/blob/main/sdl3webgpu.c
    // https://github.com/libsdl-org/SDL/issues/10768#issuecomment-2499532299
    #if defined(SDL_PLATFORM_MACOS)
We shouldn't be touching platform code in the implementation like this. We'll probably need some kind of platform abstraction in SDL itself that can get a WGPU surface.
    bool cycleBindGroups;

    WebGPUUniformBuffer vertexUniformBuffers[MAX_UNIFORM_BUFFERS_PER_STAGE];
The pipeline should not own these, uniform buffers should be pooled.
    @@ -0,0 +1,4602 @@
    // File: /webgpu/SDL_gpu_webgpu.c
Please include the standard text from https://github.com/libsdl-org/SDL/blob/main/include/SDL3/SDL_copying.h and add any copyright attribution you'd like here.
Link: a385d47
src/gpu/webgpu/SDL_gpu_webgpu.c
Outdated
    @@ -2090,6 +2106,11 @@ void WebGPU_BeginRenderPass(SDL_GPUCommandBuffer *commandBuffer,
            return;
        }

        int width, height;
        SDL_GetWindowSize(wgpu_cmd_buf->renderer->claimedWindows[0]->window, &width, &height);
This is still not right, the viewport and scissor should be set to the smallest size of bound render targets. Please reference how the other backends implemented this.
Ah I get it now! I'll return to this after some rest I think.
I read up on the Vulkan implementation and will follow that one tomorrow.
Link: e24094d
It's still not 1-to-1 with the Vulkan backend but the viewport and scissor now use the smallest available size of all bound render targets.
It also now sets the other default states for the render pass.
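For reference, the "smallest size of all bound render targets" rule reduces to a per-axis minimum. The sketch below uses a hypothetical `TargetSize` type and ignores details such as mip levels, which a real backend would presumably need to account for.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t width;
    uint32_t height;
} TargetSize;

/* Per-axis minimum over all bound targets; returns {0, 0} when none are bound. */
TargetSize smallest_target_size(const TargetSize *targets, size_t count)
{
    assert(targets != NULL || count == 0);
    TargetSize result = { UINT32_MAX, UINT32_MAX };
    if (count == 0) {
        result.width = 0;
        result.height = 0;
        return result;
    }
    for (size_t i = 0; i < count; i++) {
        if (targets[i].width < result.width) {
            result.width = targets[i].width;
        }
        if (targets[i].height < result.height) {
            result.height = targets[i].height;
        }
    }
    return result;
}
```

The default viewport and scissor would then both span `(0, 0)` to the returned width and height, independent of any window.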
src/gpu/webgpu/SDL_gpu_webgpu.c
Outdated
    // Note: Compiling SDL GPU programs using emscripten will require -sUSE_WEBGPU=1 -sASYNCIFY=1

    #include "../SDL_sysgpu.h"
    #include "SDL_internal.h"
SDL_internal.h needs to be the first include in the file. I usually throw it right after the standard blurb at the top so I don't forget.
Link: d2fbc02
…atch Vulkan implementation.
…ed this problem already.
…n tested but it compiles.
Description
Congrats on shipping SDL 3.2.0, and officially releasing SDL3!
Now that SDL3 has been released, I have decided to open a PR for my work for the WebGPU backend as suggested by @flibitijibibo.
Attached is a checklist of the API methods, as well as a checklist of working examples. (As of 2025-01-21).
Examples and more info can be found at: https://github.com/klukaszek/SDL3-WebGPU-Examples
(Based on https://github.com/TheSpydog/SDL_gpu_examples/)
A live demo can be found at: https://kylelukaszek.com/SDL3-WebGPU-Examples/.
My fork currently fails to pass the Emscripten pipeline test for some reason that I haven't taken the time to investigate yet. So that will probably have to be resolved before merging with main.
I'm probably gonna get to work on compute pipelines sometime soon if no one ends up working on that by the time I'm free again.
Shaders
This current implementation of the backend expects WGSL shaders since I have only tested on browsers, and browser implementations of WebGPU don't offer support for the SPIRV SType. Once native WGPU support becomes a priority, then this issue can be tackled.
API Checklist
General
Swapchains
Command Buffers and Fences
Note: WebGPU has no exposed fence API.
Buffers
Textures
Samplers
Debugging
Graphics Pipelines
Compute Pipelines
Shaders
Rendering
Copy Passes
Compute Passes
Fragment Stage
Vertex Stage
Rendering States
Composition
Example Checklist
Native WebGPU Support
I have not done any testing with native distributions of WebGPU (WGPU Native / Dawn), though I have implemented Elie Michel's surface selector logic (sdl3webgpu.c) for when someone wants to give it a test.
Warning:
The preprocessor macros in WebGPU_INTERNAL_CreateSurface() don't seem to work properly, and as a result, I hard-coded a workaround since I'm only testing on the web for the time being.
Existing Issue(s)
#10768