
Add linux-drm-syncobj-v1 protocol #411

Open
wants to merge 1 commit into master

Conversation

@ids1024 (Member) commented Apr 9, 2024

Based on Smithay/smithay#1356.

I think the blocker logic should be correct for handling acquire points (if I properly understand the transaction system in Smithay). Though I don't see rendering issues with Mesa git when the blocker is removed... maybe it needs to be tested with something heavier than vkcube. (Or is there something still forcing implicit sync?).

The logic I added in Smithay for signaling releases may be a little less correct. Though maybe not more incorrect than how buffer releases are currently handled? (If I understand DRM correctly, with direct scanout we should make sure not to release until we've committed a new buffer and are sure the display controller won't want to read the buffer.)

We'll be able to test more when the next Nvidia driver is released. This at least gives us a way to test the explicit sync support they're adding.

Presumably we should test if drmSyncobjEventfd is supported... maybe just creating a syncobj and calling that to see if it works? I'm also still a little unsure how this ends up working with multiple GPUs... particularly if one is Nvidia.

@Drakulix (Member)

I think the blocker logic should be correct for handling acquire points (if I properly understand the transaction system in Smithay). Though I don't see rendering issues with Mesa git when the blocker is removed... maybe it needs to be tested with something heavier than vkcube. (Or is there something still forcing implicit sync?).

For that you probably have to remove the old dmabuf.generate_blocker logic, which pulls a fence out of the dmabuf to do essentially the same thing. We basically should check if the client uses explicit sync, then use the acquire fence, and otherwise fall back to polling the dmabuf directly.

The logic I added in Smithay for signaling releases may be a little less correct. Though maybe not more incorrect than how buffer releases are currently handled? (If I understand DRM correctly, with direct scanout we should make sure not to release until we've committed a new buffer and are sure the display controller won't want to read the buffer.)

Yeah, I am pretty sure smithay's code isn't correct in what it does today, but given direct-scanout mostly works, that is probably using implicit-sync in the background.

What needs to happen is storing the fence generated by compositing, or, in the case of direct scanout, getting the OUT_FENCE_PTR property of the respective plane. Then, once the buffer is replaced and won't be used for new rendering/scanout operations, we can wait for that fence and signal the release once it is done. (Maybe there is even a kernel API to "signal once this other fence is signalled"?)
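
For reference, the out-fence side of an atomic commit could look roughly like this (a sketch only: `find_crtc_property`, `atomic_req`, and `device` are hypothetical stand-ins for the device plumbing, while the OUT_FENCE_PTR semantics come from the KMS atomic API):

```rust
// Hypothetical plumbing around the real OUT_FENCE_PTR mechanism.
let mut out_fence_fd: i32 = -1;
let prop = find_crtc_property(&device, crtc, "OUT_FENCE_PTR")?;

// The property value is a user-space pointer; on a successful atomic
// commit the kernel writes a sync_file fd into it.
atomic_req.add_property(crtc, prop, &mut out_fence_fd as *mut i32 as u64);
device.atomic_commit(&atomic_req)?;

// `out_fence_fd` now refers to a fence that signals once the new
// framebuffers are being scanned out, i.e. once the display controller
// is done reading the previous buffers.
```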

Presumably we should test if drmSyncobjEventfd is supported... maybe just creating a syncobj and calling that to see if it works? I'm also still a little unsure how this ends up working with multiple GPUs... particularly if one is Nvidia.

Yeah, imo this needs a compile test similar to what we do for gbm in smithay: https://github.com/Smithay/smithay/blob/master/build.rs#L99-L125

So if the local kernel/drm headers of the system support it, we enable the feature and assume the kernel does as well. I don't know how well runtime detection would work, but we have to make sure not to advertise the global if this function isn't supported.
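
Sketched out, such a probe could look roughly like this (illustrative only: the probe source, the cfg name, and the use of the cc crate as a build-dependency are assumptions modeled loosely on the linked gbm check, and it assumes the libdrm headers are on the default include path):

```rust
// build.rs — compile-time probe for drmSyncobjEventfd.
use std::{env, fs, path::PathBuf};

fn main() {
    let out_dir = PathBuf::from(env::var("OUT_DIR").unwrap());
    let probe = out_dir.join("syncobj_eventfd_probe.c");
    fs::write(
        &probe,
        r#"
#include <xf86drm.h>
/* Compiles only if the local libdrm headers declare drmSyncobjEventfd. */
int main(void) { return drmSyncobjEventfd(-1, 0, 0, -1, 0); }
"#,
    )
    .unwrap();

    // If the probe builds against the system headers, enable the support code.
    if cc::Build::new().file(&probe).try_compile("syncobj_probe").is_ok() {
        println!("cargo:rustc-cfg=have_drm_syncobj_eventfd");
    }
}
```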

@ids1024 (Member Author) commented Apr 10, 2024

For that you probably have to remove the old dmabuf.generate_blocker logic, which pulls a fence out of the dmabuf to do essentially the same thing. We basically should check if the client uses explicit sync, then use the acquire fence, and otherwise fall back to polling the dmabuf directly.

This should already be doing that. If there's an acquire point, it adds a DrmSyncPointBlocker and skips adding the DmabufBlocker.
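
Condensed, the branching looks roughly like this (`add_blocker` and the state lookup are stand-ins, and the `DrmSyncPointBlocker::new` constructor is assumed; the two blocker types are the ones discussed in this PR):

```rust
if let Some(acquire_point) = syncobj_state.acquire_point.clone() {
    // Explicit sync: block the transaction on the client's acquire timeline point.
    add_blocker(&surface, DrmSyncPointBlocker::new(acquire_point));
} else if let Ok(blocker) = dmabuf.generate_blocker(Interest::READ) {
    // Implicit sync fallback: poll the dmabuf's implicit fence directly.
    add_blocker(&surface, blocker);
}
```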

Yeah, I am pretty sure smithay's code isn't correct in what it does today, but given direct-scanout mostly works, that is probably using implicit-sync in the background.

I'm not sure if implicit sync does something to help with releases (blocking client writes to the buffer until the display controller is done with the buffer)... but yeah, it does seem to mostly work. If implicit sync isn't involved, this won't be more problematic with explicit sync and the same limitation.

(Maybe there is even a kernel API to "signal once this other fence is signalled"?)

It should be possible to do something with drmSyncobjTransfer and such. Probably the other implementations of the protocol do something like that, so we can look at those.

We just have to make sure to only do that once the buffer is no longer used elsewhere in the compositor.
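
As a sketch, the transfer route could look like this with raw libdrm FFI (the extern signatures follow xf86drm.h; the wrapper function is illustrative and elides error handling):

```rust
#[link(name = "drm")]
extern "C" {
    fn drmSyncobjCreate(fd: i32, flags: u32, handle: *mut u32) -> i32;
    fn drmSyncobjDestroy(fd: i32, handle: u32) -> i32;
    fn drmSyncobjImportSyncFile(fd: i32, handle: u32, sync_file_fd: i32) -> i32;
    fn drmSyncobjTransfer(fd: i32, dst: u32, dst_point: u64,
                          src: u32, src_point: u64, flags: u32) -> i32;
}

/// Attach an already-materialized fence (e.g. a sync_file from OUT_FENCE_PTR)
/// to the client's release timeline at `dst_point`.
unsafe fn transfer_fence_to_release(drm_fd: i32, sync_file_fd: i32,
                                    release_handle: u32, dst_point: u64) {
    let mut tmp = 0u32;
    drmSyncobjCreate(drm_fd, 0, &mut tmp);
    // Wrap the sync_file in a temporary binary syncobj...
    drmSyncobjImportSyncFile(drm_fd, tmp, sync_file_fd);
    // ...then move its fence onto the release timeline at the release point.
    drmSyncobjTransfer(drm_fd, release_handle, dst_point, tmp, 0, 0);
    drmSyncobjDestroy(drm_fd, tmp);
}
```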

@Drakulix (Member)

I'm not sure if implicit sync does something to help with releases (blocking client writes to the buffer until the display controller is done with the buffer)... but yeah, it does seem to mostly work. If implicit sync isn't involved, this won't be more problematic with explicit sync and the same limitation.

It definitely does, but the nvidia driver at least isn't doing that correctly when you send dmabufs directly to KMS without going through egl-gbm and allocating an EGLSurface. Which is why we have the needs_sync workaround in smithay. But this should work for mesa.

(Maybe there is even a kernel API to "signal once this other fence is signalled"?)

It should be possible to do something with drmSyncobjTransfer and such. Probably the other implementations of the protocol do something like that, so we can look at those.

👍

We just have to make sure to only do that once the buffer is no longer used elsewhere in the compositor.

I am pretty sure that is what our current release logic does, with the exception of handling direct scanout. Which would be handled by the fence anyway in this case, so that should be correct. We just need to adjust the DrmCompositor to extract the out-fence and expose that in the RenderResult.

@ids1024 (Member Author) commented Apr 11, 2024

I am pretty sure that is what our current release logic does, with the exception of handling direct scanout. Which would be handled by the fence anyway in this case, so that should be correct. We just need to adjust the DrmCompositor to extract the out-fence and expose that in the RenderResult.

I mean if we want to have OUT_FENCE_PTR directly signal the release point:

  • We can't do that if we are using the same buffer for direct scanout or rendering elsewhere (on a different monitor; screencopy)
  • We can't do that if we might use the buffer again. Which could even be the case if another buffer has been committed, since the new buffer may still be blocked when we want to render the next frame.

So I'm not sure when we could actually do that? Maybe with commit-queue-v1, where we might know the next buffer is ready, but aren't using it until the next frame.

So at least for now I think we need to stick to signaling the release point from the CPU? But we should still track when OUT_FENCE_PTR has signaled that scanout is done with the buffer.

@Drakulix (Member)

I mean if we want to have OUT_FENCE_PTR directly signal the release point:

* We can't do that if we are using the same buffer for direct scanout or rendering elsewhere (on a different monitor; screencopy)

Right, so this rather needs to be a list of fences to wait for. Meaning we probably have to wait and signal ourselves instead of relying on drmSyncobjTransfer.

* We can't do that if we might use the buffer again. Which could even be the case if another buffer has been committed, since the new buffer may still be blocked when we want to render the next frame.

But the merge of the state should only happen once all blockers are resolved, and only then do we release, so I believe that issue is already handled correctly. Nothing would be able to use that buffer for rendering anymore at that point.

But we still need to track the buffer to be able to signal later, so we might as well unify the approach and handle the release-event the same. I feel like this could benefit from some infrastructure and refactoring in smithay.

So I'm not sure when we could actually do that? Maybe with commit-queue-v1, where we might know the next buffer is ready, but aren't using it until the next frame.

I think we can implement both fifo and commit-queue with blockers as well.

So at least for now I think we need to stick to signaling the release point from the CPU? But we should still track when OUT_FENCE_PTR has signaled that scanout is done with the buffer.

Yeah, I am coming to the same conclusion, but that isn't too bad, as that is just another fd in the loop and a very small action.
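
A minimal sketch of that fd-in-the-loop approach with calloop (which Smithay already uses); `release_point.signal()` is assumed from the Smithay branch, and `out_fence` is the owned sync_file fd obtained via OUT_FENCE_PTR:

```rust
use calloop::{generic::Generic, Interest, Mode, PostAction};

// A sync_file fd polls readable once its fence has signaled, so waiting
// on the CPU is just one more source in the event loop.
loop_handle.insert_source(
    Generic::new(out_fence, Interest::READ, Mode::OneShot),
    move |_readiness, _fd, _state| {
        // Scanout is done with the old buffer; let the client reuse it.
        release_point.signal().expect("failed to signal release point");
        Ok(PostAction::Remove)
    },
)?;
```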

@@ -450,6 +451,9 @@ pub fn init_backend(
// Create relative pointer global
RelativePointerManagerState::new::<State>(&dh);

// TODO check if main device supports syncobj eventfd?
Member

I guess worst case we can still fail the import_timeline-request, right?
Is Xwayland/mesa able to handle this / fall back to implicit sync?

Member Author

I don't think there's any way to fall back once we expose the global. The protocol doesn't make import_timeline failable, except as a protocol error.

It looks like Mutter uses drmSyncobjEventfd(drm_fd, 0, 0, -1, 0) != -1 || errno != ENOENT as its "unsupported" check: the invalid handle makes the call fail either way, but only a kernel that actually implements the ioctl fails with ENOENT. So we probably want to do the same, and only create the global if it is supported.
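
A minimal sketch of that runtime probe (the extern signature matches libdrm's xf86drm.h, the errno constant comes from the libc crate, and failing with ENOENT specifically indicates the ioctl itself exists):

```rust
use std::os::unix::io::RawFd;

#[link(name = "drm")]
extern "C" {
    // int drmSyncobjEventfd(int fd, uint32_t handle, uint64_t point, int ev_fd, uint32_t flags);
    fn drmSyncobjEventfd(fd: RawFd, handle: u32, point: u64, ev_fd: RawFd, flags: u32) -> i32;
}

fn supports_syncobj_eventfd(drm_fd: RawFd) -> bool {
    // Handle 0 never names a real syncobj, so the call always fails; a kernel
    // implementing DRM_IOCTL_SYNCOBJ_EVENTFD rejects the bogus handle with
    // ENOENT, while an older kernel fails with a different errno.
    let ret = unsafe { drmSyncobjEventfd(drm_fd, 0, 0, -1, 0) };
    ret == -1 && std::io::Error::last_os_error().raw_os_error() == Some(libc::ENOENT)
}
```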

@ryzendew commented Aug 7, 2024

Can we get this rebased on the newest master, please?

@gabriele2000

Is this going to be implemented soon?
It's the only thing that forces me to use vsync in every game, something that I'd rather turn off since Wayland doesn't need it.

@ids1024 (Member Author) commented Sep 19, 2024

Is this going to be implemented soon?

I've just marked Smithay/smithay#1356 as ready for review, so hopefully if no issues come up this should be merged soon.

I'm not sure if explicit sync will help with whatever issues you're seeing, but if you'd like to test this PR and report how it impacts behavior (for a certain game / graphics card / driver), that could be helpful.

@gabriele2000 commented Sep 19, 2024

I'm not sure if explicit sync will help with whatever issues you're seeing, but if you'd like to test this PR and report how it impacts behavior (for a certain game / graphics card / driver), that could be helpful.

I think it will.
The issue that I'm seeing is that with vsync disabled anything less than 60 FPS feels like 15 FPS, maybe 20 FPS; it's definitely the sync issue that I began to encounter half a year ago.

I will report back.

@ids1024 There's definitely something weird with my setup. The issue is still here: no weird tearing now, but FPS are halved if they're less than 60 (or even more than halved).

  • iGPU Intel 630
  • dGPU Nvidia GTX 1050TI

I'm not telling the game anything; it just uses the Nvidia GPU, I do not force any GPU.
Every game is affected by this, turning on Vsync will fix it immediately.

Mesa is at version Mesa 24.0.3-1pop1~1711635559~22.04~7a9f319
Nvidia driver is at version 560.35.03-1pop0~1726601312~22.04~92f4f94

EDIT: for more details refer to pop-os/cosmic-epoch#184 (comment)

@ids1024 (Member Author) commented Sep 19, 2024

FPS being lower than expected (often around 30 on a 60 fps monitor) with vsync is something I've noticed with Intel-rendered windows on a 1650 mobile (later NVIDIA cards don't seem to have this issue). In my testing Gnome Wayland seemed similar, so I assume it's something in the driver.

#211 has some previous testing I've done with that.

@gabriele2000 commented Sep 19, 2024

FPS being lower than expected (often around 30 on a 60 fps monitor) with vsync is something I've noticed with Intel-rendered windows on a 1650 mobile (later NVIDIA cards don't seem to have this issue). In my testing Gnome Wayland seemed similar, so I assume it's something in the driver.

#211 has some previous testing I've done with that.

In my case the inverse is true.
Internal framerate is good with vsync disabled and enabled, but with vsync disabled what I see isn't what the computer sees.

Personally I see extreme lag; the game renders fine though, because if you record gameplay and watch it back, you'll see it's buttery-smooth.

UPDATE: Basically the chronological order of the whole thing is:

  • Frames would jump back with vsync disabled
  • One day the issue was fixed thanks to PRIME offloading
  • cosmic-comp started to show this "high internal FPS but what you see is a stuttery mess" with vsync off (still no noticeable diagonal tearing)
  • I try this fix that I've been waiting on for months and it doesn't fix anything
  • Disappointment

@Tipcat-98

Tried this briefly with NVIDIA 560.35.03.
Firefox would frequently crash with:
"MozCrashReason":"Error flushing display: Broken pipe"

Other things seemed to work fine though.

Using Pop!_os 22.04 with popdev:master branch.

@gabriele2000

Tried this briefly with NVIDIA 560.35.03. Firefox would frequently crash with: "MozCrashReason":"Error flushing display: Broken pipe"

Other things seemed to work fine though.

Using Pop!_os 22.04 with popdev:master branch.

Can you read my comment and please tell me if you don't have the issue?
Are you on a hybrid setup?

@Tipcat-98

I'm using a desktop, with my intel integrated gpu disabled in BIOS.
I couldn't really see any frame rate oddities with or without v-sync.

@ids1024 (Member Author) commented Sep 20, 2024

Hm, are there still issues with Firefox? nvidia/egl-wayland 1.1.15 was supposed to fix some issues like that. Not sure if Firefox also needed fixes.

@Tipcat-98

Appears so.
Can confirm that I'm on egl-wayland 1.1.16

I could send the firefox crash log if you think it could help.

@gabriele2000

I'm using a desktop, with my intel integrated gpu disabled in BIOS. I couldn't really see any frame rate oddities with or without v-sync.

So my theory could still stand: it must be a problem with hybrid setups

@ids1024 (Member Author) commented Sep 20, 2024

Yeah, I think the issue you're seeing is related in some way to copies between GPUs. Not sure why later Nvidia cards seem to have much less trouble with that.

If you start a program with an env var like WAYLAND_DISPLAY=wayland-1-renderD128 or WAYLAND_DISPLAY=wayland-1-renderD129 (depending on which is the Nvidia GPU) you can make sure the program is started on that GPU, and Cosmic knows which GPU it's running on. If you set it to render with the Nvidia GPU that way, performance should be good on Nvidia outputs.

If it's easy for you to test Gnome Wayland on the same setup, it would be good to know if it has similar issues. (If Gnome is doing better on any systems, we should try to figure out what they're doing differently.)

@gabriele2000

Yeah, I think the issue you're seeing is related in some way to copies between GPUs. Not sure why later Nvidia cards seem to have much less trouble with that.

Figures

If you start a program with an env var like WAYLAND_DISPLAY=wayland-1-renderD128 or WAYLAND_DISPLAY=wayland-1-renderD129 (depending on which is the Nvidia GPU) you can make sure the program is started on that GPU, and Cosmic knows which GPU it's running on. If you set it to render with the Nvidia GPU that way, performance should be good on Nvidia outputs.

Both commands result in the issue still existing; so far I've tried launching games in Steam with that variable.
I'll try launching Steam directly with those commands, then launch a random game.

@ids1024 (Member Author) commented Sep 20, 2024

Oh yeah, that command won't change the behavior of XWayland clients, which would probably include most Steam games.

We may need to make some changes to make sure unnecessary buffer copies are avoided for XWayland clients with multiple GPUs. It's all a bit awkward currently since there isn't a reliable way for the compositor to actually know which GPU the buffers it gets from the client are allocated on. (https://gitlab.freedesktop.org/wayland/wayland-protocols/-/merge_requests/268 aims to address that; though I guess X would also need a protocol change.)

@gabriele2000 commented Sep 20, 2024

Oh yeah, that command won't change the behavior of XWayland clients, which would probably include most Steam games.

Wasn't this protocol required to basically fix gaming on nvidia even from hybrid setups?
If I understood what you said correctly, wine/proton needs wayland support or we need Yet Another Protocol to fix XWayland, right?

@Drakulix (Member)

Wasn't this protocol required to basically fix gaming on nvidia even from hybrid setups?

Explicit sync was required to fix nvidia mostly on desktops, not really on hybrid setups.

If I understood what you said correctly, wine/proton needs wayland support or we need Yet Another Protocol to fix XWayland, right?

No, we don't need a new protocol to fix Xwayland. While the protocol @ids1024 mentioned would certainly be helpful, it doesn't in and of itself fix the core issue.

The issue is that copying from one GPU to another takes time with 1000-series cards and lower. There is nothing to fix this (apart from maybe inside the nvidia driver, but we don't know that, and it doesn't appear to be particularly badly optimized for this use case).

The only thing we can fix is rendering games/apps on the nvidia GPU and displaying them on a display directly connected to the nvidia GPU (some of your external displays might be connected to it directly). That will require multiple Xwayland instances, but it can theoretically work and is on our TODO list.

But there is nothing fixing the delay if you want to game on e.g. your internal display, which is likely hard-wired to your integrated GPU (if you don't have a switch in your BIOS). The internal FPS will always be fine in that case, as your nvidia GPU can keep up with the application just fine, but you will likely see stutter, or at least not reach the same frame counts, because of the copy operation that needs to happen to actually show it on your display.

@gabriele2000

Explicit sync was required to fix nvidia mostly on desktops, not really on hybrid setups.

That explains a lot.
Still, why does this problem occur only with VSync off?
Keep in mind that at first, before explicit sync, nvidia added a hybrid fix for the "jumpback", so that you could actually game on wayland with vsync off with no problems.

One day something changed and this "slow" synchronization problem happened.

@ids1024 (Member Author) commented Sep 24, 2024

Hm, I can reproduce the Firefox crash with Nvidia, but it doesn't seem to be a protocol error.

Edit: seems to be a segfault somewhere in libxul.so...

@ids1024 (Member Author) commented Sep 25, 2024

Okay, looks like this one was my fault, not Nvidia's.

Smithay/smithay#1547 appears to fix the Firefox crash.

@ids1024 changed the title from WIP linux-drm-syncobj-v1 to Add linux-drm-syncobj-v1 protocol on Sep 25, 2024
@ids1024 marked this pull request as ready for review on September 25, 2024 16:53
@ids1024 (Member Author) commented Sep 25, 2024

I think everything should be good now. So this can be merged if no more issues are occurring.

@ptr1337 commented Sep 26, 2024

I think everything should be good now. So this can be merged if no more issues are occurring.

I've tested this MR on my 4070 Super with the 560 drivers. I was not able to open Discord, cosmic-settings, or the like.
I will check tomorrow to gather some logs.

@ids1024 (Member Author) commented Sep 27, 2024

Hm. cosmic-settings should be using Vulkan (via wgpu), and Nvidia's Vulkan doesn't even use explicit sync yet in the 560 driver (while their EGL implementation does). Unless it's a multi-GPU system and is running on the integrated GPU.

Hopefully the logs provide more context.

@skygrango (Contributor) commented Sep 27, 2024

Nvidia's Vulkan doesn't even use explicit sync yet in the 560 driver (while their EGL implementation does).

sorry, my env has WGPU_BACKEND=vulkan, so my test should not be considered a valid reference


I'm trying this.

spec : gtx 1080
driver : 560.35.03
cosmic version: alpha 2

a quick check of the following apps:

work list:
  • chromium
  • firefox
  • evolution
  • cosmic-term
  • cosmic-files
  • cosmic-edit
  • filezilla
  • kate
  • vlc

not work list:
  • cosmic-settings on vulkan
  • discord
  • vs-code
  • steam
  • gimp

can be started, but the menu cannot be displayed:
  • konsole
  • gnome-terminal

cosmic-settings vulkan log : log
cosmic-settings opengl log : log
discord log : log

maybe I also need to test the latest cosmic-comp

Hopefully the logs provide more context.

As for what I can do for you, cosmic-comp keeps printing this in the journal:

cosmic-comp[1718]: [GL] Buffer detailed info: Buffer object 3 (bound to GL_VERTEX_ATTRIB_ARRAY_BUFFER_BINDING_ARB (0), usage hint is GL_STATIC_DRAW) will use VIDEO memory as the source for buffer object operations.

If my gpu is not supported, please let me know

@skygrango (Contributor) commented Sep 27, 2024

I tested this PR on my rx7900xtx

driver : mesa 24.3.0_devel.194431.6f3c003433f-1
monitor: 4k 240hz, rendering x11 applications at native resolution

I tried to play DBD and encountered the following situation; I don't see this regression in cosmic-comp 1.0.0.alpha.2.

And it only happens in full screen mode, windowed mode has no problem

(screenshot: IMG_7135)

@ids1024 (Member Author) commented Sep 27, 2024

If my gpu is not supported, please let me know

We expect COSMIC to work on any GPU supported by the current NVIDIA drivers. (Though if there's a bug in their drivers on certain cards, we may not be able to do anything about it.)

And it only happens in full screen mode, windowed mode has no problem

Hm, probably something involving direct scanout. We should be waiting properly for the explicit sync point before the buffer is used...

@skygrango (Contributor)

We expect COSMIC to work on any GPU supported by the current NVIDIA drivers. (Though if there's a bug in their drivers on certain cards, we may not be able to do anything about it.)

got it :)

Hm, probably something involving direct scanout. We should be waiting properly for the explicit sync point before the buffer is used...

Very much like you surmised, this damage is not visible in captured screenshots.

@Tipcat-98 commented Oct 3, 2024

Since the latest update, cosmic-settings seems to be working on Nvidia.

@ids1024 (Member Author) commented Oct 3, 2024

Yep, Smithay/smithay#1554 should fix that issue on Nvidia, at least.

Not sure about the issue on the rx7900xtx, which does look like a real synchronization issue. I assume that's using vkd3d on wine on xwayland for rendering? The only RDNA hardware I have is a Steam Deck, and I'm not seeing that issue running a couple games on it under cosmic-comp.

@skygrango (Contributor)

Yep, Smithay/smithay#1554 should fix that issue on Nvidia, at least.

Not sure about the issue on the rx7900xtx, which does look like a real synchronization issue. I assume that's using vkd3d on wine on xwayland for rendering? The only RDNA hardware I have is a Steam Deck, and I'm not seeing that issue running a couple games on it under cosmic-comp.

yep, the game I run uses dx12, and it supports dx11 too.

You are welcome to ping me at any time with your list of testing needs; I will be available to assist with testing after I get off work today :)

my work computer is equipped with a gtx 1080, I will test it again when I have time

@skygrango (Contributor) commented Oct 4, 2024

GTX1080 (Proprietary driver)

After updating the branch and turning off WGPU_BACKEND=vulkan, all my commonly used apps can now be used.

but there are still problems with the menus of konsole and gnome terminal

konsole : menu can be opened, but the submenu cannot be displayed
gnome terminal : menu cannot be opened

There is also a problem with cosmic editor; I am not sure if it is related to cosmic-comp:

the submenu opens to the left and cannot be displayed correctly (this PR does not affect this?)
(screenshot)

update:

I went back to KDE Wayland and all the problems disappeared.

(Screenshots: gnome terminal, konsole, cosmic editor.)

@gabriele2000

Apparently my weird issue translates to VR too since I see "backwards" frames or something.
Yes, I can play in VR using cosmic, amazing!

@skygrango (Contributor)

I'm still having the same sync issues playing dx12 and dx11 games on the latest update on my 7900xtx, and some temporary freezes during daily use.

I might have to try to collect more logs, but I'm not sure what information in the journal would be useful.

@StayBlue

I'm using this with the latest Nvidia driver (560.35.03), which has worked great for me. If any other NixOS users would like to try this, you may try applying the overlay I made for it, which is available here.
