Mapped render target memory can crash replaying detected GPU writes #1863

Closed
kvark opened this issue May 2, 2020 · 8 comments
Labels
Bug (A crash, misbehaviour, or other problem), Unresolved (Waiting for a fix or implementation)

Comments

@kvark
Contributor

kvark commented May 2, 2020

Description

When I load a specific capture file, RenderDoc seems normal at first. I can switch between tabs, and the progress bar at the bottom constantly shows movement. But when I click on one of the events on the left, everything hangs.

Steps to reproduce

The diagnostic log complains about nonCoherentAtomSize not being respected, but I'm not seeing in the app code where it would be the case. Not sure if it's related.
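
For readers unfamiliar with the rule behind that log message: Vulkan requires the offset of a flushed range on non-coherent memory to be a multiple of `nonCoherentAtomSize`, and the size to be either a multiple of it or to reach the end of the allocation. A minimal sketch of that rounding follows. The function name and parameters are hypothetical and do not come from the application or from RenderDoc, and it assumes the whole allocation is mapped.

```rust
/// Expand a dirty byte range so it meets the nonCoherentAtomSize rules for
/// vkFlushMappedMemoryRanges. Returns (offset, size).
///
/// Hypothetical helper: `atom` stands for
/// VkPhysicalDeviceLimits::nonCoherentAtomSize and `allocation_size` for the
/// size of the VkDeviceMemory object, assumed to be mapped in full.
pub fn aligned_flush_range(offset: u64, size: u64, atom: u64, allocation_size: u64) -> (u64, u64) {
    assert!(atom > 0, "atom size must be non-zero");
    assert!(offset + size <= allocation_size, "range exceeds allocation");

    // Round the start down to the previous atom boundary.
    let start = offset / atom * atom;
    // Round the end up to the next atom boundary, but never past the end of
    // the allocation (a range ending exactly there is also valid).
    let end = ((offset + size + atom - 1) / atom * atom).min(allocation_size);

    (start, end - start)
}
```

With an atom size of 64, a 10-byte write at offset 100 becomes a flush of bytes 64 to 128.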

Environment

  • RenderDoc version: 1.7 rev a56af58
  • Operating System: Linux
  • Graphics API: Vulkan

The log lists 2 GPUs, not sure which one RenderDoc is using:

Core PID 188431: [23:07:24] vk_device_funcs.cpp(1356) - Log - physical device 0: Intel(R) UHD Graphics 630 (Coffeelake 3x8 GT2) (ver 19.3 patch 0x4) - 8086:3e9b
Core PID 188431: [23:07:24] vk_device_funcs.cpp(1356) - Log - physical device 1: GeForce GTX 1050 Ti with Max-Q Design (ver 440.64 patch 0x0) - 10de:1c8c

@kvark
Contributor Author

kvark commented May 2, 2020

Just double-checked that running the same app on either of the GPUs I have is totally validation-free. So I wonder if these flush errors come from RenderDoc itself?

@baldurk baldurk added the Bug and Unresolved labels May 2, 2020
@baldurk
Owner

baldurk commented May 4, 2020

I'm not able to reproduce this unfortunately. On the closest hardware I could get running the same mesa version I'm able to open the capture and see things working as I'd expect.

One thing I noticed which seems quite strange and might have something to do with the problem is that memory which is allocated and only bound to optimal tiled images (the color/depth attachments) is mapped and has writes to it in the frame. Since these images aren't in a host-accessible format those writes seem like they're invalid and might cause problems. FWIW it looks like the errors you see are due to RenderDoc mapping coherent map writes into flushes for the sake of serialisation.

Are you able to reproduce the problem on a capture without those optimal tiled images having their memory mapped?

@baldurk baldurk added the Need More Info label and removed the Unresolved label May 4, 2020
@kvark
Contributor Author

kvark commented May 4, 2020

Huh, we by no means are trying to map that memory (used for render attachments). It would make little sense:) We only map (some of the) buffer memory.
Are you sure that is something our application does, and not RenderDoc itself? I'll definitely double check on our end.

@baldurk
Owner

baldurk commented May 4, 2020

Yes indeed it stood out to me for that reason :). I can't check without being able to reproduce the capture side, but this is recorded as an application map write and I don't see any way it could be caused by RenderDoc.

@kvark
Contributor Author

kvark commented May 4, 2020

I double checked and didn't find us mapping textures. What we do, however, is map any memory that is CPU-visible on allocation. This never applies to textures in practice, because we prefer non-CPU-visible memory types for them.

It must have something to do with the way RenderDoc exposes different memory types to us. Could it be the case that all the memory types you are exposing are CPU-visible? We'd then be mapping it, but not doing anything with that memory on CPU side.

If you are seeing any writes to that memory, that's especially strange. Could it be that the memory types you are exposing just happen to support both buffers and textures? We'd then end up sub-allocating all of them from the same memory type, and you'd see writes to this memory that are caused by our buffer writes, unrelated to the textures.

> FWIW it looks like the errors you see are due to RenderDoc mapping coherent map writes into flushes for the sake of serialisation.

Is that a known issue on your side? Would be good to get fixed in order to avoid confusing the users about these validation errors.

@baldurk
Owner

baldurk commented May 4, 2020

RenderDoc doesn't have anything to do with the memory types that are exposed; those come from the driver itself. This capture is using the Intel GPU, which only exposes two memory types, and both are CPU visible.
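
To illustrate why a "prefer non-CPU-visible memory for textures" policy can still land on mappable memory, here is a minimal sketch of a memory type picker with a fallback. The function and its parameters are hypothetical and are not the application's allocator; the constants are the standard `VkMemoryPropertyFlagBits` values, and the flag combinations used in the usage note are illustrative rather than taken from the capture.

```rust
pub const DEVICE_LOCAL: u32 = 0x1; // VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT
pub const HOST_VISIBLE: u32 = 0x2; // VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT
pub const HOST_COHERENT: u32 = 0x4; // VK_MEMORY_PROPERTY_HOST_COHERENT_BIT

/// Hypothetical picker: returns the index of a memory type that is allowed by
/// `type_bits` (VkMemoryRequirements::memoryTypeBits) and has all `required`
/// flags, preferring one with none of the `avoid` flags. If every candidate
/// carries an avoided flag, it falls back to the first candidate.
pub fn pick_memory_type(type_flags: &[u32], type_bits: u32, required: u32, avoid: u32) -> Option<usize> {
    // Vulkan exposes at most 32 memory types.
    let count = type_flags.len().min(32);
    let allowed = |i: usize| ((type_bits >> i) & 1) == 1 && (type_flags[i] & required) == required;

    (0..count)
        .find(|&i| allowed(i) && (type_flags[i] & avoid) == 0)
        .or_else(|| (0..count).find(|&i| allowed(i)))
}
```

On a device whose types are `[DEVICE_LOCAL, HOST_VISIBLE | HOST_COHERENT]`, asking for `DEVICE_LOCAL` while avoiding `HOST_VISIBLE` yields index 0. If every type includes `HOST_VISIBLE`, the fallback still returns a host-visible type, so an allocator that maps everything mappable will also map render target memory.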

It's possible RenderDoc is detecting writes to the memory because the GPU is modifying that memory. It's not possible for me to know where such a write came from, so it will be recorded either way and then replayed as a CPU write.

I'll need to think about how/if this can be solved, it likely won't be easy to fix. I'd recommend in the meantime only mapping memory you intend to modify from the CPU so that memory being modified by the GPU isn't visible to the CPU while RenderDoc is capturing it.
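
One way to read that recommendation is to decide on mapping from the intended use of an allocation rather than from its memory properties alone. The sketch below contrasts the two policies. The `Usage` enum and both functions are hypothetical names for illustration, not part of gfx-rs or RenderDoc.

```rust
pub const HOST_VISIBLE: u32 = 0x2; // VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT

/// Hypothetical description of how the application intends to use a block.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub enum Usage {
    /// Only the GPU touches this memory (e.g. render attachments).
    GpuOnly,
    /// The CPU writes into this memory (e.g. upload or uniform buffers).
    CpuWritten,
}

/// Policy described earlier in the thread: map anything that can be mapped.
pub fn map_if_visible(memory_flags: u32) -> bool {
    (memory_flags & HOST_VISIBLE) != 0
}

/// Suggested policy: map only memory that is mappable *and* that the CPU is
/// actually going to modify.
pub fn should_map(memory_flags: u32, usage: Usage) -> bool {
    map_if_visible(memory_flags) && usage != Usage::GpuOnly
}
```

Under the second policy, a render attachment placed in host-visible memory stays unmapped, so GPU writes to it are not visible through a CPU mapping during capture.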

@baldurk baldurk changed the title from "Loading a capture hangs the app" to "Mapped render target memory can crash replaying detected GPU writes" May 4, 2020
@baldurk baldurk added the Unresolved label and removed the Need More Info label May 4, 2020
@kvark
Contributor Author

kvark commented May 4, 2020

> the GPU is modifying that memory

Yes, the GPU is modifying the render attachment memory.

> I'd recommend in the meantime only mapping memory you intend to modify from the CPU so that memory being modified by the GPU isn't visible to the CPU while RenderDoc is capturing it.

Thank you for the suggestion, filed gfx-rs/gfx-extras#12
Looking forward to a solution you can find, if any, on your side.

@baldurk
Owner

baldurk commented Aug 12, 2020

That commit will skip flushed memory writes to memory that only has tiled images bound to it. It will not work if there's aliasing with linear images or buffers, and I haven't been able to test if this really fixes the problem since I wasn't able to reproduce it.
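
The condition described above can be pictured with a small sketch. This is not RenderDoc's code. The `Bound` enum and the function are hypothetical, and it assumes the tool keeps a list of what has been bound to each memory allocation.

```rust
/// Hypothetical record of the kind of resource bound to a memory allocation.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub enum Bound {
    Buffer,
    LinearImage,
    OptimalImage,
}

/// Recorded writes can be skipped only when the memory has something bound to
/// it and every binding is an optimal tiled image. Any buffer or linear image
/// (including one aliasing the same memory) means the writes must be kept.
pub fn can_skip_recorded_writes(bound: &[Bound]) -> bool {
    !bound.is_empty() && bound.iter().all(|b| *b == Bound::OptimalImage)
}
```

The all-or-nothing check mirrors the stated limitation: a single buffer or linear image sharing the allocation is enough to disable the skip.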

It also comes with a performance penalty if it has to process in this way rather than memcpy'ing directly so I'd still strongly recommend not mapping memory regions you're using for GPU-only images.
