
Use legacy texture creation for mac and linux #1009

Merged
7 commits merged on Dec 9, 2022


nithinp7 (Contributor) commented Dec 5, 2022

Hopefully this works around #1007, although it would be better to get the current texture creation path working on Mac and Linux. Neither the Vulkan nor the Metal RHI implementation in Unreal supports async texture creation, so on these platforms the current texture creation path does the next best thing: it avoids a texture memcpy on the game thread by wrapping the source texture memory in an Unreal-specific interface for direct GPU upload. Going back to the legacy texture creation code path on these platforms won't be a huge loss, though.
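To make the trade-off above concrete, here is a minimal, self-contained C++ sketch contrasting a path that copies the pixels on the calling thread with one that merely wraps the existing memory for the uploader to read. All names are hypothetical; this is not Unreal's actual bulk-data interface, only an illustration of the copy-versus-wrap idea.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical uploader-facing interface: the uploader asks for a pointer
// and a size and reads the pixels directly from that memory.
struct TextureBulkDataSource {
  virtual ~TextureBulkDataSource() = default;
  virtual const void* data() const = 0;
  virtual std::size_t size() const = 0;
};

// Copying path: constructing this copies every pixel on the calling thread.
struct CopiedBulkData : TextureBulkDataSource {
  explicit CopiedBulkData(const std::vector<std::uint8_t>& source)
      : pixels(source) {}
  const void* data() const override { return pixels.data(); }
  std::size_t size() const override { return pixels.size(); }
  std::vector<std::uint8_t> pixels;
};

// Wrapping path: no copy, just a view of the existing pixels. The source
// vector must stay alive (and unmodified) until the upload has finished.
struct WrappedBulkData : TextureBulkDataSource {
  explicit WrappedBulkData(const std::vector<std::uint8_t>& source)
      : pixels(source) {}
  const void* data() const override { return pixels.data(); }
  std::size_t size() const override { return pixels.size(); }
  const std::vector<std::uint8_t>& pixels;
};
```

The wrapping version saves a copy only as long as the source pixels outlive the upload, which is exactly the kind of lifetime dependency that matters once work crosses from the game thread to the render thread.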

TODO:

  • Do we need to hardcode any other platforms to use the legacy code path, Android maybe?

cesium-concierge

Thanks for the pull request @nithinp7!

  • ✔️ Signed CLA found.

Reviewers, don't forget to make sure that:

kring changed the base branch from ue4-main to ue5-main December 6, 2022 04:22
kring (Member) commented Dec 6, 2022

The new texture creation code seems to be working fine on my Android phone, even before this PR.
#1011

I foolishly merged ue5-main into this branch, though. So I'll merge it there and then merge the one commit into ue4-main.

kring (Member) commented Dec 6, 2022

This is strange, though, isn't it? Vulkan on Android works fine, but Vulkan on Linux does not. Any idea why?

kring force-pushed the legacy-tex-mac-linux branch from c2d4899 to f539803 December 6, 2022 04:38
kring changed the base branch from ue5-main to ue4-main December 6, 2022 04:38
kring (Member) commented Dec 6, 2022

I force-pushed to switch this back to ue4-main instead.

kring (Member) commented Dec 6, 2022

There's a comment: "Legacy texture creation code path. Only for testing, no safety checks are done."

After this PR, we're not using it only for testing anymore. Do we need to add some safety checks?

nithinp7 (Contributor, Author) commented Dec 6, 2022

@kring I just looked over the code again, and nothing looks particularly unsafe, even though I hadn't originally intended to use it beyond testing. I think we can assume the MipPositions from Cesium Native are valid; beyond that, I can't think of any other issue with it. I can remove that comment.
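If we did want a cheap safety net rather than relying entirely on Cesium Native, a bounds check on the mip byte ranges might look like the sketch below. It assumes each mip position is a (byteOffset, byteSize) pair into the image's pixel buffer; the struct and function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical mirror of a per-mip byte range into the image's pixel buffer.
// Field names are assumptions for this sketch.
struct MipPosition {
  std::size_t byteOffset;
  std::size_t byteSize;
};

// Checks a legacy upload path could run before handing mips to the RHI:
// every mip must be non-empty and lie entirely inside the pixel buffer.
bool mipPositionsAreValid(
    const std::vector<MipPosition>& mips,
    std::size_t pixelBufferSize) {
  for (const MipPosition& mip : mips) {
    if (mip.byteSize == 0) {
      return false;
    }
    // Written as a subtraction so that byteOffset + byteSize cannot overflow.
    if (mip.byteOffset > pixelBufferSize ||
        mip.byteSize > pixelBufferSize - mip.byteOffset) {
      return false;
    }
  }
  return true;
}
```

Unlike a debug-only assertion, a check like this also runs in release builds, so the caller could skip a bad image instead of reading past the end of its buffer.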

kring (Member) commented Dec 6, 2022

I can't seem to get the Vulkan preview rendering to work in UE5.1 to test this theory, but this seems odd... At the end of loadTextureAnyThreadPart, there are these lines:

  // Replace the image pointer with an index, in case the pointer gets
  // invalidated before the main thread loading continues.
  if (result && std::get_if<GltfImagePtr>(&result->textureSource)) {
    result->textureSource = GltfImageIndex{source};
  }

But then, in loadTextureGameThreadPart, an FCesiumTextureResource is constructed with that texture source, and InitRHI accesses it using GetImageFromSource. GetImageFromSource only supports GltfImagePtr and EmbeddedImageSource; everything else returns nullptr. So that seems sure to lead to a crash when we try to construct FCesiumTextureData with a nullptr. There's a check, but that won't do anything in release builds. I'm probably missing some details, but it seems suspicious.
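A simplified, self-contained model of the concern may help. The variant alternatives reuse the names from the comment, but their fields are assumptions, getImageFromSource is a stand-in for GetImageFromSource, and canInitTexture is a hypothetical helper. The point is that an unresolved index silently maps to nullptr, so only a check that survives release builds protects the caller.

```cpp
#include <cassert>
#include <cstdint>
#include <variant>

// Hypothetical, simplified stand-ins for the texture-source alternatives;
// the real types carry more data.
struct Image { int width = 0; int height = 0; };
struct GltfImagePtr { const Image* pImage; };
struct GltfImageIndex { std::int32_t index; };
struct EmbeddedImageSource { Image image; };

using TextureSource =
    std::variant<GltfImagePtr, GltfImageIndex, EmbeddedImageSource>;

// Mirrors the behavior described above: only pointer and embedded sources
// yield an image; an unresolved index yields nullptr.
const Image* getImageFromSource(const TextureSource& source) {
  if (const auto* pPtr = std::get_if<GltfImagePtr>(&source)) {
    return pPtr->pImage;
  }
  if (const auto* pEmbedded = std::get_if<EmbeddedImageSource>(&source)) {
    return &pEmbedded->image;
  }
  return nullptr;
}

// A guard that, unlike a debug-only assertion, still runs in release builds.
bool canInitTexture(const TextureSource& source) {
  return getImageFromSource(source) != nullptr;
}
```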

nithinp7 (Contributor, Author) commented Dec 6, 2022

@kring I probably didn't arrive at the most sensible structure for this. I tried to abstract the image source concept, but that probably made things more confusing.

I think the answer to your question is in this overload of loadTextureGameThreadPart:

UTexture2D* loadTextureGameThreadPart(
    const CesiumGltf::Model& model,
    LoadedTextureResult* pHalfLoadedTexture) {
  if (!pHalfLoadedTexture) {
    return nullptr;
  }

  GltfImageIndex* pImageIndex =
      std::get_if<GltfImageIndex>(&pHalfLoadedTexture->textureSource);
  if (pImageIndex) {
    pHalfLoadedTexture->textureSource = pImageIndex->resolveImage(model);
  }

  return loadTextureGameThreadPart(pHalfLoadedTexture);
}

The image index is a temporary replacement for an image pointer, so that the actual pointer can be re-resolved as needed during the game-thread part of the loading. Before the actual game-thread loading begins, this function is called; it uses the model to substitute an image pointer for the image index again. I did it this way so that the other overload of loadTextureGameThreadPart doesn't require a "model", since the texture could come from anywhere (e.g., a raster overlay).
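The round trip described above (pointer to index at the end of the worker-thread part, index back to pointer at the start of the game-thread part) can be sketched with simplified stand-in types. Model, Image, finishAnyThreadPart, and beginGameThreadPart are hypothetical; only the GltfImagePtr / GltfImageIndex / resolveImage naming follows the code quoted above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <variant>
#include <vector>

// Hypothetical, simplified stand-ins; not the actual Cesium types.
struct Image { int width = 0; int height = 0; };
struct Model { std::vector<Image> images; };

struct GltfImagePtr { const Image* pImage; };
struct GltfImageIndex {
  std::int32_t index;
  // Re-derive a pointer from a model that is known to be alive right now.
  GltfImagePtr resolveImage(const Model& model) const {
    if (index < 0 || static_cast<std::size_t>(index) >= model.images.size()) {
      return GltfImagePtr{nullptr};
    }
    return GltfImagePtr{&model.images[static_cast<std::size_t>(index)]};
  }
};

using TextureSource = std::variant<GltfImagePtr, GltfImageIndex>;

// Worker thread: swap the pointer for an index, since the model's image
// storage may move (invalidating the pointer) before the game thread runs.
void finishAnyThreadPart(TextureSource& source, std::int32_t imageIndex) {
  if (std::get_if<GltfImagePtr>(&source)) {
    source = GltfImageIndex{imageIndex};
  }
}

// Game thread: turn the index back into a pointer before it is used.
void beginGameThreadPart(TextureSource& source, const Model& model) {
  if (const auto* pIndex = std::get_if<GltfImageIndex>(&source)) {
    source = pIndex->resolveImage(model);
  }
}
```

An index stays meaningful even if the model's image vector is reallocated between the two parts, whereas a pointer captured before the reallocation would dangle.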

nithinp7 (Contributor, Author) commented Dec 6, 2022

Also, to repeat what I mentioned offline: I tried Vulkan on Windows (with main), this time in UE5.1, and did not have any issues. So I wonder if there is something platform-specific in the Linux case. On Mac, there may be a more obvious RHI-level issue that we can solve, since I never really tested the new-ish texture code with Metal.

kring (Member) commented Dec 7, 2022

@nithinp7 I finally was able to package the plugin for UE 5.1 and run the samples on my Linux system, and can confirm it worked fine even before this PR. With Vulkan:

Cmd: r.RHI.Name
Running on the Vulkan RHI

kring (Member) commented Dec 7, 2022

I spoke too soon. A release build installed as an engine plugin works fine. A debug build embedded in the project crashes with an error like this:

LoginId:75d8c4a28c1c40579a2f2a569a87447d-000003e8
EpicAccountId:

Caught signal 11 Segmentation fault

libUnrealEditor-CesiumRuntime-Linux-DebugGame.so!(anonymous namespace)::FCesiumTextureData::FCesiumTextureData(CesiumGltf::ImageCesium const&) [/home/kring/cesium/cesium-unreal-samples/Plugins/cesium-unreal/Source/CesiumRuntime/Private/CesiumTextureUtility.cpp:160]
libUnrealEditor-CesiumRuntime-Linux-DebugGame.so!(anonymous namespace)::FCesiumTextureResource::InitRHI() [/home/kring/cesium/cesium-unreal-samples/Plugins/cesium-unreal/Source/CesiumRuntime/Private/CesiumTextureUtility.cpp:299]
libUnrealEditor-RenderCore.so!FRenderResource::InitResource() [/mnt/horde/++UE5/Sync/Engine/Source/./Runtime/RenderCore/Private/RenderResource.cpp:277]
libUnrealEditor-CesiumRuntime-Linux-DebugGame.so!CesiumTextureUtility::loadTextureGameThreadPart(CesiumTextureUtility::LoadedTextureResult*)::$_108::operator()(FRHICommandListImmediate&) const [/home/kring/cesium/cesium-unreal-samples/Plugins/cesium-unreal/Source/CesiumRuntime/Private/CesiumTextureUtility.cpp:814]
libUnrealEditor-CesiumRuntime-Linux-DebugGame.so!TEnqueueUniqueRenderCommandType<CesiumTextureUtility::loadTextureGameThreadPart(CesiumTextureUtility::LoadedTextureResult*)::Cesium_InitResourceName, CesiumTextureUtility::loadTextureGameThreadPart(CesiumTextureUtility::LoadedTextureResult*)::$_108>::DoTask(ENamedThreads::Type, TRefCountPtr<FGraphEvent> const&) [/home/kring/UE_5.1.0/Engine/Source/Runtime/RenderCore/Public/RenderingThread.h:206]
libUnrealEditor-CesiumRuntime-Linux-DebugGame.so!TGraphTask<TEnqueueUniqueRenderCommandType<CesiumTextureUtility::loadTextureGameThreadPart(CesiumTextureUtility::LoadedTextureResult*)::Cesium_InitResourceName, CesiumTextureUtility::loadTextureGameThreadPart(CesiumTextureUtility::LoadedTextureResult*)::$_108> >::ExecuteTask(TArray<FBaseGraphTask*, TSizedDefaultAllocator<32> >&, ENamedThreads::Type, bool) [/home/kring/UE_5.1.0/Engine/Source/Runtime/Core/Public/Async/TaskGraphInterfaces.h:1348]
libUnrealEditor-Core.so!FNamedTaskThread::ProcessTasksNamedThread(int, bool) [/mnt/horde/++UE5/Sync/Engine/Source/./Runtime/Core/Private/Async/TaskGraph.cpp:760]
libUnrealEditor-Core.so!FNamedTaskThread::ProcessTasksUntilQuit(int) [/mnt/horde/++UE5/Sync/Engine/Source/./Runtime/Core/Private/Async/TaskGraph.cpp:648]
libUnrealEditor-Core.so!FTaskGraphCompatibilityImplementation::ProcessThreadUntilRequestReturn(ENamedThreads::Type) [/mnt/horde/++UE5/Sync/Engine/Source/./Runtime/Core/Private/Async/TaskGraph.cpp:2149]
libUnrealEditor-RenderCore.so!RenderingThreadMain(FEvent*) [/mnt/horde/++UE5/Sync/Engine/Source/./Runtime/RenderCore/Private/RenderingThread.cpp:415]
libUnrealEditor-RenderCore.so!FRenderingThread::Run() [/mnt/horde/++UE5/Sync/Engine/Source/./Runtime/RenderCore/Private/RenderingThread.cpp:566]
libUnrealEditor-Core.so!FRunnableThreadPThread::Run() [/mnt/horde/++UE5/Sync/Engine/Source/./Runtime/Core/Private/HAL/PThreadRunnableThread.cpp:25]
libUnrealEditor-Core.so!FRunnableThreadPThread::_ThreadProc(void*) [/mnt/horde/++UE5/Sync/Engine/Source/Runtime/Core/Private/HAL/PThreadRunnableThread.h:185]
libc.so.6!UnknownFunction(0x94b42)
libc.so.6!UnknownFunction(0x1269ff)

I also saw this once; it might be a red herring, or it might be a clue:

LoginId:75d8c4a28c1c40579a2f2a569a87447d-000003e8
EpicAccountId:

Assertion failed: false [File:Runtime/VulkanRHI/Private/Linux/../VulkanRHIPrivate.h] [Line: 903] A texture was marked as a shading rate source but attachment VRS is not supported on this device. Ensure GRHISupportsAttachmentVariableRateShading and GRHIAttachmentVariableRateShadingEnabled are true before specifying a shading rate attachment.

libUnrealEditor-VulkanRHI.so!FVulkanTexture::GenerateImageCreateInfo(FVulkanTexture::FImageCreateInfo&, FVulkanDevice&, FRHITextureDesc const&, VkFormat*, VkFormat*, bool) [/mnt/horde/++UE5/Sync/Engine/Source/./Runtime/VulkanRHI/Private/VulkanTexture.cpp:363]
libUnrealEditor-VulkanRHI.so!FVulkanTexture::FVulkanTexture(FVulkanDevice&, FRHITextureCreateDesc const&, FRHITransientHeapAllocation const*) [/mnt/horde/++UE5/Sync/Engine/Source/./Runtime/VulkanRHI/Private/VulkanTexture.cpp:1619]
libUnrealEditor-VulkanRHI.so!FVulkanDynamicRHI::RHICreateTexture(FRHITextureCreateDesc const&) [/mnt/horde/++UE5/Sync/Engine/Source/./Runtime/VulkanRHI/Private/VulkanTexture.cpp:938]
libUnrealEditor-VulkanRHI.so!FVulkanDynamicRHI::RHICreateTexture_RenderThread(FRHICommandListImmediate&, FRHITextureCreateDesc const&) [/mnt/horde/++UE5/Sync/Engine/Source/Runtime/VulkanRHI/Public/VulkanDynamicRHI.h:246]
libUnrealEditor-Landscape.so!FLandscapeTexture2DArrayResource::InitRHI() [/mnt/horde/++UE5/Sync/Engine/Source/./Runtime/Landscape/Private/LandscapeEditResources.cpp:122]
libUnrealEditor-RenderCore.so!FRenderResource::InitResource() [/mnt/horde/++UE5/Sync/Engine/Source/./Runtime/RenderCore/Private/RenderResource.cpp:277]
libUnrealEditor-RenderCore.so!TEnqueueUniqueRenderCommandType<BeginInitResource(FRenderResource*)::InitCommandName, BeginInitResource(FRenderResource*)::$_153>::DoTask(ENamedThreads::Type, TRefCountPtr<FGraphEvent> const&) [/mnt/horde/++UE5/Sync/Engine/Source/Runtime/RenderCore/Public/RenderingThread.h:206]
libUnrealEditor-RenderCore.so!TGraphTask<TEnqueueUniqueRenderCommandType<BeginInitResource(FRenderResource*)::InitCommandName, BeginInitResource(FRenderResource*)::$_153> >::ExecuteTask(TArray<FBaseGraphTask*, TSizedDefaultAllocator<32> >&, ENamedThreads::Type, bool) [/mnt/horde/++UE5/Sync/Engine/Source/Runtime/Core/Public/Async/TaskGraphInterfaces.h:1348]
libUnrealEditor-Core.so!FNamedTaskThread::ProcessTasksNamedThread(int, bool) [/mnt/horde/++UE5/Sync/Engine/Source/./Runtime/Core/Private/Async/TaskGraph.cpp:760]
libUnrealEditor-Core.so!FNamedTaskThread::ProcessTasksUntilQuit(int) [/mnt/horde/++UE5/Sync/Engine/Source/./Runtime/Core/Private/Async/TaskGraph.cpp:648]
libUnrealEditor-Core.so!FTaskGraphCompatibilityImplementation::ProcessThreadUntilRequestReturn(ENamedThreads::Type) [/mnt/horde/++UE5/Sync/Engine/Source/./Runtime/Core/Private/Async/TaskGraph.cpp:2149]
libUnrealEditor-RenderCore.so!RenderingThreadMain(FEvent*) [/mnt/horde/++UE5/Sync/Engine/Source/./Runtime/RenderCore/Private/RenderingThread.cpp:415]
libUnrealEditor-RenderCore.so!FRenderingThread::Run() [/mnt/horde/++UE5/Sync/Engine/Source/./Runtime/RenderCore/Private/RenderingThread.cpp:566]
libUnrealEditor-Core.so!FRunnableThreadPThread::Run() [/mnt/horde/++UE5/Sync/Engine/Source/./Runtime/Core/Private/HAL/PThreadRunnableThread.cpp:25]
libUnrealEditor-Core.so!FRunnableThreadPThread::_ThreadProc(void*) [/mnt/horde/++UE5/Sync/Engine/Source/Runtime/Core/Private/HAL/PThreadRunnableThread.h:185]
libc.so.6!UnknownFunction(0x94b42)
libc.so.6!UnknownFunction(0x1269ff)

kring (Member) commented Dec 7, 2022

The crash happens while trying to create a raster overlay texture. The image isn't null, but it points to memory that is corrupt (most likely a use-after-free).

I think there's a race condition here that just happens to show up consistently only with certain configurations on certain platforms. Here's my best understanding of the sequence of events:

  1. prepareRasterInMainThread is called. At this point the CesiumTextureSource is a valid GltfImagePtr, at least after the pImageSource->pImage = &rasterTile.getImage(); line is executed.
  2. We create a FCesiumTextureResource holding that (still valid) CesiumTextureSource and hand it off to the render thread.
  3. The last reference to the RasterOverlayTile for which we're creating the texture is released, so freeRaster is called.
  4. freeRaster destroys the UTexture2D created by prepareRasterInMainThread. Note that this does not block on the texture's ReleaseFence; that is only checked in IsReadyForFinishDestroy. So the texture won't be fully destroyed until after the renderer thread does its thing, but freeRaster can return before that happens.
  5. Once freeRaster returns, the RasterOverlayTile is destroyed, invalidating the CesiumTextureSource created earlier.
  6. The renderer thread attempts to access the already-freed CesiumTextureSource and gets undefined behavior.
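The sequence above can be reproduced deterministically with a single-threaded queue standing in for the render thread. In this hypothetical sketch (RasterTile, RenderQueue, and runScenario are illustrative names), the render command holds a std::weak_ptr instead of a raw pointer, so in step 6 it detects that the tile is gone rather than reading freed memory. It only illustrates the ordering; it is not a proposed fix.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <memory>
#include <utility>

// Hypothetical stand-ins for the tile and its image.
struct Image { int width = 0; int height = 0; };
struct RasterTile { Image image; };

// A single-threaded queue plays the render thread so the ordering is fixed.
struct RenderQueue {
  std::deque<std::function<void()>> commands;
  void enqueue(std::function<void()> command) {
    commands.push_back(std::move(command));
  }
  void flush() {
    while (!commands.empty()) {
      std::function<void()> command = std::move(commands.front());
      commands.pop_front();
      command();
    }
  }
};

// Returns true if the render command found the tile still alive.
bool runScenario(bool tileReleasedBeforeRenderThreadRuns) {
  RenderQueue renderThread;
  auto tile = std::make_shared<RasterTile>();
  std::weak_ptr<RasterTile> weakTile = tile;
  bool uploaded = false;

  // Step 2: hand work that references the tile to the "render thread".
  renderThread.enqueue([weakTile, &uploaded]() {
    // lock() yields either a live tile (kept alive in this scope) or null.
    std::shared_ptr<RasterTile> alive = weakTile.lock();
    uploaded = (alive != nullptr);
  });

  if (tileReleasedBeforeRenderThreadRuns) {
    tile.reset();  // Steps 3-5: last reference released, tile destroyed.
  }
  renderThread.flush();  // Step 6: the render thread runs the command.
  return uploaded;
}
```

With a raw pointer captured in the lambda instead, the `true` case would be undefined behavior, which matches the corrupt-but-non-null image seen in the crash.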

I'm not certain that step (4) is happening, though, or why. It seems a little surprising to be freeing a raster so early. This should be easy to verify, but debugging UE on Linux is extraordinarily slow for some reason (it takes 10 minutes to start up). So I'm just reporting what I know for now.

But assuming the above is close to right, I'm not completely sure what to do about it. It will be tricky to completely avoid this sort of race condition.

nithinp7 (Contributor, Author) commented Dec 7, 2022

Thanks for looking into this, @kring! I think I was finally able to reproduce this on Windows with Vulkan; it's probably the same underlying issue. Setting the raster tile cache size to 0 and repeatedly clicking the "Refresh Tileset" button seems to be a good way to trigger the race condition, where the raster tile is freed before the render-thread creation task completes.

One option is to create a new fence around the render-thread texture creation and check it in destroyTexture; that way, textures won't be destroyed before FCesiumTextureResource::InitRHI() finishes. In the ridiculous zero-cache-size case, though, this would cause stutters: freshly loaded textures are destroyed as soon as the camera moves, so destruction has to wait for the render thread (which might be 1-3 frames behind). The same might happen if far too many tiles are loaded for a given view (e.g., with a maximum screen-space error of 1): the cache fills with "required" tiles and has constant turnover, similar to the zero-cache case. In typical cases, I imagine a newly loaded texture is still unlikely to be deleted immediately, but I'm still tempted to find a non-blocking solution.
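A non-blocking variant of the fence idea could defer destruction instead of waiting: destroyTexture parks any texture whose creation fence hasn't been signaled yet, and a per-frame tick frees it once the render thread is done. This is a hypothetical sketch; the fence here is a plain std::atomic<bool>, whereas in Unreal something like FRenderCommandFence would presumably play that role.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical texture whose creation fence is signaled by the render thread
// once its InitRHI-style work has finished.
struct Texture {
  std::atomic<bool> creationFenceSignaled{false};
};

// Deferred destruction that never waits on the render thread.
class DeferredTextureDeleter {
 public:
  void destroyTexture(std::unique_ptr<Texture> texture) {
    if (texture->creationFenceSignaled) {
      ++destroyedCount_;  // safe now; the unique_ptr frees it on return
      return;
    }
    pending_.push_back(std::move(texture));  // render thread may still use it
  }

  // Call once per frame: frees every parked texture whose fence is signaled.
  void tick() {
    std::vector<std::unique_ptr<Texture>> stillPending;
    for (std::unique_ptr<Texture>& texture : pending_) {
      if (texture->creationFenceSignaled) {
        ++destroyedCount_;
        texture.reset();
      } else {
        stillPending.push_back(std::move(texture));
      }
    }
    pending_ = std::move(stillPending);
  }

  std::size_t destroyedCount() const { return destroyedCount_; }
  std::size_t pendingCount() const { return pending_.size(); }

 private:
  std::vector<std::unique_ptr<Texture>> pending_;
  std::size_t destroyedCount_ = 0;
};
```

The trade-off is that parked textures hold their memory a few frames longer, instead of the game thread stalling on the render thread in the zero-cache or high-turnover cases.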

nithinp7 (Contributor, Author) commented Dec 7, 2022

@kring I can move this idea to a separate issue once I've worked it out, but:

The main motivation for the rambling below is that it would be nice to be able to, e.g., pass around a shared_ptr<Image> that the renderer can use to keep an image alive during texture creation. I don't think fixing the current issue alone warrants the refactor below, but the idea might have several other long-term advantages that make it worthwhile. Let me know what you think!

I was thinking recently about how ImageCesium and BufferCesium are both somewhat odd implementation details compared to the rest of CesiumGltf, which can be serialized and parsed almost entirely with code auto-generated from the schema. For instance, the CesiumGltf::Model can own the image descriptions (CesiumGltf::Image), but it does not need to own the image content itself (e.g., the parsed image representation currently in ImageCesium). So maybe we could fully decouple those and instead provide utility functions like readImage that read into an externally defined image representation (maybe moving the ImageCesium functionality to something like Cesium3DTilesSelection::Image).

The advantage of decoupling those is that we can provide a lot more implementation-specific features in the new image class. Some of these features would feel very awkward in CesiumGltf. These might include:

  • Representing the renderer content corresponding to the image in the same image class. This partially decouples resource management concerns from the tile and rasterTile.
  • Allows the images to be separately maintained in some sort of global resource table.
  • Changing the ownership semantics of the image itself (e.g., enables sharing of external image render content). We might have a global resource table, individual tiles / raster tiles, and texture creation tasks that all have a shared_ptr<Image> to the same image resource.
  • Decouple the lifecycle of individual images from the tiles or raster tiles they correspond to (this is what might be relevant here). It would be awkward for tile loading to explicitly express the main-thread follow-up and render-thread work in its state space. It would be better to decouple images so that tiles and raster tiles can be released at any time, while the underlying images simply aren't released until nothing references them.
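The last bullet could look roughly like this: tiles and in-flight texture-creation tasks each hold a shared_ptr to the image, while a global table only observes images through weak_ptr, so it never extends their lifetime on its own. All names here (ImageTable, RasterTile, TextureCreationTask, the string key) are hypothetical; this sketches the ownership model only, not an API for Cesium Native.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <unordered_map>

// Hypothetical decoupled image type.
struct Image { int width = 0; int height = 0; };

// Global resource table that shares already-loaded images but does not own
// them: entries expire once the last shared_ptr holder lets go.
class ImageTable {
 public:
  std::shared_ptr<Image> getOrCreate(const std::string& key, int w, int h) {
    if (std::shared_ptr<Image> existing = images_[key].lock()) {
      return existing;  // share the image that is already loaded
    }
    auto created = std::make_shared<Image>(Image{w, h});
    images_[key] = created;
    return created;
  }

  bool isAlive(const std::string& key) const {
    auto it = images_.find(key);
    return it != images_.end() && !it->second.expired();
  }

 private:
  std::unordered_map<std::string, std::weak_ptr<Image>> images_;
};

// Both a raster tile and a pending texture-creation task can keep the
// image alive independently of each other.
struct RasterTile { std::shared_ptr<Image> image; };
struct TextureCreationTask { std::shared_ptr<Image> image; };
```

With this ownership model, releasing the raster tile early (as in the race described above) would leave the image alive until the texture-creation task drops its reference.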

nithinp7 (Contributor, Author) commented Dec 8, 2022

@kring I think Vulkan (and almost certainly Metal as well) should be working fine now. The old texture creation path is used for any device where GRHISupportsAsyncTextureCreation is false. The issue with the white tilesets that I mentioned offline has also been resolved.
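The final rule is a capability check rather than a per-platform list. A minimal sketch, with the GRHISupportsAsyncTextureCreation global replaced by a plain parameter so it compiles outside Unreal (the enum and function names are hypothetical):

```cpp
#include <cassert>

// Hypothetical labels for the two code paths discussed in this PR.
enum class TextureCreationPath {
  Current,  // the newer path, which relies on async texture creation
  Legacy    // the older path, used when async creation is unavailable
};

// In Unreal the argument would come from GRHISupportsAsyncTextureCreation.
TextureCreationPath chooseTextureCreationPath(
    bool supportsAsyncTextureCreation) {
  return supportsAsyncTextureCreation ? TextureCreationPath::Current
                                      : TextureCreationPath::Legacy;
}
```

Keying off the capability means any RHI without async texture creation, not only the Vulkan and Metal ones on Linux and Mac, falls back to the legacy path, which also answers the original TODO about hardcoding other platforms such as Android.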

While all the shaders were compiling for Vulkan, I ran into an error message and the debug assertion "Array resized during ranged-for iteration". I couldn't see how that was related to this last change (or to anything in this PR, for that matter). Now that the shaders have finished compiling, I can't reproduce that assertion failure at all. I tried making dummy changes to the material to retrigger shader compilation, but that no longer triggers the assertion failure. Please keep an eye out for that one; I'll keep trying to reproduce it on my end.

nithinp7 requested a review from kring December 8, 2022 01:25
kring merged commit 3635569 into ue4-main Dec 9, 2022
kring deleted the legacy-tex-mac-linux branch December 9, 2022 01:31
kring (Member) commented Dec 9, 2022

Thanks @nithinp7!
