Silent crash on certain Windows machines after restart #2122

Open
Endaris opened this issue Nov 29, 2024 · 3 comments

Endaris commented Nov 29, 2024

I'm relaying an issue that was reported to me by 6 individual users, which is substantial given the small user base of the game in question.
I believe this crash is caused by either love or AMD graphics drivers, but I lack both the expertise and the tools to troubleshoot it properly.
As I cannot replicate the issue myself, this report contains a good deal of guesswork and anecdotal evidence that might be useful but could also be misleading in some cases.

Outline:
The game is deployed via a fused updater that downloads a .love file into the save directory, mounts it at "" and then runs it.
Previous versions of the updater achieved the "run" part by clearing out the cache for main and conf and rerunning love.init like this:

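-- Mount the downloaded .love file at the root of the virtual filesystem
if not love.filesystem.mount(gamefilePath, '') then
  error("Could not mount file " .. gamefilePath)
else
  -- Write out the updater's log; pcall so a logging error cannot abort startup
  pcall(logger.write, logger)
  -- Clear the cached main and conf so the next require loads them again
  package.loaded.main = nil
  package.loaded.conf = nil
  love.conf = nil
  -- Rerun love's init, then call love.load with the original arguments
  love.init()
  love.load(arg)
end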

As the new updater was made to be somewhat generic, the startup was changed to use the new restart event, which passes the file to be mounted as an argument so that it can be mounted immediately in conf.lua. This way the window is cleanly initialized with the game's conf, rather than relying on the updater's conf to already contain settings that can only be applied on window creation:

  love.event.restart({ restartSource = "updater", startUpFile = version.path })

When the updater's conf.lua is required by the boot process after the restart, it performs similar steps as before to mount the game file and use the game file's conf.
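
The conf.lua side of this could look roughly like the following. This is a minimal sketch, not the updater's actual code: it assumes that love 12 exposes the value passed to love.event.restart as love.restart after the restart, and that an archive mounted at "" is searched before the updater's own files. startUpFileFrom is a hypothetical helper name.

```lua
-- Sketch of the top of the updater's conf.lua (assumes love 12).

-- Hypothetical helper: extracts the game file path from the restart value.
-- Returns nil on a normal launch or if the value has an unexpected shape.
local function startUpFileFrom(restartValue)
  if type(restartValue) ~= "table" then return nil end
  if restartValue.restartSource ~= "updater" then return nil end
  if type(restartValue.startUpFile) ~= "string" then return nil end
  return restartValue.startUpFile
end

-- The guard keeps this file loadable outside of love, e.g. in a plain Lua test.
if love then
  -- Assumption: love.restart holds the table given to love.event.restart.
  local gamefilePath = startUpFileFrom(love.restart)
  if gamefilePath then
    if not love.filesystem.mount(gamefilePath, "") then
      error("Could not mount file " .. gamefilePath)
    end
    -- Assumption: the freshly mounted archive takes precedence, so this
    -- loads the game's conf.lua rather than this file again.
    love.conf = nil
    assert(love.filesystem.load("conf.lua"))()
  end
end
```

On a normal launch the helper returns nil and the updater's own love.conf applies as usual.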

You can find a version of the offending updater here:
https://github.com/panel-attack/panel-updater/tree/df1c5a6ed52229f2b64ed1711e47057f945ba121
The game in question can be found here:
https://github.com/panel-attack/panel-game

Upon restart, the game would start up correctly at first and then crash a few seconds into the load up process like this:

[video attachment: 2jb6ROJ8aH.mp4]

This happens nearly 100% of the time. It seemed slightly less likely when an update was actually being applied: affected users sometimes got the game to start up past that point after an update, but only once.
I gave one affected user a version that logged the game's startup process to disk in fine detail. This revealed that the crash is not tied to loading any specific asset or running any specific code; it occurs seemingly arbitrarily. The section where it typically crashed contains many calls to love.graphics.newImage, but the logging indicated that the crash happened outside of those calls.
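
For anyone who wants to collect similar data, this kind of logging can be sketched as below. It is an illustration only and not the logger used in the test build: newStartupLog and the file name are hypothetical, and it relies only on the standard Lua io and os libraries. Each line is flushed as soon as it is written, so the last entry before a silent crash is not lost in a buffer.

```lua
-- Hypothetical helper: opens a log file in append mode and returns a function
-- that writes one timestamped line and flushes it immediately.
local function newStartupLog(path)
  local file = assert(io.open(path, "a"))
  -- Disable buffering so every write is handed to the OS right away.
  file:setvbuf("no")
  return function(message)
    file:write(os.date("%H:%M:%S"), " ", tostring(message), "\n")
    file:flush()
  end
end

-- Usage (commented out so that loading this sketch has no side effects):
-- local log = newStartupLog("startup.log")
-- log("before loading image")
-- log("after loading image")
```

Logging both before and after each step makes it possible to tell whether the process died inside a call or between two calls.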

A sizeable number of Windows users in the community cannot replicate this issue. To my knowledge this issue has only been observed on Windows.

One of the affected users also got an extremely brief popup that they captured on video and shared as a screenshot:

[screenshot attachment: grafik]

As the new updater also coincided with the move to love 12 and a major refactor of the game, it took me a long time and the help of one of the affected users to become relatively sure that simply changing the startup back to the old method would likely fix the issue for the affected users.

I asked the affected users for their OS/GPU after the Lua panic looked like something graphics-related:

System 1:
Windows 10, exact version was not specified
Radeon RX 5600 XT; driver version 32.0.11027.1003

System 2:
Windows Version 10.0.22631 Build 22631
Radeon RX 6600; driver version 24.8.1
This is the setup that produced the lua panic screenshot.

System 3:
Windows 10, exact version was not specified
Radeon RX590 Sapphire; driver version 31.0.14051.1000

System 4:
Windows 10 v. 22H2, OS build 19045.5131
Radeon RX 570; driver version not specified

System 5:
Windows 10 22H2, exact build version not specified
Radeon RX 570; driver version not specified

System 6:
response pending

The prevalence of Radeon RX among affected users is suspicious to say the least. I started a survey today to collect data that is a little less anecdotal than this and I'll report back with it in a week.

After some experimenting, forcing the use of Vulkan as the graphics driver initially seemed to be a workaround. In practice it proved not to be one: while the silent crash is significantly less prevalent with Vulkan and almost never happens that soon after startup, the game will still silently crash at a later point.

With one of the affected users I tried several adjustments to the part of the game code that actually gets to run, but while some changes seem to make the crashes less frequent, they still occur often enough. Likewise, not checking for updates in the updater code (and thus not using any threads) seems to reduce the occurrence of the issue, but it may still crash.

The original release of the updater was fused with a love 12 CI version that still used SDL2. The issue occurs with that older version as well as newer CI builds that use SDL3.

Current indications are that using the internal reinit without restarting will be a suitable workaround. It is still being tested though, and it is only a workaround: restarting would be the cleaner and preferable solution if not for the crashes.

I tried to create a more reduced repro, but this ultimately proved infeasible because I had to ask affected users to test every iteration. I'm hoping someone else is in a position to reproduce the issue and troubleshoot things properly.

@MikuAuahDark
Contributor

For the one with "PANIC" error message, I believe it's fixed in libsdl-org/SDL#11257. I have AMD integrated graphics so I'll check it out ASAP.

Endaris commented Nov 30, 2024

> For the one with "PANIC" error message, I believe it's fixed in libsdl-org/SDL#11257. I have AMD integrated graphics so I'll check it out ASAP.

To clarify, the PANIC error message occurred with the release using SDL2. As far as I understood the linked issue was SDL3 exclusive.

The affected user that has been testing for me did an extensive session last night after I posted this and reverting restart back to reinit seems to have resolved the issues for them.
I also gave them a test version spinning restart earlier but that did not seem to produce the crash.

Endaris commented Dec 19, 2024

As of today the issue has been confirmed to also occur in love 11.5.
I'll try to keep narrowing it down to a reasonable repro but as I rely on other people to test things progress will be slow and it may take months. I'll update as I learn new things.
