Intermittent crash in LLVM_Util::getPointerToFunction(llvm::Function* func) #1712
Comments
Because this is so intermittent, it's hard to debug, and often I get a crash with no useful call stack, only an "abort was called" exception. I will try to figure out more, but if the above gives you any "Eureka" ideas, @lgritz, let me know.
I'm wondering if it can have anything to do with issue #1427?
Better call stack, with some of the LLVM stuff untangled: @lgritz
Also, the LLVM comment says 2GB, and yet the check is done with
So it seems this IMAGE_REL_AMD64_ADDR32NB mode is a 32-bit, offset-based thing, but the one at the end of the above screenshot, IMAGE_REL_AMD64_ADDR64, is true 64-bit. I made a Godawful Hack(tm) in the LLVM code, so that any function that made the decision to use the former mode used the latter mode instead... and the problem disappeared. Now, is this a good fix? I highly doubt it, but...? /Z
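For readers following the 32-bit versus 64-bit point, here is a minimal sketch of why a base-relative 32-bit relocation has a limited reach. It is not LLVM's code: the helper names and constants are made up for illustration, and it only assumes what the COFF names suggest, namely that IMAGE_REL_AMD64_ADDR32NB stores "target address minus image base" in a 32-bit field, while IMAGE_REL_AMD64_ADDR64 stores a full 64-bit address.

```cpp
#include <cstdint>

// Hypothetical helpers for illustration only; this is not LLVM's code.
constexpr std::uint64_t kMaxUnsigned32 = 0xFFFFFFFFull;  // 4 GiB - 1
constexpr std::uint64_t kMaxSigned32   = 0x7FFFFFFFull;  // 2 GiB - 1

// True if `target` can be written as an unsigned 32-bit offset from
// `image_base` (assumed ADDR32NB-style "address minus image base").
constexpr bool fits_unsigned_32bit_offset(std::uint64_t target,
                                          std::uint64_t image_base) {
    return target >= image_base && (target - image_base) <= kMaxUnsigned32;
}

// True if the same offset also stays within a 2 GiB budget.
constexpr bool fits_2gib_offset(std::uint64_t target,
                                std::uint64_t image_base) {
    return target >= image_base && (target - image_base) <= kMaxSigned32;
}
```

In this sketch, an offset of 0xC0000000 (3 GiB) past the base passes the unsigned check but not the 2 GiB one, and an offset of 4 GiB or more fails both. A full 64-bit address has no such limit, which would be consistent with the hack making the crash disappear.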
I think we should report this on the llvm-dev forums (https://discourse.llvm.org/), probably in the "code generation" board? Zap, can you take care of that? I feel like it's more efficient for you to do that communication rather than me having to be the go-between. You're much more familiar with the relevant LLVM stack traces and internals than I am at this point. I think there are three things to try to get out of that interaction:
Now, on our end, we are in a bit of a pickle in that we still have a lot of work to make OSL work with LLVM 16+. They are close to releasing 17, and definitely will not backport fixes as far back as 15. So you may be forced to maintain those patches on your end at Autodesk (you seem to be the only ones running into this problem) until we can all upgrade to the latest LLVM that would have a fix. But like I said, if they have a suggestion for how to ameliorate the problem from our side, that's the best option.
Well actually I got a lot of (probably great, but I barely understand them due to being a total LLVM noob) replies here: llvm/llvm-project#65641
Does any of that tell you anything?
They say this bit is only relevant for debugging and "exception handling". Are we using exception handling in OSL?
They say we can "turn it off and the problem goes away".
/Z
Quoting Larry Gritz's message of Thursday, September 7, 2023, on the three things to try to get out of that interaction:
1. Have somebody on the LLVM team confirm that we're on the right track, that this patch is essentially correct and does no additional harm, or else that we're totally misguided and there is something different we should be doing to address the problem.
2. Convince somebody there to take the ball and turn this (or any other approach they prefer) into a patch that will permanently fix future LLVM releases.
3. If they have a suggestion for something we can do on the OSL side to avoid this, that's even better. Like, are we hitting a 32-bit limit only because we are being exceptionally silly about what we're handing LLVM, forgetting to clear something between shader group builds, or something like that?
Yes, lots of good replies at llvm/llvm-project#65641 ... OSL has a line that reads (in llvm_util.cpp https://github.com/AcademySoftwareFoundation/OpenShadingLanguage/blob/main/src/liboslexec/llvm_util.cpp#L1442)
I'll try to set it to "::Large" or "::Medium" and see if this changes things (apparently ::Small is the default?). Does this make sense?
Well, we have that already. My hack is most certainly WRONG :)
Exceptions: we're definitely not relying on them. But perhaps there is a way to explicitly turn them off, which we have neglected to do?
setCodeModel: that may be fruitful. What happens if you make this call, and pass
In my quick test, setting CodeModel::Large did not change anything, but it was a very late Friday, semi-aborted test, so I will double-check. I could see the condition for this fatal error still getting hit (though I didn't spend enough time to actually get the crash; I just verified that the same "type" of relocation block was still in use).
Note the latest post on the LLVM project, llvm/llvm-project#65641 (comment), in reply to my question about "Memory Managers". If the "MemoryManager" is what doles out this memory to LLVM, then maybe that is the problem? According to them, OSL is using its own "MemoryManager" because...(?)
Okay... some new info. OSL uses a custom memory manager that is held by the rendering threads' per-thread info, and this memory manager is kept around until the last rendering thread dies. Sounds reasonable on paper.
Except... we use TBB for rendering. TBB has a set of worker threads that are always in flight, so those threads never die, so no destructor is ever hit on the per-thread data, and the memory manager ends up being kept around forever.
That wouldn't be a big deal in the normal case. Except I also see this in the OSL wrapped memory manager (https://github.com/AcademySoftwareFoundation/OpenShadingLanguage/blob/main/src/liboslexec/llvm_util.cpp#L244):
Okay, so if memory is never ever thrown away, of course we can get beyond a 2GB limit. I tested it, and in 3ds Max the memory manager isn't destroyed until the app closes.
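To make the "never thrown away" argument concrete, here is a small sketch. It is a hypothetical model and not OSL's or LLVM's memory manager: GrowOnlyArena, sections_within_reach, the base address, and the section size are all assumptions picked for illustration. It models an allocator that keeps handing out higher addresses and never reuses memory, and counts how many sections fit before one lands beyond a given reach from the first allocation.

```cpp
#include <cstdint>

// Hypothetical model (not OSL's or LLVM's memory manager): a bump
// allocator that hands out increasing addresses and never frees anything.
struct GrowOnlyArena {
    std::uint64_t base;  // address of the first section ever allocated
    std::uint64_t next;  // address the next section will receive

    constexpr explicit GrowOnlyArena(std::uint64_t b) : base(b), next(b) {}

    // Reserve `bytes` and return the start address of the new section.
    constexpr std::uint64_t allocate(std::uint64_t bytes) {
        const std::uint64_t addr = next;
        next += bytes;
        return addr;
    }
};

// Count how many sections of `section_bytes` can be allocated before one
// starts more than `max_offset` bytes past the arena base.
// Returns 0 for a zero section size, which would otherwise never overflow.
constexpr std::uint64_t sections_within_reach(std::uint64_t section_bytes,
                                              std::uint64_t max_offset) {
    if (section_bytes == 0) {
        return 0;
    }
    GrowOnlyArena arena(0x10000000ull);  // arbitrary illustrative base
    std::uint64_t count = 0;
    while (arena.allocate(section_bytes) - arena.base <= max_offset) {
        ++count;
    }
    return count;
}
```

With 64 MiB sections in this model, only 32 start within a 2 GiB reach of the base, and 64 within 4 GiB. A manager that lives for the whole session will therefore eventually place a section out of range, however small each individual allocation is.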
Problem
In 3ds Max, we have lots of users crashing with a call stack that seems to be caused by this problem.
We have a scene that "reproduces" the problem, but the reproduction is intermittent and seems to be a race condition of sorts.
Basically, you load a particular file, you start an interactive render and the material editor at the same time, then start changing parameters in the material many many many many many times. Eventually, we get this crash. Or not. Depending on phase of the moon, the wind direction, humidity, etc.
Crash is reported on this line:
...i.e. in the case this function is reached before the shader has been optimized. Somehow, it seems like the call to exec->finalizeObject(); crashes.
The call stack is something like this:
Expected behavior:
For it not to crash?
Actual behavior:
It crashes. Sometimes.
Steps to Reproduce
Versions