Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Work around Imath UB in bvh code that was icx problem #1882

Merged
merged 2 commits into from
Oct 11, 2024

Conversation

lgritz
Copy link
Collaborator

@lgritz lgritz commented Oct 10, 2024

Once it was not failing with NaNs or infinite loops, there were still 3 tests that differed every so slightly in a few values for icx (more lax math, maybe?). So had to add a few alternate reference images.

@lgritz
Copy link
Collaborator Author

lgritz commented Oct 10, 2024

We discussed this at length in today's TSC meeting. Thanks again Alex and Steena for making the extra effort to analize the problem from every angle.

I was able to make the "reinterpret_cast" solution work with icx. I'm not sure where I went wrong before. I updated this PR and think this is the very simplest solution to get us unstuck on the OSL end of things.

@AlexMWells
Copy link
Contributor

AlexMWells commented Oct 11, 2024

Please note the "fix" here should be legal from a data lifetime point of view:
Taking address of Vec3 should keep 12 bytes alive vs. taking address of Vec3::x would only keep 4 bytes alive of a temporary object.

That being said mixing code that accesses Vec3::x,y,z and comp(vec,0|1|2) could be considered aliasing and create some undefined behavior at some other point.

I would recommend an aliasing free and Scalar Replacement of Aggregates (SROA) compatible approach of selection(masking) or blending x,y,z together based on the axis be used.

@etheory
Copy link
Contributor

etheory commented Oct 11, 2024

The title of this indicates that this is due to incorrect icx behavior, but @AlexMWells it didn't sounds like it really is a bug with icx right? It's going to happen in all compilers?

@lgritz
Copy link
Collaborator Author

lgritz commented Oct 11, 2024

The title of this indicates that this is due to incorrect icx behavior, but @AlexMWells it didn't sounds like it really is a bug with icx right? It's going to happen in all compilers?

The thing Imath is doing is definitely "undefined behavior" according to C++. There's no doubt about that. You can't take the address of a declared single float and expect anything at all about what's in the subsequent memory, like you could if it were an array.

Icx is the only compiler (in our CI suite) that optimizes in such a way that the code it generates didn't match our 1980s era intuition about what that code should do -- which is to have the y and z components immediately following and that if we ask for that via pointer arithmetic, we'd get what we know should be there (which we weren't, because the compiler doesn't have to store those other struct fields in memory at all if they don't ever seem to be accessed by name). At the moment, it's only icx to show the problem, but it's possible that in the future, other compilers could also become symptomatic.

I didn't mean that icx had a bug, just (as tersely as one is in the first line of a commit message) that it only appeared result in a problem for us (i.e. failing test) in the icx test.

Once it was not failing with NaNs or infinite loops, there were still
3 tests that differed every so slightly in a few values for icx (more
lax math, maybe?). So had to add a few alternate reference images.

Signed-off-by: Larry Gritz <[email protected]>
Signed-off-by: Larry Gritz <[email protected]>
@lgritz
Copy link
Collaborator Author

lgritz commented Oct 11, 2024

Because this fixes CI for us and we discussed this at length in the TSC meeting yesterday, I'm going to merge now. But if somebody has further thoughts about better ways to solve this, please feel free to speak up and we can continue to revise.

@lgritz lgritz merged commit 2c8c9d4 into AcademySoftwareFoundation:main Oct 11, 2024
22 checks passed
@lgritz lgritz deleted the lg-icx branch October 11, 2024 18:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants