[mono] Fix for issue 109410 #113140

Open
wants to merge 3 commits into
base: main


kg (Member) commented Mar 4, 2025

Issue #109410 appears to be a case where klass is 0 when we perform an isinst operation, but the cache and obj are nonzero and look like valid addresses. klass is either a compile-time (well, jit-time) constant or being fetched out of the cache (it looks like it can be either depending on some sort of rgctx condition).

This PR adds null checks in two places along with a memory barrier in the location where we believe an uninitialized cache is being published to other threads.

@kg kg marked this pull request as ready for review March 5, 2025 00:19
@Copilot Copilot bot review requested due to automatic review settings March 5, 2025 00:19


Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

kg (Member, Author) commented Mar 5, 2025

@BrzVlad noticed that we might need a memory barrier to prevent other threads from observing a NULL klass in the cache, which is consistent with the failure on CI and the failures reported in the field. Both CI and the reports involve multithreading, so it's possible there's a race here.

We haven't seen this on x64, which makes sense because x86 and x64 have a stronger memory model, but I need to figure out whether this particular failure is possible with ARM's weaker memory model or not.
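The suspected race can be sketched with C11 atomics: a writer fills in an entry and then publishes a pointer to it, and a reader on another thread must never see the pointer before the fields. This is only an illustration under assumptions. `cache_entry`, `published_slot`, `publish_entry` and `lookup_klass` are hypothetical names, not Mono's real types, and release/acquire atomics stand in for `mono_memory_barrier ()`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a cache entry; not Mono's real layout. */
typedef struct {
	const void *klass; /* must be non-NULL once the entry is published */
	int flags;
} cache_entry;

/* Slot that other threads read; NULL means "nothing published yet". */
static _Atomic(cache_entry *) published_slot;

/* Writer: initialize the entry fully, then publish it with release
 * ordering so the field stores cannot become visible after the pointer. */
static void
publish_entry (cache_entry *entry, const void *klass, int flags)
{
	entry->klass = klass;
	entry->flags = flags;
	atomic_store_explicit (&published_slot, entry, memory_order_release);
}

/* Reader: the acquire load pairs with the release store above, so a
 * non-NULL pointer implies the fields written before it are visible. */
static const void *
lookup_klass (void)
{
	cache_entry *entry = atomic_load_explicit (&published_slot, memory_order_acquire);
	if (entry == NULL)
		return NULL; /* not published yet */
	assert (entry->klass != NULL);
	return entry->klass;
}
```

Without the release ordering, a weakly ordered CPU such as ARM may make the pointer store visible before the field stores, which would match a reader seeing a valid-looking entry with a zero klass. On x86 and x64 the hardware does not reorder stores with other stores, which may be why the failure has not shown up there.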

Remove emitted nullcheck in jitcode (too expensive)
@kg kg changed the title from "[mono] Speculative assertions for issue 109410" to "[mono] Fix for issue 109410" Mar 5, 2025
@steveisok steveisok requested review from lateralusX and BrzVlad March 5, 2025 16:46
@@ -6806,6 +6806,8 @@ mono_object_handle_isinst (MonoObjectHandle obj, MonoClass *klass, MonoError *er
{
error_init (error);

g_assert (klass);
Review comment (Member):

This assert is not particularly helpful, we will crash immediately at the line below anyway.

Reply (Member, Author):

On wasm we wouldn't, but I don't know if this code runs on wasm.

// we need a barrier before publishing data via mrgctx->infos [i] because the contents of data may not
// have been published to all cores and another thread may read zeroes or partially initialized data
// out of it, even though we have a barrier before publication of entries in mrgctx->entries below
mono_memory_barrier();
Review comment (Member):

Could an option be to use a local array in this loop and then do a memcpy into mrgctx->infos after the memory barrier we already fire below (where we copy entries)? That way we would fire only one memory barrier instead of info->num_entries of them.
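A minimal sketch of that batching idea, under assumptions: `shared_infos`, `backing`, `NUM_ENTRIES`, `fill_infos_batched` and `read_info` are hypothetical names, and a C11 release fence stands in for `mono_memory_barrier ()`. The sketch publishes with relaxed atomic stores rather than a plain `memcpy`, because C11 formally treats concurrent non-atomic access to the shared array as a data race; the real code may choose differently.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define NUM_ENTRIES 4 /* hypothetical; stands in for info->num_entries */

/* Shared table read by other threads; NULL means "not filled in yet". */
static _Atomic(int *) shared_infos [NUM_ENTRIES];

/* Backing storage for the pointed-to data (hypothetical). */
static int backing [NUM_ENTRIES];

static void
fill_infos_batched (void)
{
	int *local [NUM_ENTRIES];

	/* Step 1: initialize every entry privately, with no barriers. */
	for (size_t i = 0; i < NUM_ENTRIES; i++) {
		backing [i] = (int)(i + 1) * 10;
		local [i] = &backing [i];
	}

	/* Step 2: a single release fence covers all the writes above. */
	atomic_thread_fence (memory_order_release);

	/* Step 3: publish the pointers; relaxed stores are enough because
	 * the fence already orders them after the initialization. */
	for (size_t i = 0; i < NUM_ENTRIES; i++)
		atomic_store_explicit (&shared_infos [i], local [i], memory_order_relaxed);
}

/* Reader: acquire load, then dereference; -1 means "not published". */
static int
read_info (size_t i)
{
	int *p = atomic_load_explicit (&shared_infos [i], memory_order_acquire);
	return p ? *p : -1;
}
```

The trade-off is one barrier per call instead of one per entry, at the cost of a temporary array sized to the number of entries.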
