-
Notifications
You must be signed in to change notification settings - Fork 56
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix an issue in simple-kernel-timer with mismatched linked symbols #199
Fix an issue in simple-kernel-timer with mismatched linked symbols #199
Conversation
Basic Kokkos Serial build and test failed for me on MacOS. Symptom was that the simple kernel timer "currentEntry" symbol was null during end of parallel for call back. Tracked that down to that symbol having different addresses inside the increment_counter function and the end of parallel_for callback. I believe moving the increment_counter functions to not be inlined may help? Or do we need accessor functions which are uniform? At least this change did fix the issue for me.
Confirming that this PR of @crtrott works for me on a Mac OSX with latest Kokkos 4 from kokkos-core's develop branch. |
Some sample output trying this out on my laptop, with the Kokkos develop branch using serial build. This also seems to work with other tools like space-time-stack (which is what is currently causing failing tests on PR #194 though fortuitously not on develop).
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed the files, where the function increment_counter are moved from .h to the corresponding .cpp file.
It makes and looks good to me.
I have tested this as well and this works for different cases of profilers. (see sample output in PR).
I'd like to understand the issue some better. It's not quite clear to me if and how the current code exposes a ODR-violation. |
Its not an ODR issue, I suspect that we got two different copies of the global variable via the two different shared libraries linked in? |
Basic Kokkos Serial build and test failed for me on MacOS. Symptom was that the simple kernel timer "currentEntry" symbol was null during end of parallel for call back.
Tracked that down to that symbol having different addresses inside the increment_counter function and the end of parallel_for callback.
I believe moving the increment_counter functions to not be inlined may help?
Or do we need accessor functions which are uniform?
At least this change did fix the issue for me.