-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Monitor #4
Monitor #4
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi!
I looked through the code in this merge in detail again this evening. There are a couple of minor improvements I have. I've implemented these as four commits on my fork https://github.com/phdum/mpi/tree/monitor. (Not sure how best to add these commits to the merge request)
-
The only actual problem I found was the case where one could call
request_emergency_stop()
multiple times from the same mpi rank. This would trying and send multiple mpi messages, even though only one was registered. I've fixed this so thatrequest_emergency_stop()
does not do anything iflocal_stop
is already set. -
I've expanded the tests. This includes checking the above point. Also, it adds multiple new test cases that were not covered before: if the request_emergency_stop occurs on the root itself, and if there are multiple (2) request_emergency_stop calls.
-
I've polished the comments to hopefully make the logic easier to follow
Apart from that everything looks good to me. At least the logical paths I could think of testing all hold together.
- I've not looked at the changes to macros.hpp and comm_split.cpp which are a part of this merge request, but not part of the monitor code.
Thanks!
I've now pushed my commits to the branch, which updated the pull request. |
- same as nda, with a guard, not pragma once, for multiple inclusion
- A simple class to monitor failure/exception on mode used mainly for MC classes in TRIQS at this stage.
- Clean a bit API
* Has different code logic for communication when root fails * Small clean up it readability in code and console output
* further calls should not send extra signals * add to test function
* test for 0, 1, 2 failures on nodes
-rename finalize -> finalize_communications -rename should_stop -> emegency_occured -use emergency_occured() to get final result also after finalization
Ok, fine with the finalize change. Fine to merge. |
Draft of the little monitor class.
To be reviewed.
More tests may be needed. TBD.