Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Monitor #4

Merged
merged 13 commits into from
Sep 22, 2020
Merged

Monitor #4

merged 13 commits into from
Sep 22, 2020

Conversation

parcollet
Copy link
Member

Draft of the little monitor class.
To be reviewed.
More tests may be needed. TBD.

@parcollet parcollet requested a review from Wentzell August 7, 2020 19:08
@Wentzell Wentzell requested review from phdum and removed request for Wentzell September 8, 2020 14:55
Copy link
Member

@phdum phdum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi!

I looked through the code in this merge in detail again this evening. There are a couple of minor improvements I have. I've implemented these as four commits on my fork https://github.com/phdum/mpi/tree/monitor. (Not sure how best to add these commits to the merge request)

  • The only actual problem I found was the case where one could call request_emergency_stop() multiple times from the same mpi rank. This would trying and send multiple mpi messages, even though only one was registered. I've fixed this so that request_emergency_stop() does not do anything if local_stop is already set.

  • I've expanded the tests. This includes checking the above point. Also, it adds multiple new test cases that were not covered before: if the request_emergency_stop occurs on the root itself, and if there are multiple (2) request_emergency_stop calls.

  • I've polished the comments to hopefully make the logic easier to follow

Apart from that everything looks good to me. At least the logical paths I could think of testing all hold together.

  • I've not looked at the changes to macros.hpp and comm_split.cpp which are a part of this merge request, but not part of the monitor code.

Thanks!

@phdum
Copy link
Member

phdum commented Sep 14, 2020

I've now pushed my commits to the branch, which updated the pull request.

parcollet and others added 13 commits September 21, 2020 15:22
- same as nda, with a guard, not pragma once, for multiple inclusion
- A simple class to monitor failure/exception on mode
  used mainly for MC classes in TRIQS at this stage.
- Clean a bit API
* Has different code logic for communication when root fails
* Small clean up it readability in code and console output
* further calls should not send extra signals
* add to test function
* test for 0, 1, 2 failures on nodes
-rename finalize -> finalize_communications
-rename should_stop -> emegency_occured
-use emergency_occured() to get final result also after finalization
@parcollet
Copy link
Member Author

Ok, fine with the finalize change. Fine to merge.

@Wentzell Wentzell merged commit dbf2ac1 into TRIQS:unstable Sep 22, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants