Reducing the computational cost for the new implementation of EEXE with MPI-enabled GROMACS #13
In the commit 344afa6, I replaced the for loop mentioned in the comment above with lines that parallelize the generation of the tpr files. As a result, completing 50 iterations took around 270 seconds, which is around 15% faster than the serial generation of tpr files above but still much slower than the original implementation. I also tried using the other kind of executor; see the sketch below for a summary of how the two kinds of executors differ.
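A minimal sketch of this kind of executor-based parallelization, assuming the two executors in question are Python's `concurrent.futures.ThreadPoolExecutor` and `ProcessPoolExecutor` (an assumption on my part, as are the directory layout and file names below):

```python
# Hedged sketch, not the code from commit 344afa6: launching several grompp
# commands concurrently with concurrent.futures. Paths and names are hypothetical.
import subprocess
from concurrent.futures import ThreadPoolExecutor  # or ProcessPoolExecutor

def run_one_grompp(sim_dir):
    """Build the tpr file for one replica by calling gmx grompp."""
    cmd = [
        'gmx', 'grompp',
        '-f', f'{sim_dir}/expanded.mdp',
        '-c', f'{sim_dir}/sys.gro',
        '-p', f'{sim_dir}/sys.top',
        '-o', f'{sim_dir}/sys_EE.tpr',
    ]
    return subprocess.run(cmd, check=True, capture_output=True)

if __name__ == '__main__':
    sim_dirs = [f'sim_{i}' for i in range(4)]  # e.g., n_sim = 4 replicas (hypothetical)
    # Threads just block on the external grompp processes, so the GIL is not a
    # bottleneck here; ProcessPoolExecutor would work too, but it adds worker
    # start-up and pickling overhead on top of spawning grompp itself.
    with ThreadPoolExecutor(max_workers=len(sim_dirs)) as executor:
        results = list(executor.map(run_one_grompp, sim_dirs))
```

In short, for work that only waits on external `gmx grompp` processes, a thread pool is usually the lighter-weight of the two executors, while a process pool pays extra process start-up and serialization costs.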
In the commit 22528e5, I tried a different approach, and in the commit bb2d53c I tried yet another one. As a result, both of them took around 330 seconds.
In the commit a2ab912, I tried another option. As a result, it took around 313 seconds to finish 50 iterations.
In the commit 581b081, I tried one more option. As a result, it took 305 seconds to finish all 50 iterations.
In the commit 1340f40, I tried one last option. As a result, it took around 414 seconds to finish 50 iterations, which is the slowest among all the options. This is not surprising, though; see the note (summarized from GPT-4) in the next comment.
I additionally tried one more approach. It took around 270 seconds to finish 50 iterations. This was attempted in the commit 1fc0463.
Okay, according to more tests that I performed later, I found that the issue might be the start time of the mdrun command. Here is a summary of the results from the additional tests, which all completed 50 iterations of 1250-step EXE simulations of the anthracene system. All the tests were based on the commit 1fc0463.
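One way to see this kind of start-time overhead for a single mdrun call is to compare the wall-clock time of the whole external command with the wall time GROMACS reports in its log. The sketch below illustrates the idea; the command line and file names are hypothetical, and this is not the script used for the tests above.

```python
# Hedged sketch: estimate the start-up/teardown overhead of one mdrun call by
# comparing the wall-clock time of the full external command with the wall time
# reported by GROMACS itself.
import re
import subprocess
import time

cmd = ['mpirun', '-np', '4', 'gmx_mpi', 'mdrun', '-deffnm', 'sys_EE', '-ntomp', '1']

t0 = time.perf_counter()
subprocess.run(cmd, check=True, capture_output=True)
t_total = time.perf_counter() - t0

# The performance summary near the end of the log has a line like
# "Time:   <core t (s)>   <wall t (s)>   <%>"; the exact layout may vary by version.
with open('sys_EE.log') as f:
    log = f.read()
match = re.search(r'Time:\s+\S+\s+(\S+)', log)
t_wall = float(match.group(1)) if match else float('nan')

print(f'external time: {t_total:.1f} s, GROMACS wall time: {t_wall:.1f} s, '
      f'approximate start/teardown overhead: {t_total - t_wall:.1f} s')
```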
I created a PR corresponding to this issue. The following work/discussions will be logged in the PR.
We later confirmed that the start time of the mdrun command is a significant source of the extra cost. Specifically, the difference between the timestamps can be several seconds longer than the wall time shown at the end of the log file, which does not include the start time.

Also, all the approaches proposed above for parallelizing GROMACS commands do not work across multiple nodes, in which case using MPI-enabled GROMACS does not really bring any advantages. At the end of the day, the goal is to run EEXE on multiple nodes using MPI-enabled GROMACS, if possible. Given that the current implementation that allows MPI-enabled GROMACS (1) incurs a higher computational cost than the original implementation and (2) is no closer to the scenario where EEXE might be the most useful, we decided to fall back to the original implementation of EEXE that ONLY works with thread-MPI GROMACS. The changes necessary for this decision were made mainly in the commit 4c73f03. I'm closing this issue since it is no longer relevant in the short term.
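For reference, the contrast behind this decision can be sketched roughly as follows (illustrative commands with hypothetical file names, not taken from the repository): subprocess- or executor-based launching only starts processes on the node where the Python script runs, whereas spanning multiple nodes requires an MPI launcher in front of MPI-enabled GROMACS.

```python
# Illustrative contrast only, not code from the repository.
import subprocess

# Single-node launch with thread-MPI GROMACS: parallelism comes from -ntmpi/-ntomp,
# and the process is confined to the node running this script.
subprocess.run(['gmx', 'mdrun', '-deffnm', 'sys_EE', '-ntmpi', '4', '-ntomp', '1'],
               check=True, capture_output=True)

# Spanning multiple nodes requires an MPI launcher (e.g., mpirun or srun) in front
# of MPI-enabled GROMACS, which is exactly where nesting it under another MPI
# launch (such as an mpi4py-driven script) becomes problematic.
subprocess.run(['mpirun', '-np', '8', 'gmx_mpi', 'mdrun', '-deffnm', 'sys_EE', '-ntomp', '1'],
               check=True, capture_output=True)
```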
Here is a summary of ideas and efforts relevant to the issue.
I'll re-open the issue if new possible workarounds are proposed.
In the commit c661fb8, we have enabled the use of MPI-enabled GROMACS and disabled the use of thread-MPI GROMACS. However, as discussed in issue #10, in the original implementation, the execution of multiple grompp commands was parallelized by `mpi4py` using the conditional statement `if rank < self.n_sim:`, while in the new implementation that allows MPI-enabled GROMACS, the GROMACS tpr files are generated serially in the function `run_grompp`, roughly as sketched below.
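A minimal sketch of such a serial loop (hypothetical file layout and names; not the exact lines from the repository):

```python
# Minimal sketch of serial tpr generation: one gmx grompp call per replica,
# executed one after another.
import subprocess

def run_grompp(n_sim):
    """Serially build the tpr file for each of the n_sim replicas."""
    for i in range(n_sim):
        cmd = [
            'gmx', 'grompp',
            '-f', f'sim_{i}/expanded.mdp',   # hypothetical file layout
            '-c', f'sim_{i}/sys.gro',
            '-p', f'sim_{i}/sys.top',
            '-o', f'sim_{i}/sys_EE.tpr',
        ]
        # Each call blocks until it finishes, so the total cost scales with n_sim.
        subprocess.run(cmd, check=True, capture_output=True)
```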
I tested the code with a fixed-weight EEXE simulation for the anthracene system with `nst_sim=1250` and `n_sim` on Bridges-2 with 64 cores requested, `runtime_args={'-ntomp': '1'}`, and `n_proc=64`. As a result, 50 iterations took 327 seconds. This is much longer than with the original implementation, which used `mpi4py` to parallelize the GROMACS grompp commands. Specifically, using the new implementation, 20000 iterations would take approximately 327 * 400 seconds, which is around 36 hours, much longer than the 13 hours required to finish the same simulation using the original implementation.

In light of this, we should figure out a way to parallelize the GROMACS grompp commands without introducing any noticeable overhead and without using `mpi4py` (to prevent nested MPI calls). Also, we should try to identify whether there is any other source that contributes to the higher computational cost of the new implementation of EEXE.
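For comparison, the `mpi4py` pattern referred to above boils down to something like the following sketch (hypothetical file names; not the repository's exact code): each MPI rank whose index is below `n_sim` prepares the tpr file for one replica.

```python
# Hedged sketch of rank-conditional grompp execution with mpi4py: each MPI rank
# below n_sim builds the tpr file for its own replica.
from mpi4py import MPI
import subprocess

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
n_sim = 4  # hypothetical number of replicas

if rank < n_sim:
    cmd = [
        'gmx', 'grompp',
        '-f', f'sim_{rank}/expanded.mdp',
        '-c', f'sim_{rank}/sys.gro',
        '-p', f'sim_{rank}/sys.top',
        '-o', f'sim_{rank}/sys_EE.tpr',
    ]
    subprocess.run(cmd, check=True, capture_output=True)

comm.Barrier()  # make sure every tpr file exists before mdrun is launched
```

Launching MPI-enabled GROMACS from inside a script that is itself running under MPI is what creates the nested MPI calls mentioned above.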