Different output for v3.0 and v3.1 #166
Dear @ec147, that is indeed quite odd. I had a brief look at your code, and at first glance it all looks good. In principle, the only changes from triqs 3.0 to 3.1 that could really influence this are the stat changes in TRIQS itself (@Wentzell, correct me if I am wrong). Within cthyb the changes are minimal. We have several benchmark scripts (https://github.com/TRIQS/benchmarks), and I think they have been tested with 3.1.x without problems. Moreover, your 3.1.x result looks clearly wrong, so something must be going wrong here. Did I see correctly that you stored G0_iw to a text file? Are those files identical for both versions? Can you provide the standard output of the solver? I would like to check whether the solver worked with the same local Hamiltonian, detected the same number of subspaces, and reported similar acceptance rates. Best,
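To check whether the two G0_iw text dumps are identical, a numerical comparison within a tolerance is more robust than a byte-wise diff, since formatting can differ between versions. The sketch below assumes each file contains only whitespace-separated real numbers (for example columns of Matsubara frequency, real part and imaginary part); the file layout and the function names are assumptions for illustration, not part of TRIQS or Abinit.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <fstream>
#include <istream>
#include <sstream>  // std::istringstream, handy for checks with in-memory data
#include <string>
#include <vector>

// Read every whitespace-separated number from a stream.
// Assumes the dump contains nothing but numbers.
std::vector<double> read_numbers(std::istream& in) {
  std::vector<double> values;
  double x;
  while (in >> x) values.push_back(x);
  return values;
}

// Largest absolute difference between two dumps, or -1.0 if they do not
// contain the same number of values (e.g. a different number of frequencies).
double max_abs_diff(std::istream& a, std::istream& b) {
  const std::vector<double> va = read_numbers(a);
  const std::vector<double> vb = read_numbers(b);
  if (va.size() != vb.size()) return -1.0;
  double diff = 0.0;
  for (std::size_t i = 0; i < va.size(); ++i) diff = std::max(diff, std::abs(va[i] - vb[i]));
  return diff;
}

// Convenience wrapper for two files on disk; also returns -1.0 if either
// file cannot be opened.
double max_abs_diff_files(const std::string& path_a, const std::string& path_b) {
  std::ifstream fa(path_a), fb(path_b);
  if (!fa || !fb) return -1.0;
  return max_abs_diff(fa, fb);
}
```

For example, `max_abs_diff_files("G0_iw_v30.dat", "G0_iw_v31.dat")` (hypothetical file names) returning a value near machine precision would indicate that both versions received the same input, which points the search away from the DFT side.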
Thanks for your feedback. I found the issue and fixed it easily: in the latest version of the mpi dependency, the MPI environment is only activated if the variable has_env is true, which is the case if one of the following environment variables is found: OMPI_COMM_WORLD_RANK, PMI_RANK or CRAY_MPICH_VERSION. However, I'm using a SLURM environment, which sets a different environment variable (SLURM_PROCID, I think).
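The detection described above can be sketched roughly as follows. This is a simplified re-implementation for illustration, not the actual code from mpi.hpp, and the function name `mpi_env_detected` is hypothetical. Only the three variables named in the comment are checked, so a job whose launcher exports none of them (per the report above, SLURM_PROCID instead) is treated as a serial run and MPI is never initialized.

```cpp
#include <array>
#include <cstdlib>

// Simplified sketch of the environment check described above
// (hypothetical name; not copied from mpi.hpp). MPI is initialized
// only if one of these launcher-specific variables is present.
inline bool mpi_env_detected() {
  constexpr std::array<const char*, 3> launcher_vars = {
      "OMPI_COMM_WORLD_RANK",  // typically set by Open MPI's mpirun/mpiexec
      "PMI_RANK",              // typically set by PMI-based launchers
      "CRAY_MPICH_VERSION"};   // typically present in Cray MPICH environments
  for (const char* name : launcher_vars)
    if (std::getenv(name) != nullptr) return true;
  return false;
}
```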
Glad to hear that the issue is resolved for you. May I ask how you solved it? We rely on this MPI detection feature working. If there is any cluster environment where it does not work out of the box, please let us know; we are happy to add additional environment variable checks.
Sure: I simply replaced line 44 of the mpi.hpp header file with "if (std::getenv("SLURM_PROCID") != nullptr or std::getenv("OMPI_COMM_WORLD_RANK") != nullptr or std::getenv("PMI_RANK") != nullptr or std::getenv("CRAY_MPICH_VERSION") != nullptr)".
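The workaround amounts to adding SLURM_PROCID to the list of checked variables. A minimal sketch of that patched check follows (the function name is hypothetical). As noted in the reply that follows, SLURM_PROCID is also set for serial runs inside a SLURM job, so this forces MPI initialization for every process started under SLURM and is only a local workaround.

```cpp
#include <cstdlib>
#include <initializer_list>

// Sketch of the patched check from the comment above: SLURM_PROCID plus the
// three launcher variables that were already checked (hypothetical name).
inline bool mpi_env_detected_with_slurm() {
  for (const char* name : {"SLURM_PROCID", "OMPI_COMM_WORLD_RANK", "PMI_RANK", "CRAY_MPICH_VERSION"})
    if (std::getenv(name) != nullptr) return true;
  return false;
}
```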
Interesting. @Wentzell, do you understand why our MPI detection fails in this case?
Yes, I just checked, and it seems that the environment variable SLURM_PROCID is also set for sequential runs, so my change is not the proper way to fix the issue. I just wanted a quick workaround without thinking too much about it. This is not a problem for me since I always parallelize my runs, so I always want the MPI environment to be activated. I'm really not an expert on SLURM environments, so unfortunately I cannot help you further. I'm using Open MPI.
I agree that
I'm using v4.1.4.4 of openmpi.
Okay, I see. I wonder whether we should add a cmake flag that enforces MPI initialization and skips the detection of an MPI environment (as it worked before we introduced this check), as a quick workaround for such cases.
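A build-time override along the lines proposed above could look like the sketch below. The macro name `HYPOTHETICAL_FORCE_MPI_INIT` and the function `should_init_mpi` are invented for illustration; this is not necessarily how the feature was eventually implemented. A CMake option would only need to add the macro to the compile definitions.

```cpp
#include <cstdlib>
#include <initializer_list>

// Hypothetical build-time switch: a CMake option could define it, e.g. via
// target_compile_definitions(<target> PRIVATE HYPOTHETICAL_FORCE_MPI_INIT).
#ifdef HYPOTHETICAL_FORCE_MPI_INIT
inline constexpr bool force_mpi_init = true;
#else
inline constexpr bool force_mpi_init = false;
#endif

// Skip environment detection entirely when the build forces MPI
// initialization, restoring the behaviour from before the check existed.
inline bool should_init_mpi() {
  if (force_mpi_init) return true;
  for (const char* name : {"OMPI_COMM_WORLD_RANK", "PMI_RANK", "CRAY_MPICH_VERSION"})
    if (std::getenv(name) != nullptr) return true;
  return false;
}
```

The design trade-off is that a compile-time flag fixes the behaviour for every run of that build, including serial ones, whereas a runtime check adapts to how each job is launched.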
@the-hampel Maybe we could just check if
I think I like that idea. Let me add this and try it out.
I added two PRs for this feature: TRIQS/triqs#883, which adds the check in the Python layer, and TRIQS/mpi#11 in triqs/mpi itself. This allows you to do the following:
If this looks good, please merge.
Thank you, @the-hampel; both pull requests have been merged.
TRIQS_ABINIT_interface-code.pdf
I have made two calculations with CT-HYB, one with version 3.0 and one with version 3.1. Both use exactly the same parameters and the same G0(w) as input. Yet the G(tau) output of version 3.1 is very noisy and highly unphysical (first picture), while the output of version 3.0 is satisfactory (second picture). The calculation is parallelized over 2048 CPUs.
Do you have any idea what causes this discrepancy between the two versions?
I have attached the C++ code I used. It is part of the DFT code Abinit, which provides the G0(w) and the U matrix used as input for CTHYB.