
mpi4py: Regression in GitHub Actions #12857

Open
dalcinl opened this issue Oct 13, 2024 · 21 comments · Fixed by openpmix/prrte#2043

@dalcinl (Contributor) commented Oct 13, 2024

mpi4py's own nightly CI testing on GHA has been failing with ompi@main for the last three days.
However, I understand that OMPI's own CI had not failed; otherwise you would not have merged in the regression.

These are the full logs from the last failed run:
https://github.com/mpi4py/mpi4py-testing/actions/runs/11310080041/job/31454714552
This is the specific error; note the message prte-rmaps-base:all-available-resources-used in the output:

test_async_error_callback (test_util_pool.TestProcessPool.test_async_error_callback) ... 1 more process has sent help message help-prte-rmaps-base.txt / prte-rmaps-base:all-available-resources-used
Exception in thread Thread-7 (_manager_spawn):
Traceback (most recent call last):
  File "/opt/hostedtoolcache/Python/3.12.7/x64/lib/python3.12/threading.py", line 1075, in _bootstrap_inner
    self.run()
  File "/opt/hostedtoolcache/Python/3.12.7/x64/lib/python3.12/threading.py", line 1012, in run
    self._target(*self._args, **self._kwargs)
  File "/home/runner/work/mpi4py-testing/mpi4py-testing/build/lib.linux-x86_64-cpython-312/mpi4py/futures/_core.py", line 350, in _manager_spawn
    comm = serialized(client_spawn)(pyexe, pyargs, nprocs, info)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/runner/work/mpi4py-testing/mpi4py-testing/build/lib.linux-x86_64-cpython-312/mpi4py/futures/_core.py", line 1058, in client_spawn
    comm = MPI.COMM_SELF.Spawn(python_exe, args, max_workers, info)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "src/mpi4py/MPI.src/Comm.pyx", line 2544, in mpi4py.MPI.Intracomm.Spawn
    with nogil: CHKERR( MPI_Comm_spawn(
mpi4py.MPI.Exception: MPI_ERR_UNKNOWN: unknown error

On closer inspection, I noticed a difference between mpi4py and open-mpi configuration:

mpi4py configures oversubscription via $HOME/.openmpi/mca-params.conf
https://github.com/mpi4py/mpi4py-testing/blob/master/.github/workflows/openmpi.yml#L101

Open MPI configures oversubscription in both $HOME/.openmpi/mca-params.conf and $HOME/.prte/mca-params.conf
https://github.com/open-mpi/ompi/blob/main/.github/workflows/ompi_mpi4py.yaml#L80

Looks like something has changed recently, and oversubscription settings in $HOME/.openmpi/mca-params.conf are being ignored. Was this change intentional or is it a regression?
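For concreteness, here is a sketch of the two setups being compared (paths as linked above; the exact file contents are illustrative, and a scratch HOME is fine for experimenting):

```shell
# mpi4py's CI puts the oversubscription setting only in the OMPI file:
mkdir -p "$HOME/.openmpi"
cat > "$HOME/.openmpi/mca-params.conf" <<'EOF'
rmaps_default_mapping_policy = :oversubscribe
EOF

# Open MPI's own CI additionally duplicates the setting in the PRTE file:
mkdir -p "$HOME/.prte"
cat > "$HOME/.prte/mca-params.conf" <<'EOF'
rmaps_default_mapping_policy = :oversubscribe
EOF
```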

@dalcinl (Contributor, Author) commented Oct 13, 2024

@hppritcha Maybe this issue comes from #12853?

@hppritcha (Member)

Highly likely that's the case.

@hppritcha hppritcha pinned this issue Oct 13, 2024
@hppritcha hppritcha self-assigned this Oct 13, 2024
@rhc54 (Contributor) commented Oct 13, 2024

Unsure exactly what was included in that update, but I was immediately suspicious when that PR was filed because it did not include the corresponding update to the PMIx submodule. As previously noted, the two are tightly coupled - so you should never update one without updating the other.

Might not be the cause here, but it is the first thing I would try.

@dalcinl (Contributor, Author) commented Oct 14, 2024

@hppritcha Please let me know once you open a PR. I would like to suggest updating the oversubscription configuration in this repo so that it is in sync with what I do in mpi4py. I assume that OMPI wants all configuration in $HOME/.openmpi/mca-params.conf honored, including that of PRRTE. Am I right?

@hppritcha (Member)

Hmm, well actually this is not a bug. See https://docs.open-mpi.org/en/v5.0.x/mca.html#mca-parameter-changes-between-open-mpi-4-x-and-5-x . You'll need to add rmaps_default_mapping_policy = :oversubscribe to your $HOME/.prte/mca-params.conf, at least for v5.0.x and current main.

The ompi mpi4py GitHub action is kind of misleading here, so I'll fix it to not use that deprecated MCA param in the $HOME/.openmpi/mca-params.conf file.

@rhc54 (Contributor) commented Oct 15, 2024

@hppritcha I suppose this opens the question I noted on your PR. We could register synonyms for the OMPI v4 form of the PRRTE params. Do we want to do so? Be a little bit of grunge work, but nothing all that bad. May not be possible for all params as they may not have a corresponding OMPI v5 version - or the values may conflict so that PRRTE doesn't understand them. But we could pick up a number of them, and certainly the "oversubscribe" one should be doable.

I've added a check to ensure we are actually processing the OMPI default param files - pretty sure we are, but I'll check. Just need to finish something else first.

@dalcinl (Contributor, Author) commented Oct 16, 2024

So the OMPI documentation @hppritcha pointed to warns us to use THREE different locations to configure Open MPI operation. Are you guys sure you want to put that complexity on users?

I must add a clarification: the purpose of the line setting rmaps_base_oversubscribe = true is just to be able to test against branch v4.0.x. I know that setting is gone in v5.0.x and main, and that's the reason I also set rmaps_default_mapping_policy = :oversubscribe, but all of that in the single config file $HOME/.openmpi/mca-params.conf.

Until a few days ago, oversubscription seemed to work with all branches, and as of today it is still working in v5.0.x. From what @rhc54 said, I understand that v5.0.x and main simply ignore the old v4.x setting (is that correct?).
Therefore it looks to me like v5.0.x is indeed reading the oversubscription setting from $HOME/.openmpi/mca-params.conf (otherwise my tests would fail), while the main branch lost that ability. Is my diagnosis correct?

@dalcinl (Contributor, Author) commented Oct 16, 2024

Guys, I have a somewhat explicit question: are PRRTE and PMIx (when used via/with Open MPI) supposed to read configuration from OMPI's config file $HOME/.openmpi/mca-params.conf? If that is the case, then this issue looks like a regression. If that is not the case, then I do not understand why mpi4py CI is not failing with ompi@v5.0.x due to oversubscription not being turned on (as mpi4py IS NOT using $HOME/.prte/mca-params.conf).

@rhc54 (Contributor) commented Oct 16, 2024

I have said several times that I will investigate it - just not a high priority at the moment when compared with other things.

@hppritcha (Member)

I see, @dalcinl. I'll take a further look at what may have caused this change in behavior on our main branch.

@rhc54 (Contributor) commented Oct 16, 2024

So here is the problem. The change came about because of an OMPI reported issue regarding passing of MCA params. We got that fixed. However, what it exposed was that we were overlaying processing of OMPI, PRRTE, and PMIx params on top of each other, thereby causing confusion.

Long story short: the issue here is that mpirun doesn't know it is working with an OMPI app until after we have already processed the PRRTE and PMIx params, and after we have already initialized much of PRRTE (opened frameworks, etc). So it is far too late in the process to impact PRRTE settings.

This is why we have three param files - to ensure that the settings are applied before anything happens in their respective projects. I personally don't see a simple way around that one - I suppose you could add param file parsing in OMPI's mpirun code before it exec's prterun, though that might get a little clunky as you'd have to track the mapping of PRRTE and PMIx param names (e.g., to recognize that "rmaps_base_default_mapping_policy" requires conversion to "PRTE_MCA_rmaps_base_default_mapping_policy").
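As an illustrative sketch (not PRRTE's actual code), the translation described above might look like this: an OMPI front-end would have to know which file-style param names belong to PRRTE and map them to the environment-variable form prterun reads. The param set and helper name here are assumptions for illustration only.

```python
# Hypothetical helper: convert MCA params found in an OMPI param file
# into the envar forms the respective projects pick up. Which params
# belong to PRRTE must be known up front; this set is illustrative.
PRRTE_PARAMS = {"rmaps_base_default_mapping_policy", "rmaps_default_mapping_policy"}

def translate(params: dict) -> dict:
    env = {}
    for name, value in params.items():
        if name in PRRTE_PARAMS:
            env[f"PRTE_MCA_{name}"] = value   # PRRTE reads this prefixed form
        else:
            env[f"OMPI_MCA_{name}"] = value   # everything else stays on OMPI's side
    return env

env = translate({"rmaps_default_mapping_policy": ":oversubscribe"})
# env is {"PRTE_MCA_rmaps_default_mapping_policy": ":oversubscribe"}
```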

@bosilca (Member) commented Oct 16, 2024

From a user perspective this is mind-blowingly complicated and cumbersome. And I'm trying to be polite.

@rhc54 (Contributor) commented Oct 16, 2024

I actually agree - just waiting for someone to propose a realistic solution. Frankly, the whole MCA param thing has been a considerable source of confusion to users over the years - way too many params, far too detailed knowledge of the code base required to really use them. Adding knowledge of which ones to set where does increase that burden.

To recap the problem a bit, in case someone does want to come up with a proposal, imagine the case where someone puts a PRRTE-related value in the OMPI param file, and still has that same param in the PRRTE param file. What precedence should be applied to determining the final value? Which one "wins"?

We could collapse to a single param file, passing its name in the environment to override the project default (e.g., modify OMPI's "mpirun" to pass the name/path to the OMPI param file) - but we'd have to use that name to override looking for any of the other files. Not sure how confusing that will get. If I put something in a PMIx param file, then suddenly that setting disappears because I ran OMPI's "mpirun"?

So I'm open to thoughts - just unsure we really have a "right" answer here.

@hppritcha (Member)

open-mpi/prrte#17

@rhc54 (Contributor) commented Oct 16, 2024

Afraid that won't work @hppritcha - PRRTE uses the PMIx MCA base. You would have to convert PRRTE to use the OPAL MCA base, which means pulling PRRTE completely inside OMPI à la ORTE...which forces a near-complete rewrite of PRRTE (e.g., every framework would need a redo, you'd have to convert to the OPAL object system, etc.). At that point, you might as well go back to ORTE and start from there.

@hppritcha (Member)

git bisect and code inspection shows that the behavior was changed when we merged the commit from this PR openpmix/prrte#1999 into our fork of prrte.

Not sure whether to consider this a bug or a feature at this point. I guess at a minimum we should document the need to put PRRTE MCA params in $HOME/.prte/mca-params.conf when not using env variables or mpirun command line arguments.

@rhc54 (Contributor) commented Oct 16, 2024

Yeah, the problem lies in the required conversion - need to identify that a param actually belongs to PRRTE so you can correctly prefix it with PRTE_MCA_ so PRRTE will pick it up. In this case, you need to read the OMPI param file(s) and perform the translation on anything found there.

However, PRRTE doesn't know we are using the OMPI personality, and therefore doesn't know we should read the OMPI param file(s), until after we have populated the MCA system. I can investigate perhaps doing another parse at time of selecting the ompi/schizo component - might be in time for some things, but not early enough for some params to have already taken effect. Have to see how the MCA system would deal with duplicate parsing.

@hppritcha (Member)

Hmm... well, our prte could be made to know it's doing the ompi personality.

@rhc54 (Contributor) commented Oct 16, 2024

I think you'll find things are more complicated than you realize if/when you try! However, you may find it a valuable learning experience.

I have fixed this issue in the referenced PR. As noted in the commit comment, it isn't a perfect solution - however, we do identify the OMPI personality fairly early in the init process, and so most (maybe the vast majority?) of params of interest to users will be caught by it.

Per the commit message:

This still raises precedence questions. For now, what this will do is have values in the OMPI param files give way to corresponding values specified in PRRTE and PMIx param files. In other words, values in PMIx and PRRTE param files are written into the environment, not overwriting any previously existing envar. OMPI param files are then read, and any params that correspond to PRRTE and PMIx params are written into the environment - without overwriting any that already exist.

So if you have a param specified in a PRRTE param file, and you also specify it in the OMPI param file, the PRRTE param file value will "win".

User beware - someone is bound to be very unhappy.

This really needs to be documented as someone is bound to spend oodles of hours trying to understand why the value they specified somewhere isn't working. You should also note that we still have lost all info as to the source of a param's value - still no idea if/how one might recover that info.
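The precedence described in the commit message amounts to "first writer wins": PRRTE/PMIx param files are applied to the environment before the OMPI files, and nothing already set is overwritten. A minimal sketch of that behavior, with file parsing elided and the param values purely illustrative:

```python
def apply_param_file(env: dict, params: dict) -> None:
    """Write params into env without clobbering anything already set."""
    for key, value in params.items():
        env.setdefault(key, value)  # existing values always win

env = {}
# PRRTE/PMIx param files are processed first...
apply_param_file(env, {"PRTE_MCA_rmaps_default_mapping_policy": ":oversubscribe"})
# ...then the OMPI files; a conflicting value gives way:
apply_param_file(env, {"PRTE_MCA_rmaps_default_mapping_policy": ":nooversubscribe"})
# The PRRTE param file value survives:
# env["PRTE_MCA_rmaps_default_mapping_policy"] == ":oversubscribe"
```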

hppritcha added a commit to hppritcha/prrte that referenced this issue Oct 16, 2024
we ignore pmix mca params in the file now as that's
being treated as a separate piece of software that we
use as clients.

related to open-mpi/ompi#12857

Signed-off-by: Howard Pritchard <[email protected]>
@bosilca (Member) commented Oct 17, 2024

It seems to me that all these issues arise from the fact that we persist in separating the MCA params for the different layers. This also forces us to prepend every MCA param with a repeated text pattern (in @rhc54's PR that's PRTE_MCA_*, for an average of 12 bytes per envar), wasting space against the 32k environment limit.

I propose we do the opposite, MCA files are read once (by the different projects in the context of mpirun or batch plugin) and appended, or prepended, to a single environment variable. No wasted space for tagging the project, compact and clean, easy to print and all MCA-based frameworks have the same. The downside is a replication of unnecessary params in processes and projects that don't need them, but as long as there is no name collision (and we know that's not the case) the wasted space will be ridiculously small compared with our current memory footprint for all the MCA variables.
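A sketch of what this single-variable scheme could look like; the variable name XYZ_MCA and the separator are made up here for illustration, and real code would need an escaping scheme for values containing the separator:

```python
import os

def pack(params: dict) -> str:
    """Join all MCA settings into one compact string, no per-param project prefix."""
    return ";".join(f"{k}={v}" for k, v in params.items())

def unpack(blob: str) -> dict:
    return dict(item.split("=", 1) for item in blob.split(";") if item)

# One envar carries settings for every project, regardless of origin:
os.environ["XYZ_MCA"] = pack({
    "rmaps_default_mapping_policy": ":oversubscribe",  # PRRTE-side
    "btl_tcp_if_include": "eth0",                      # OMPI-side
})
params = unpack(os.environ["XYZ_MCA"])
# params["btl_tcp_if_include"] == "eth0"
```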

@rhc54 (Contributor) commented Oct 17, 2024

Didn't realize I had the power to close an issue over here with a commit in upstream PRRTE, so I'll reopen it.

I have no immediate objection to your proposal. My only initial concerns would be:

  1. with no prefix, how do we avoid collisions with envars from other projects or set by users? In other words, if we create an envar that looks like "btl_tcp_foo:oob_base_bar" - what prevents us from colliding with someone's random variable? Or maybe we just prefix the entire string with a single "XYZ_MCA:"? (Edited: scratch this one - we obviously just give the param the name "XYZ_MCA" and the value is the entire string. My bad.)
  2. still begs the problem of precedence as I parse this new param - when I find multiple occurrences of "rmaps_base_foo", which one do I use?
  3. in looking at the environment, I generally only see about 10-15 project-related envars. So even though we have a lot of potential entries, we are realistically only using about 150 bytes for envar prefixes.
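A quick back-of-envelope check of that last estimate (the prefix length is an approximation; the envar count comes from the comment above):

```python
# Approximate per-envar prefix cost: tags like "PRTE_MCA_"/"OMPI_MCA_"
# run on the order of 9-12 bytes each.
prefix_bytes = len("PRTE_MCA_")   # 9 bytes; ~12 with longer project tags
typical_envars = 12               # middle of the "10-15 project-related envars"
overhead = typical_envars * 12    # ~144 bytes, i.e. roughly the quoted ~150
# Tiny compared with the 32k environment limit mentioned earlier.
```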

@rhc54 rhc54 reopened this Oct 17, 2024
@janjust janjust unpinned this issue Oct 28, 2024
@janjust janjust pinned this issue Oct 28, 2024
@hppritcha hppritcha unpinned this issue Nov 4, 2024