-
Notifications
You must be signed in to change notification settings - Fork 13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
finder_id problem for MPI AHF runs #87
Comments
the problem I think comes from this commit |
I think the best way forward with this is to change the output from the halo finder specific input handler such that finder_id maps to the catalog position rather than the ID number. This allows for, I think, easier fixes/changes to be made to other catalogs |
The trouble is then we have three layers of IDs - the halo_number, the finder_id, and this new “sometimes the finder_id and sometimes something else” thing... is there a cleaner solution? |
sorry I wasn't clear... my proposal is to keep just those two, but have the AHF input_handler output the "correct" finder_id rather than the one directly taken from the stat file. Because this is how the AHF catalog works in pynbody. finder_id must map to the halo such that |
the problem that I fore see is that the "correct" finder_id might depend on the handler you use (pynbody vs yt, etc) |
I guess I am not following what is the ‘correct’ finder_id if not the one actually listed in the AHF stat file? It seems more like a bug in AHF than in tangos... |
Note that this dependence I think already exists given the "id_offset" defined for AHF catalogs... that is clearly (I think?) specific to pynbody (i.e. ID 0 -> 1 and so on). presumably there could exist an input handler for particle data that used the original IDs rather than ID+1. |
For halo finders run with MPI the ID of the halo is disconnected from the position within the file, because the latter is not known a priori. Each separate process writes out their halo information to the file separately so the final ordering really is just which process writes out the information first. In pynbody catalogs at least, for AHF, the order within the catalog is what matters. |
I suppose the question is whether we should change the AHF reader to handle the raw ID numbers or change tangos to just always use the raw order within the catalog. |
Argh! It’s a horrible thing. Basically my preferred solution would be to fix AHF so it post-processes and renumbers its halos to something a bit more sensible. But let’s assume that’s not going to happen in which case yes I agree the What a mess! Are you able to make a PR that implements this change? |
give me some time but yes, I can come up with a fix. My goal will be to constrain the fix to the AHF handler.... I suppose this is the challenge of trying to be useful to everyone! |
The way to think about it I suppose is that we are defining the AHF catalog ID as the order within the AHF_halos file. Other people will need to implement their input handlers to compensate for that I suppose. I don't think that's a very big ask IMO (especially given that this is how it normally is!) |
the finder_id is meant to map to the original order of the halo catalog, not the unique ID number assigned by the halo finder necessarily. For MPI runs (e.g. with AHF but also similar for others) the halo ID is a very large number not associated with its final position within the halo catalog (i.e. there is no 1, but rather very large identifying numbers). In pynbody, the halo catalog is read in as a list of halos and halos are extracted based on their position within that list (
h.load_copy(5)
will load in the 5th halo, for example). A while ago, the definition of finder_id was changed such that it was associated directly with the ID number given in the catalog. This works fine for normal AHF where the order = ID number. However, this will create problems for catalogs created with MPI, where this is not the case.The text was updated successfully, but these errors were encountered: