finder_id problem for MPI AHF runs #87

mtremmel · 2018-12-05T16:11:30Z

the finder_id is meant to map to the original order of the halo catalog, not the unique ID number assigned by the halo finder necessarily. For MPI runs (e.g. with AHF but also similar for others) the halo ID is a very large number not associated with its final position within the halo catalog (i.e. there is no 1, but rather very large identifying numbers). In pynbody, the halo catalog is read in as a list of halos and halos are extracted based on their position within that list (h.load_copy(5) will load in the 5th halo, for example). A while ago, the definition of finder_id was changed such that it was associated directly with the ID number given in the catalog. This works fine for normal AHF where the order = ID number. However, this will create problems for catalogs created with MPI, where this is not the case.

The text was updated successfully, but these errors were encountered:

mtremmel · 2018-12-05T16:16:53Z

the problem I think comes from this commit
7a4d78d

mtremmel · 2018-12-05T16:25:14Z

I think the best way forward with this is to change the output from the halo finder specific input handler such that finder_id maps to the catalog position rather than the ID number. This allows for, I think, easier fixes/changes to be made to other catalogs

apontzen · 2018-12-05T16:31:26Z

The trouble is then we have three layers of IDs - the halo_number, the finder_id, and this new “sometimes the finder_id and sometimes something else” thing... is there a cleaner solution?

mtremmel · 2018-12-05T16:34:02Z

sorry I wasn't clear... my proposal is to keep just those two, but have the AHF input_handler output the "correct" finder_id rather than the one directly taken from the stat file. Because this is how the AHF catalog works in pynbody. finder_id must map to the halo such that h.load_copy(finder_id) returns the desired halo. right now this would not be the case for MPI AHF.

mtremmel · 2018-12-05T16:35:57Z

the problem that I fore see is that the "correct" finder_id might depend on the handler you use (pynbody vs yt, etc)

apontzen · 2018-12-05T16:35:58Z

I guess I am not following what is the ‘correct’ finder_id if not the one actually listed in the AHF stat file? It seems more like a bug in AHF than in tangos...

mtremmel · 2018-12-05T16:41:55Z

Note that this dependence I think already exists given the "id_offset" defined for AHF catalogs... that is clearly (I think?) specific to pynbody (i.e. ID 0 -> 1 and so on). presumably there could exist an input handler for particle data that used the original IDs rather than ID+1.

mtremmel · 2018-12-05T16:51:02Z

For halo finders run with MPI the ID of the halo is disconnected from the position within the file, because the latter is not known a priori. Each separate process writes out their halo information to the file separately so the final ordering really is just which process writes out the information first. In pynbody catalogs at least, for AHF, the order within the catalog is what matters. h.load_copy(N) grabs the Nth entry in the list of halos, not halo with ID number N+1, though this is what happens in the "normal" (non-MPI) AHF output.

mtremmel · 2018-12-05T16:52:52Z

I suppose the question is whether we should change the AHF reader to handle the raw ID numbers or change tangos to just always use the raw order within the catalog.

apontzen · 2018-12-05T16:54:17Z

Argh! It’s a horrible thing.

Basically my preferred solution would be to fix AHF so it post-processes and renumbers its halos to something a bit more sensible. But let’s assume that’s not going to happen in which case yes I agree the finder_id should just be defined to be the thing you pass to your handler... in which case for pynbody handlers it would be whatever pynbody expects.

What a mess!

Are you able to make a PR that implements this change?

mtremmel · 2018-12-05T16:57:04Z

give me some time but yes, I can come up with a fix. My goal will be to constrain the fix to the AHF handler.... I suppose this is the challenge of trying to be useful to everyone!

mtremmel · 2018-12-05T16:58:26Z

The way to think about it I suppose is that we are defining the AHF catalog ID as the order within the AHF_halos file. Other people will need to implement their input handlers to compensate for that I suppose. I don't think that's a very big ask IMO (especially given that this is how it normally is!)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

finder_id problem for MPI AHF runs #87

finder_id problem for MPI AHF runs #87

mtremmel commented Dec 5, 2018

mtremmel commented Dec 5, 2018

mtremmel commented Dec 5, 2018

apontzen commented Dec 5, 2018

mtremmel commented Dec 5, 2018

mtremmel commented Dec 5, 2018

apontzen commented Dec 5, 2018

mtremmel commented Dec 5, 2018

mtremmel commented Dec 5, 2018

mtremmel commented Dec 5, 2018

apontzen commented Dec 5, 2018

mtremmel commented Dec 5, 2018

mtremmel commented Dec 5, 2018 •

edited

Loading

finder_id problem for MPI AHF runs #87

finder_id problem for MPI AHF runs #87

Comments

mtremmel commented Dec 5, 2018

mtremmel commented Dec 5, 2018

mtremmel commented Dec 5, 2018

apontzen commented Dec 5, 2018

mtremmel commented Dec 5, 2018

mtremmel commented Dec 5, 2018

apontzen commented Dec 5, 2018

mtremmel commented Dec 5, 2018

mtremmel commented Dec 5, 2018

mtremmel commented Dec 5, 2018

apontzen commented Dec 5, 2018

mtremmel commented Dec 5, 2018

mtremmel commented Dec 5, 2018 • edited Loading

mtremmel commented Dec 5, 2018 •

edited

Loading