-
Notifications
You must be signed in to change notification settings - Fork 13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support for AHF/Ramses importing #178
Conversation
tangos/input_handlers/pynbody.py
Outdated
map_child_parent = self._get_map_child_subhalos(ts_extension) | ||
|
||
for halo in h: | ||
# Tangos expects data to have a finder offset, and a finder id following the stat file logic |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AHF 'offsets' and 'ids' can be different -- this is actually why this is here. I think it only becomes a problem when AHF is run with MPI. @mtremmel almost certainly has a comment on this...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. Is this the difference between ID
and halo_id
in AHF properties? These two quantities are present in the catalog and I'm not entirely sure what they map onto
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Possibly. I again ping @mtremmel to see what insight he has, as he first discovered this distinction in AHF and spent ages worrying about how to deal with it :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi sorry for the late response on this. Yes these two may be different. finder_offset encodes the position of the halo within the catalog, which is what pynbody uses directly. For example, load_copy(10) looks for the 10th halo in the catalog and loads it. It used to be always the case that the Nth halo ID in AHF files was always N - 1 (i.e. the 10th halo would be ID number 9). However, many halo finders, AHF included, have at least the option to produce completely random unique ID numbers for their halos that are completely disconnected from their position within the catalog. So, now we separate these out. Finder_ID reads the ID number directly from the catalog while finder_offset encodes the order within the catalog. Pynbody uses the latter, but we wanted to remain agnostic that other codes may use either identification number.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I interpret correctly what you are doing here, you are explicitly assuming that both finder_offset and finder_id are the same and equal to the ID number in the catalog. This is probably true if you didn't run with MPI as @apontzen mentioned, but it doesn't have to be true (this is actually a parameter you can set in AHF when you run it with or without MPI).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok @cphyc @apontzen @mtremmel, I added a more detailed comment of the situation, which I am still not fully cleared up on (turns out the code didn't need to change as far as I understand).
I don't think this is particularly ideal, but the PropertyImporter
needs fixing in more general way, and it has been modified in a way that is likely to break other handlers. I think this is beyond the scope of this PR, and that the best way is to open an issue from this discussion and fix in a different PR. Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not actually sure what the problem is at the moment so I can't really comment. I just noted the inconsistency. But in particular I'm not sure what is wrong in PropertyImporter
currently? Are you saying it's currently broken in the main branch in some way?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not broken per se, mostly inconsistent in an undocumented way? The expected behaviour of the enumeration has been changed by PR #88, from receiving one number to receiving two numbers before the actual properties to be committed to the database (commit ea7c175).
This might be ok for workflows built upon stat files, but for handlers enumerating properties through pynbody, I am unclear what the second property is meant to be.
Furthermore, I think earlier pynbody handlers were not updated to reflect the change in behaviour, by inserting a dummy second number before yield
. For example, the GadgetSubfindHandler
look like it would be missing one property following the enumeration here:
tangos/tangos/input_handlers/pynbody.py
Lines 392 to 424 in b828357
def iterate_object_properties_for_timestep(self, ts_extension, object_typetag, property_names): | |
h = self._construct_halo_cat(ts_extension, object_typetag) | |
if object_typetag=='halo': | |
pynbody_prefix = 'sub_' | |
elif object_typetag=='group': | |
pynbody_prefix = "" | |
else: | |
raise ValueError("Unknown object typetag %r"%object_typetag) | |
if 'child' in property_names and object_typetag=='group': | |
child_map = self._get_group_children(ts_extension) | |
for i in range(len(h)): | |
all_data = [i] | |
for k in property_names: | |
pynbody_properties = h.get_halo_properties(i,with_unit=False) | |
if pynbody_prefix+k in pynbody_properties: | |
data = self._resolve_units(pynbody_properties[pynbody_prefix+k]) | |
if k == 'parent' and data is not None: | |
# turn into a link | |
data = proxy_object.IncompleteProxyObjectFromFinderId(data, 'group') | |
elif k=='child' and object_typetag=='group': | |
# subfind does not actually store a list of children; we infer it from the parent | |
# data in the halo catalogue | |
data = child_map.get(i,None) | |
if data is not None: | |
data = [proxy_object.IncompleteProxyObjectFromFinderId(data_i, 'halo') for data_i in data] | |
else: | |
data = None | |
all_data.append(data) | |
yield all_data |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah the current code was basing this on AHF output, not on pynbody's post processed halo catalog outputs which is where it seems like the problem is arising?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok thanks. I have updated the comment to be explicit about the behaviour, and the documentation of the base class to reflect the new expectations.
I will open a separate issue to flag that (I think) old pynbody handlers might have been broken by the change and need a one-line update.
After this, I think we are good to go, but let me know what you think.
Hi @cphyc @apontzen, a tangential issue to this PR, but the tests are systematically failing in the
I am not entirely sure whether this is because the branch is based against an older version of |
This seems to be a breaking change in PyMySQL or a related package. It is fixed in the main branch and so I would recommend pulling main into here. |
…d failure on attempt to bridging AMR cells
… existing live calculations
… for main halo leading to no information being stored about parenthood in the database
fab552d
to
dd6b71e
Compare
0277954
to
226dba7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Some minor comments.
tangos/input_handlers/finding.py
Outdated
all_possible_handlers = cls.__subclasses__() | ||
|
||
# Add all subclasses and sub-subclasses | ||
all_possible_handlers = [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might cause confusion if people make a sub-sub-sub-class (not impossible).
I think it would be better to move this out into a utility function called get_all_subclasses
or something, and make it recursive so that it really captures all subclasses.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree it looks like the logic would be better isolated somewhere else, and made recursive. @cphyc could you attempt this? I haven't touched this part of the PR at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At the moment, all (sub-)*classes are being added, even though the code isn't recursive (it uses a stack of classes and subclasses until the stack is exhausted, adding to the stack at each iteration the current class' subclasses). I've updated the comment accordingly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes. Would it still be a good idea to put it in a separate function though?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@apontzen done!
54f4150
to
5a9863f
Compare
This should follow pynbody/pynbody#661 to add support for AHF/RAMSES in tangos.
I have tested that
add
works,import-properties
works as well.Note that it doesn't seem to be importing properly parent/child relations (as read from the
substructure
file).