Support for AHF/Ramses importing #178

Merged: 15 commits, Apr 23, 2022

@cphyc (Contributor) commented Mar 24, 2022

This should follow pynbody/pynbody#661 to add support for AHF/RAMSES in tangos.

I have tested that

  • add works,
  • import-properties works as well.
    Note that it does not seem to import parent/child relations (as read from the substructure file) properly.

map_child_parent = self._get_map_child_subhalos(ts_extension)

for halo in h:
    # Tangos expects data to have a finder offset, and a finder id following the stat file logic
Member

AHF 'offsets' and 'ids' can be different -- this is actually why this is here. I think it only becomes a problem when AHF is run with MPI. @mtremmel almost certainly has a comment on this...

I see. Is this the difference between ID and halo_id in AHF properties? Both quantities are present in the catalog, and I'm not entirely sure what they map onto.

Member

Possibly. I again ping @mtremmel to see what insight he has, as he first discovered this distinction in AHF and spent ages worrying about how to deal with it :-)

Contributor

Hi, sorry for the late response on this. Yes, these two may be different. finder_offset encodes the position of the halo within the catalog, which is what pynbody uses directly. For example, load_copy(10) looks for the 10th halo in the catalog and loads it. It used to be the case that the Nth halo in an AHF file always had ID N - 1 (i.e. the 10th halo would be ID number 9). However, many halo finders, AHF included, have at least the option to produce completely random unique ID numbers for their halos that are disconnected from their position within the catalog. So we now separate these out: finder_id reads the ID number directly from the catalog, while finder_offset encodes the order within the catalog. Pynbody uses the latter, but we wanted to remain agnostic, since other codes may use either identification number.
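
To make the distinction concrete, here is a minimal sketch in plain Python. It does not call pynbody or tangos; the catalogue contents and the helper name enumerate_halos are invented for illustration, and the offset is taken to be 0-based, which is an assumption rather than something this comment specifies.

```python
# Hypothetical example: IDs as they might appear in a halo catalogue, in file order.
# The values are invented; they only illustrate IDs unrelated to position.
catalogue_ids = [1000234, 17, 998, 42]


def enumerate_halos(ids_in_file_order):
    """Yield (finder_offset, finder_id) pairs for a catalogue.

    finder_offset is the position in the catalogue (0-based here, by assumption);
    finder_id is whatever ID number the halo finder wrote for that entry.
    """
    for finder_offset, finder_id in enumerate(ids_in_file_order):
        yield finder_offset, finder_id


pairs = list(enumerate_halos(catalogue_ids))

# Reverse lookup: which catalogue position holds a given finder ID?
offset_of_id = {finder_id: offset for offset, finder_id in pairs}

print(pairs[2])          # position 2 holds the halo whose ID is 998
print(offset_of_id[42])  # the halo with ID 42 sits at position 3
```

Code that loads halos by position needs the first number of each pair, while code that follows links written by the halo finder (for instance parent or child IDs) would need the second.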

Contributor

If I interpret correctly what you are doing here, you are explicitly assuming that both finder_offset and finder_id are the same and equal to the ID number in the catalog. This is probably true if you didn't run with MPI as @apontzen mentioned, but it doesn't have to be true (this is actually a parameter you can set in AHF when you run it with or without MPI).

Ok @cphyc @apontzen @mtremmel, I added a more detailed comment on the situation, which I am still not fully clear on (it turns out the code didn't need to change, as far as I understand).

I don't think this is particularly ideal, but the PropertyImporter needs fixing in a more general way, and it has been modified in a way that is likely to break other handlers. I think this is beyond the scope of this PR, and that the best way forward is to open an issue from this discussion and fix it in a different PR. Thoughts?

Member

I'm not actually sure what the problem is at the moment so I can't really comment. I just noted the inconsistency. But in particular I'm not sure what is wrong in PropertyImporter currently? Are you saying it's currently broken in the main branch in some way?

It's not broken per se, mostly inconsistent in an undocumented way? The expected behaviour of the enumeration has been changed by PR #88, from receiving one number to receiving two numbers before the actual properties to be committed to the database (commit ea7c175).

This might be ok for workflows built upon stat files, but for handlers enumerating properties through pynbody, I am unclear what the second property is meant to be.

Furthermore, I think earlier pynbody handlers were not updated to reflect the change in behaviour (that is, they do not insert a dummy second number before the yield). For example, the GadgetSubfindHandler looks like it would be missing one property following the enumeration here:

def iterate_object_properties_for_timestep(self, ts_extension, object_typetag, property_names):
    h = self._construct_halo_cat(ts_extension, object_typetag)
    if object_typetag=='halo':
        pynbody_prefix = 'sub_'
    elif object_typetag=='group':
        pynbody_prefix = ""
    else:
        raise ValueError("Unknown object typetag %r"%object_typetag)
    if 'child' in property_names and object_typetag=='group':
        child_map = self._get_group_children(ts_extension)
    for i in range(len(h)):
        all_data = [i]
        for k in property_names:
            pynbody_properties = h.get_halo_properties(i,with_unit=False)
            if pynbody_prefix+k in pynbody_properties:
                data = self._resolve_units(pynbody_properties[pynbody_prefix+k])
                if k == 'parent' and data is not None:
                    # turn into a link
                    data = proxy_object.IncompleteProxyObjectFromFinderId(data, 'group')
            elif k=='child' and object_typetag=='group':
                # subfind does not actually store a list of children; we infer it from the parent
                # data in the halo catalogue
                data = child_map.get(i,None)
                if data is not None:
                    data = [proxy_object.IncompleteProxyObjectFromFinderId(data_i, 'halo') for data_i in data]
            else:
                data = None
            all_data.append(data)
        yield all_data
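
As a rough illustration of the kind of one-line update being discussed, the sketch below adapts rows of the older shape (one leading number, then the properties) to the newer shape (two leading numbers, then the properties). The function name with_finder_id is hypothetical and is not part of tangos. Reusing the offset as the ID is an assumption that only holds when catalogue IDs coincide with catalogue positions, which, as noted earlier in this thread, is not guaranteed.

```python
def with_finder_id(legacy_rows):
    """Adapt rows shaped [finder_offset, *properties] to
    [finder_offset, finder_id, *properties].

    ASSUMPTION: finder_id is set equal to finder_offset. This is a placeholder
    choice for handlers that have no separate ID to report.
    """
    for row in legacy_rows:
        finder_offset, *properties = row
        yield [finder_offset, finder_offset, *properties]


# Two rows in the older, one-number shape (values are invented).
legacy = [[0, 1.5e12, None], [1, 3.2e11, "some value"]]
adapted = list(with_finder_id(legacy))

for row in adapted:
    print(row)
```

Inside a handler, the equivalent change would be to start each row with two numbers instead of one before appending the properties.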

Contributor

Yeah, the current code was basing this on AHF output, not on pynbody's post-processed halo catalog outputs, which is where the problem seems to be arising?

@Martin-Rey commented Apr 19, 2022

Ok thanks. I have updated the comment to be explicit about the behaviour, and the documentation of the base class to reflect the new expectations.

I will open a separate issue to flag that (I think) old pynbody handlers might have been broken by the change and need a one-line update.

After this, I think we are good to go, but let me know what you think.

@Martin-Rey

Hi @cphyc @apontzen, a tangential issue to this PR: the tests are systematically failing in the MySQL build for all Python versions with:

.EEEEEEEEEEEEEE.EEE
======================================================================
ERROR: test suite for <module 'test_ahf_trees' from '/home/runner/work/tangos/tangos/tests/test_ahf_trees.py'>
----------------------------------------------------------------------
[... log lines 77 to 1558 not shown ...]
Traceback (most recent call last):
  File "/opt/hostedtoolcache/Python/3.6.15/x64/lib/python3.6/site-packages/nose/suite.py", line 210, in run
    self.setUp()
  File "/opt/hostedtoolcache/Python/3.6.15/x64/lib/python3.6/site-packages/nose/suite.py", line 293, in setUp
    self.setupContext(ancestor)
  File "/opt/hostedtoolcache/Python/3.6.15/x64/lib/python3.6/site-packages/nose/suite.py", line 316, in setupContext
    try_run(context, names)
  File "/opt/hostedtoolcache/Python/3.6.15/x64/lib/python3.6/site-packages/nose/util.py", line 471, in try_run
    return func()
  File "/home/runner/work/tangos/tangos/tests/test_merger_tree.py", line 16, in setup
    testing.init_blank_db_for_testing()
  File "/home/runner/work/tangos/tangos/tangos/testing/__init__.py", line 176, in init_blank_db_for_testing
    with engine.connect() as conn:
  File "/opt/hostedtoolcache/Python/3.6.15/x64/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 3234, in connect
    return self._connection_cls(self, close_with_result=close_with_result)
  File "/opt/hostedtoolcache/Python/3.6.15/x64/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 96, in __init__
    else engine.raw_connection()
  File "/opt/hostedtoolcache/Python/3.6.15/x64/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 3313, in raw_connection
    return self._wrap_pool_connect(self.pool.connect, _connection)
  File "/opt/hostedtoolcache/Python/3.6.15/x64/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 3280, in _wrap_pool_connect
    return fn()
  File "/opt/hostedtoolcache/Python/3.6.15/x64/lib/python3.6/site-packages/sqlalchemy/pool/base.py", line 310, in connect
    return _ConnectionFairy._checkout(self)
  File "/opt/hostedtoolcache/Python/3.6.15/x64/lib/python3.6/site-packages/sqlalchemy/pool/base.py", line 868, in _checkout
    fairy = _ConnectionRecord.checkout(pool)
  File "/opt/hostedtoolcache/Python/3.6.15/x64/lib/python3.6/site-packages/sqlalchemy/pool/base.py", line 476, in checkout
    rec = pool._do_get()
  File "/opt/hostedtoolcache/Python/3.6.15/x64/lib/python3.6/site-packages/sqlalchemy/pool/impl.py", line 146, in _do_get
    self._dec_overflow()
  File "/opt/hostedtoolcache/Python/3.6.15/x64/lib/python3.6/site-packages/sqlalchemy/util/langhelpers.py", line 72, in __exit__
    with_traceback=exc_tb,
  File "/opt/hostedtoolcache/Python/3.6.15/x64/lib/python3.6/site-packages/sqlalchemy/util/compat.py", line 207, in raise_
    raise exception
  File "/opt/hostedtoolcache/Python/3.6.15/x64/lib/python3.6/site-packages/sqlalchemy/pool/impl.py", line 143, in _do_get
    return self._create_connection()
  File "/opt/hostedtoolcache/Python/3.6.15/x64/lib/python3.6/site-packages/sqlalchemy/pool/base.py", line 256, in _create_connection
    return _ConnectionRecord(self)
  File "/opt/hostedtoolcache/Python/3.6.15/x64/lib/python3.6/site-packages/sqlalchemy/pool/base.py", line 371, in __init__
    self.__connect()
  File "/opt/hostedtoolcache/Python/3.6.15/x64/lib/python3.6/site-packages/sqlalchemy/pool/base.py", line 666, in __connect
    pool.logger.debug("Error on connect(): %s", e)
  File "/opt/hostedtoolcache/Python/3.6.15/x64/lib/python3.6/site-packages/sqlalchemy/util/langhelpers.py", line 72, in __exit__
    with_traceback=exc_tb,
  File "/opt/hostedtoolcache/Python/3.6.15/x64/lib/python3.6/site-packages/sqlalchemy/util/compat.py", line 207, in raise_
    raise exception
  File "/opt/hostedtoolcache/Python/3.6.15/x64/lib/python3.6/site-packages/sqlalchemy/pool/base.py", line 661, in __connect
    self.dbapi_connection = connection = pool._invoke_creator(self)
  File "/opt/hostedtoolcache/Python/3.6.15/x64/lib/python3.6/site-packages/sqlalchemy/engine/create.py", line 590, in connect
    return dialect.connect(*cargs, **cparams)
  File "/opt/hostedtoolcache/Python/3.6.15/x64/lib/python3.6/site-packages/sqlalchemy/engine/default.py", line 597, in connect
    return self.dbapi.connect(*cargs, **cparams)
  File "/opt/hostedtoolcache/Python/3.6.15/x64/lib/python3.6/site-packages/pymysql/connections.py", line 353, in __init__
    self.connect()
  File "/opt/hostedtoolcache/Python/3.6.15/x64/lib/python3.6/site-packages/pymysql/connections.py", line 633, in connect
    self._request_authentication()
  File "/opt/hostedtoolcache/Python/3.6.15/x64/lib/python3.6/site-packages/pymysql/connections.py", line 932, in _request_authentication
    auth_packet = _auth.caching_sha2_password_auth(self, auth_packet)
  File "/opt/hostedtoolcache/Python/3.6.15/x64/lib/python3.6/site-packages/pymysql/_auth.py", line 265, in caching_sha2_password_auth
    data = sha2_rsa_encrypt(conn.password, conn.salt, conn.server_public_key)
  File "/opt/hostedtoolcache/Python/3.6.15/x64/lib/python3.6/site-packages/pymysql/_auth.py", line 144, in sha2_rsa_encrypt
    "'cryptography' package is required for sha256_password or caching_sha2_password auth methods"
RuntimeError: 'cryptography' package is required for sha256_password or caching_sha2_password auth methods

I am not entirely sure whether this is because the branch is based against an older version of main and needs rebasing, whether it should be fixed here, or through a separate branch (I would expect the current set of tests on main to fail with the same compatibility error?)

@apontzen
Member

This seems to be a breaking change in PyMySQL or a related package. It is fixed in the main branch and so I would recommend pulling main into here.

@apontzen (Member) left a comment

Thanks! Some minor comments.

all_possible_handlers = cls.__subclasses__()

# Add all subclasses and sub-subclasses
all_possible_handlers = []
Member

This might cause confusion if people make a sub-sub-sub-class (not impossible).

I think it would be better to move this out into a utility function called get_all_subclasses or something, and make it recursive so that it really captures all subclasses.

I agree it looks like the logic would be better isolated somewhere else, and made recursive. @cphyc could you attempt this? I haven't touched this part of the PR at all.

Contributor Author

At the moment, all (sub-)*classes are being added, even though the code isn't recursive (it uses a stack of classes and subclasses until the stack is exhausted, adding to the stack at each iteration the current class' subclasses). I've updated the comment accordingly.
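
The stack-based traversal described here can be sketched as follows. The function name get_all_subclasses is borrowed from the suggestion earlier in this thread; the body is an illustration under that name, not the code merged in this PR, and the Base, A, B and C classes are placeholders.

```python
def get_all_subclasses(cls):
    """Return every direct and indirect subclass of cls, without recursion.

    A stack holds classes still to be visited; each visited class pushes its
    own direct subclasses, so the loop ends once the whole tree is exhausted.
    """
    found = []
    seen = set()
    stack = list(cls.__subclasses__())
    while stack:
        current = stack.pop()
        if current in seen:
            # Reached twice through multiple inheritance; report it once.
            continue
        seen.add(current)
        found.append(current)
        stack.extend(current.__subclasses__())
    return found


# Placeholder hierarchy, three levels deep.
class Base:
    pass


class A(Base):
    pass


class B(A):
    pass


class C(B):
    pass


print(sorted(c.__name__ for c in get_all_subclasses(Base)))  # ['A', 'B', 'C']
```

The order of the returned list depends on the traversal, so callers that care about priority between handlers would need to sort or filter it themselves.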

Member

Ah yes. Would it still be a good idea to put it in a separate function though?

Contributor Author

@apontzen done!

Four further review threads on tangos/input_handlers/pynbody.py (outdated, resolved)
@apontzen apontzen merged commit 7458b5b into pynbody:master Apr 23, 2022
@cphyc cphyc deleted the support-AHF-ramses branch April 23, 2022 13:59
4 participants