Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AHF halo ID fix #88

Merged
merged 23 commits into from
Mar 20, 2021
Merged

AHF halo ID fix #88

merged 23 commits into from
Mar 20, 2021

Conversation

mtremmel
Copy link
Contributor

@mtremmel mtremmel commented Dec 6, 2018

change the AHF input handler to generate halo IDs based on the position within the file, making all versions of AHF consistent with one another. This is needed because AHF with MPI generates random ID numbers for all halos, which is currently incompatible with, e.g., pynbody's AHF catalog reader. This was done by changing the iter_rows_raw method. Further, deleted unnecessary (and unreachable) line in the filename method for AHFStatFile.

… the ID rather than the raw file ID to be consistent in meaning between MPI and non-MPI AHF versions. Also, remove spurious extra line in filename method.
@apontzen
Copy link
Member

apontzen commented Dec 6, 2018

This is problematic because it copy/pastes from HaloStatFile, so duplicating functionality

Would it be possible to reimplement within the HaloStatFile base class, with the AHF child class just setting a flag to activate the new behaviour?

@mtremmel
Copy link
Contributor Author

mtremmel commented Dec 7, 2018

makes sense. I'll do that

…thin the file rather than the value in the ID column of the file. Now each handler defines a flad when this needs to occur
@apontzen
Copy link
Member

apontzen commented Dec 7, 2018

Beautiful! Could we just add a test?

@apontzen
Copy link
Member

Ping on adding a test so that we can merge this in

@apontzen
Copy link
Member

apontzen commented Apr 9, 2019

Looking good... I have a few questions

(1) Is there any way to disable this behaviour?
(2) Does the test really check for anything substantive? Given that the halo numbering is already sequential in the test file, it seems to me like it is not really checking your changes.
(3) Will this break e.g. @TobiBu's work flow and if so what do we do about it (see 1)
(4) On a minor point, please make the test a separate function rather than bundle it in with the IDL stat file test.

…m ones from MPI runs. Add a separate function to test this conversion specifically.
@mtremmel
Copy link
Contributor Author

mtremmel commented Apr 9, 2019

Ok I have fixed the test to make it a bit more separate and a bit more specific to the issues surrounding MPI outputs from AHF.

As far as turning it off... this is where I'm uncertain what the best course of action is. Currently (and really, from the beginning) it was assumed that each halo catalog could have their own format in terms of finder_id. For AHF the finder_id was originally always offset by 1 from the "raw" group ID (rather than starting at 0 it started at 1) and now it also specifies that the finder_id should be equal to the position within the catalog (still offset by 1). This makes the most sense when combined with how pynbody reads AHF (and it is consistent with the default AHF grp IDs when it isn't run with MPI). However, if we want to make TANGOS as adaptable as possible to any analysis tool, we could consider making all of these defaults able to be overridden by the user. I would think the best way to do that would be either

  1. a runtime argument when the simulation is first added that specifies how finder_id and halo_number translate to the raw group IDs in the halo catalog.

or

  1. something in the config file that can override the relevant class attributes (_id_offset and _id_from_file_pos)

thoughts? I'm not 100% sure how to do this at the moment but it seems doable. I think I lean toward option 2. The argument against this (I think) would be that translating the default tangos ordering to whatever tools the user wants as their output handlers should occur at the level of the output handlers themselves. If your tools read in AHF halos by their raw ID number, then your handler needs to translate the TANGOS finder_id to what is needed. I'm really not sure what the right answer is here.

@mtremmel
Copy link
Contributor Author

mtremmel commented Apr 9, 2019

as far as whether this will affect current work on merger trees, I looked at the test files and they seem to be approximating AHF files run without MPI. If this is the case, then the finder_id and halo_numbers generated by tangos should be the same as they were before. That is because the non-mpi AHF already makes grp numbers equal to their ordering by n_particles (and TANGOS already assumes that AHF ID numbers are shifted by 1 relative to their raw values)

@TobiBu
Copy link

TobiBu commented Apr 9, 2019

Looking good... I have a few questions

(1) Is there any way to disable this behaviour?
(2) Does the test really check for anything substantive? Given that the halo numbering is already sequential in the test file, it seems to me like it is not really checking your changes.
(3) Will this break e.g. @TobiBu's work flow and if so what do we do about it (see 1)

I think this should work for me. During my implementation @apontzen pointed me already to this pull request and I integrated the new AHF IDs in my testing branch. However, I can pull this particular branch and check if something breaks.

@TobiBu
Copy link

TobiBu commented Apr 9, 2019

Although, I am not sure how an AHF mtree file would look like if the halo catalogue was run with mpi.
My workflow relies on the finder_id. How is that impacted by the changes?

cheers
Tobias

@mtremmel
Copy link
Contributor Author

mtremmel commented Apr 9, 2019

So the changes made here would mean that if you were to initialize a tangos database with output from MPI, the finder_id would correspond to the position of the halo within the catalog (1 = first, 2 = second, etc) while halo_number would be the order of the halo in terms of particle number in the simulation (1 = most particles, 2 = second most, etc). The difference with MPI is that neither the halo catalog nor the ID numbers are ordered by particle number. The grp numbers in the catalog itself are large randomly generated unique numbers. TANGOS essentially ignores these numbers right now since they are meaningless. A normal halo catalog run with pynbody can be examined using either number (finder_id by default and halo_number if you run s.halos(dosort=True))

@apontzen
Copy link
Member

apontzen commented Apr 12, 2019

So to be clear, there are basically three numbers

  • The halo_number, which is a tangos construction which we can ignore in this discussion;
  • The finder_id, which was supposed to be the identity of the halo in the raw halo finder output;
  • The offset of the halo within finder catalogues (which we could notionally call offset_id). This is stored as finder_id in @mtremmel's patch

My guess is that the AHF mtrees will use a true finder_id rather than an offset_id, and therefore this patch could break importing mtrees. @TobiBu can you confirm, and do either of you have a good way to proceed from here? Very frustrating...

Do I understand correctly by the way that pynbody expects an offset_id rather than a finder_id?

What a mess... I really don't want to start storing an offset_id and a finder_id, but am beginning to wonder whether it might be necessary.

@mtremmel
Copy link
Contributor Author

I think all we really need is that the Stat file input handler class information is saved at the database level when the information is initiated. So, as a user, I should be able to easily back out what the finder_id offset is. Similarly, whether halo_number was a re-ordering or not (though the former is more important at the moment). If this is true, then the user can easily convert finder_id and/or halo_number to whatever they need, right?

@TobiBu
Copy link

TobiBu commented Apr 15, 2019

I think that is the best option. In case of the AHF input handler it would be totally sufficient to either save the original AHF_id or specify how finder_idwould translate into the original AHF_id.

@mtremmel
Copy link
Contributor Author

out of curiosity tough, does your code use the actual AHF_id from the file or just the order within the file?

@TobiBu
Copy link

TobiBu commented Apr 15, 2019

It needs the actual AHF_id, since the merger trees of AHF are stored such that between consecutive outputs the halos of given AHF_id are connected. But now that I type, it might actually be that its simply the order within the file. I would need to check the source code of AHF_mtree. Or write a mail to Alexander Knebe.

@mtremmel
Copy link
Contributor Author

ah I see. So, as it is right now, for non-MPI runs, the AHF_id is directly related to the order within the file... it might be worth checking that AHF_id is still the important thing when it is run with MPI, but it makes sense that it would be I think.

Maybe the best solution to all of this is, in fact, to add one more property: the original halo_number (call it, say, orig_grp_number. This is easy enough to implement I believe. @apontzen what do you think? This way TANGOS will have all of the important "original" information (a value directly related to placement within file and the original group numbers assigned) as well as the "modified value" of halo_number which is related to the order by particle number by default. I think this would cover all the bases. Everyone should be happy then... so long as the offset value is accessible, the user will know the position within the halo catalog, the original grp number, and the modified grp number ordered by particle count. From these, any code that needs to interface with the database should be able to do so with ease, I think.

@TobiBu
Copy link

TobiBu commented Apr 15, 2019

Sounds good to me.

BTW, I checked the source code of the AHF merger tree code and it says somewhere when writing the _mtree_files. It says: // this is the case where we use the haloid as found in *_particles
Whatever that means for this discussion... I am happy as long as I can reconstruct the original AHF_id as written in the _halos file.

@mtremmel
Copy link
Contributor Author

Hey @apontzen just pinging this. Did we ever come to a consensus on this?

@apontzen
Copy link
Member

apontzen commented Jul 9, 2020

I think that changing the meaning of finder_id is dangerous, and having re-read the whole thread am still a bit unclear/confused what the main issues at play are. To confirm, is there actually a bug now, where MPI-AHF catalogues return the wrong halos because of confusion between AHF's notion of an ID versus just indexing the halos in sequence within its output file?

@mtremmel
Copy link
Contributor Author

Here is what I remember about this issue after looking back on my code changes and the current master branch. The problem comes up with AHF + pynbody specifically. Pynbody's AHF catalog object assumes that when you give it a halo number to extract, that number corresponds to the placement within the file (1 = first halo in the catalog, 2 = second, etc). This is the "finder_id" in tangos, which is the "raw" value taken directly from the stat file's first column (assumed to be the halo ID column) plus some offset value which is defined specifically for each halo catalog class. The assumption is that the ID in AHF is exactly the list index of the halo in the catalog (the first is ID = 0, second is 1, etc). The finder_id is then made to be this value + 1 (1, 2, 3, 4, etc). However, the new version of AHF has nonsense numbers for the IDs in no way connected with their position within the file.

The changes I've made here, ensures that a particular halo stat file class can determine if it wants the finder_id (the number passed to the input handler to extract halos) to represent the raw ID number from the stat file or the position of the halo in the catalog. As it is now, for AHF at least the default is for finder_id to represent the location of the halo in the catalog in order to smoothly interface with pynbody. However, the way I've tried to set it up has it such that each stat file class can determine individually how it wants finder_id to work with the _id_from_file_pos class attribute. If true, finder_id = 0, 1, 2, 3. If false, finder_id = raw data from stat file.

In contrast, the halo_number is always meant to represent the ranking of that halo in terms of particle number.

@apontzen
Copy link
Member

Yes, I think I get it. I am wondering whether a simpler and more transparent fix would be for pynbody to use the AHF-provided IDs instead of assuming that they correspond to particular positions in a file? This prevents the mind-boggling situation of having three separate IDs stored by tangos, with the user trying to figure out what on earth they all mean.

I haven't seen this happen - is it an AHF version issue or something to do with running AHF with MPI?

Thoughts welcome...

@mtremmel
Copy link
Contributor Author

I believe more recently that AHF has randomly generated IDs regardless of whether mpi is used but I could be wrong. I think either way this is difficult. My feeling is that if we want TANGOS to be easily adaptable to any analysis tool, it should be able to handle whatever that tool needs. There is nothing more fundamental about using halo ID numbers (which are themselves meaningless) versus the order in the catalog, in my opinion (at least, not for AHF). Therefore, it makes sense that individual analysis tools may select one or the other and TANGOS must be ok with that.

@mtremmel
Copy link
Contributor Author

mtremmel commented Jul 10, 2020

In this regard, I think the best option is to try to have everything, so that people can create their own input handlers for their specific software. So, that means having halo_number (ranking of the halo by npart), finder_id (raw ID taken directly from the catalog) and position_id (position of the halo in the catalog, i.e. 1, 2, 3, ...). With these three, one can then decide to translate to their analysis tool of choice. I think the "offset" values in the stat file class are more problematic. Rather, I'd prefer all stat file readers to generate the same ID values for halos and then require individual input_handlers to translate that as they need

@mtremmel
Copy link
Contributor Author

Ok let me put out there my idea for this @apontzen and if you think it is worthwhile I can try to code it up.

TANGOS should aim to be as modular as possible to allow different analysis scripts and halo finder combos to be used with the system. If a stat file reader is used, it should always provide the same info regardless of the handler class (i.e. AHF + pynbody should give you the same info as Subfind+Gadget). In order to work with all halo catalogs while also providing as much meaningful info to the user as possible, I think this info should be (in addition to NDM, etc)

  1. some raw output ID read from the stat file ("finder_id")

  2. the indexed position within the list of halos provided by the catalog (0, 1, 2, ...; the "catalog_index")

  3. the rank order of the halo (1,2,3 for most Npart, second most, etc; the "halo_number")

If the handler function load_object() then passes all three of these, each handler can individually determine how it uses them to load in a halo. I think then implementing a handler function that can take in these three numbers and tarnslate that into a halo catalog identifier usable by that specific handler+catalog combo would be easiest. This can be callef from load_object and it would be an easy function to override as a part of a new handler class.

#example of new function for AHF+pynbody handler
def _halo_identifier(halo_number, finder_id, catalog_index):
    return catalog_index+1

@mtremmel
Copy link
Contributor Author

yup! But if you are using AHF it is catalog_index+1

@apontzen
Copy link
Member

apontzen commented Jul 16, 2020

OK. What a giant mess (not entirely of our own making). I guess you sold me on 2 then. But hold off just a while because I am looking at addressing #117. Should fix that one first, otherwise it might become a nightmare to merge.

@mtremmel
Copy link
Contributor Author

wait did I sell you on 2 or 3?? I was trying for 2! :-)

@apontzen
Copy link
Member

2, yes! Gah

…the catalog of each halo. Include this in enumerate and iter_rows functions. Ensure that n_total is used to determine whether a halo satisfies min_halo_particles to ensure that ordering remains robust.
…catalog_id_offset

to these values. To remain consistent, children calculations will use only
the raw halo ID number with no offset applied
… ID numbers.

Fix the ID grp stat file reader to have catalog indices equal to the grp number
so that these values can be used to access halo catalog correctly via pynbody.
…nclude both finder_id and catalog_index as inputs
…utput_testing input handlers,

and Halo class functions
Fix tracker and phantom halo initialization to include a catalog_index input
Fix bh halo linking to now use catalog_index
update ahf trees test to use catalog_index
… statfile

-Update ahf_trees to use the raw finder_id values from tree data files
-Perform correct halo object initialization in manager.py
-If unset, make halo_number equal to the catalog index rather than finder_id
-Update crosslink to utilize catalog_index rather than finder_id
-Update property importer to expect correct number if outputs from stat file reader
-Update object cache to use catalog_index rather than finder_id
make sure that the adapted enumerate returns the expected default result structure
…r_id

with an additional argument in resolve

Include additional argument when dealing with timestep cache for Proxy Objects
…l particle

count (Gas + Stars + DM) rather than just dark matter
@mtremmel
Copy link
Contributor Author

mtremmel commented Mar 1, 2021

Ok so I finally got around to making these sweeping changes. The tests all seem to have passed ok! I've outlined in words the important bits of what has changed.

The basic overall idea is that halos have have three different identifiers:

finder_id = the raw ID given to the halo by the halo finder

catalog_index = the position of the halo within the catalog (this may have separate definitions based on the input handler being used. For pynbody AHF for example, this starts at 1 rather than 0)

halo_number = by default the rank of the halo in terms of the number of particles (1 = most number of particles, etc)

in general, the catalog_index is what is used to interact with the halo catalog readers in pynbody and yt. As an option, at the time of adding the simulation, the user can turn off the renumber parameter. In this case, the halo_number becomes equal to the catalog_index. For AmigaIDL halos, catalog_index = finder_id. In this case, it would be best to run with add_simulation renumber=False so that the halo_number is just made to equal catalog_index.

If one is not loading from a stat file but just with pynbody, then finder_id = catalog_index. I figured this was the best option that did not rely too much on what kind of catalog pynbody was reading. However, I am happy to fix this to use the actual properties stored in the halo catalog object (e.g. "#ID" for the finder_id in the case of AHF) but my concern is that this will just require a bunch of if statements for different catalogs that we probably want to avoid? I'm open to suggestions here.

This has also required changes to the nature of timestep caches. TimestepObjectCache now creates two different maps for catalog and finder numbers. By default, resolving this cache uses the catalog_index but there are instances where the finder_id is required (e.g. when loading in subhalo information from the stat file). An extra variable can be given in the resolve() function if one wants to specify "catalog" vs "finder".

One somewhat unrelated thing I've changed is that the min_halo_particles variable is used to compare the TOTAL number of particles rather than just the dark matter. However, I realize this may be more controversial than I think so I wanted to gauge what is best here. It just seems more natural from a user perspective that such a variable takes into account all halo particles. Further, I believe that AHF, etc uses the total number of particles when determining rank. In any case, I think I'd prefer a separate variable min_dm_particles or something for that.

Anyway, @apontzen, let me know what you think!

Copy link
Member

@apontzen apontzen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Amazing to have got through all this! I have some comments though, sorry... would be great just to tidy things up a tiny bit before we merge. Also, you should probably merge master into this branch before I merge it back into master (to resolve the conflicts). -- sorry you already did that

@@ -120,12 +122,23 @@ def load(self, mode=None):
handler this can be None or 'partial' (in a normal session) and, when running inside an MPI session,
'server' or 'server-partial'. See https://pynbody.github.io/tangos/mpi.html.
"""
if self.finder_id is None:
halo_number = self.halo_number
if not hasattr(self, "finder_id"):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why has this line changed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This like is basically still there, but my thought was to first check to see if the object even has finder_id and catalog_index columns to begin with for backwards compatibility. Maybe this is all overkill?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've now simplified this. The previous checks were deprecated anyway I think

self.timestep = timestep
self.halo_number = int(halo_number)
self.finder_id = int(finder_id)
self.catalog_index = int(catalog_index)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we add comments to these definitions to explain what the three different things are?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, is catalog_index the best name for it? Would finder_offset or something like that be more descriptive?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup I can put in comments. The rationale for catalog_index is that this is the value passed to the input handler's halo catalog reader to extract a halo. In other words, this is the index pointing to this halo within that catalog object. Note that it is not at all a constant offset necessarily from the finder_id. In fact, there is no reason why it is related to the finder_id at all. Rather, h[catalog_index] will return this halo from halo catalog h

Copy link
Member

@apontzen apontzen Mar 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand that yes... but somehow finder_offset sounds right to me, it's the offset in the finder output of this halo from the start? Whereas catalog_index draws attention to finder vs catalog - is that the right thing to contrast? As another alternative, could finder_id become catalog_id? (bad idea, breaks backward compatibility)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

haha I guess we might just have a difference in opinion here? to me your description of "offset" just sounds like an "index" (0 = start, 1 = start +1, etc). I'm ok going with whatever you think is the most descriptive. it shouldn't be very hard to change. Maybe I've been thinking about it too long and find finder_offset confusing.

if self.catalog_index is None:
catalog_index = finder_id
else:
catalog_index = self.catalog_index
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logic is a bit baffling, is it possible to add a comment explaining what is going on?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is really just an attempt at backwards compatibility with a database that does not have these columns already. Maybe this is all too much and it would be better simply provide a tool for the user to update their database accordingly?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think simple is better here. I believe sqlalchemy takes care of adding a column of NULLs (will appear as Nones) when opening an old database. Certainly, I don't think the attribute will ever be missing

@@ -11,7 +11,8 @@

class HaloStatFile(object):
"""Manages and reads a halo stat file of unspecified format."""
_id_offset = 0
_catalog_index_offset = 0
# _id_from_file_pos = False
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this commented out?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should probably just be deleted. The old version had a flag whether the finder_id was derived from the position within the catalog file, but now finder_id and catalog_index are both saved separately

@@ -56,22 +57,29 @@ def all_columns(self):
def iter_rows_raw(self, *args):
"""
Yield the halo ID and requested columns from each line of the stat file, without any emulation.

If _id_from_file_pos is True, the ID is not taken from the stat file, but rather from the order in which
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clarify that _id_from_file_pos is a class attribute, not something passed to the function?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need to update this description because this is inaccurate


def _calculate_children_if_required(self):
if not hasattr(self, "_children_map"):
self._calculate_children()

def _child_halo_entry(self, this_id_raw):
self._calculate_children_if_required()
this_id = this_id_raw + self._id_offset
children = self._children_map.get(this_id, [])
#this_id = this_id_raw + self._id_offset
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove code if it's not needed

@@ -208,7 +214,7 @@ def filename(cls, timestep_filename):
return "CannotComputeRockstarFilename"

class AmigaIDLStatFile(HaloStatFile):
_id_offset = 0
# _id_offset = 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove

:param args: strings for the column names
:return: id, arg1, arg2, arg3 where ID is the halo ID and argN is the value of the Nth named column
"""
with open(self.filename) as f:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this copy-paste code appear here? Shouldn't it be inherited from the base class?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code expects from the stat file enumerator output of the form catalog_index, finder_id, arg1, arg2, arg3, ... but for this particular stat file catalog_index (the number that the halo catalog associates with a given halo) is always equal to the finder_id. That's why there needs to be a different function here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the docstring incorrect in that case? (I think this is also true for the base class method above)

As for avoiding copy-paste, would something like the following work? (NB I didn't actually try it at all! Just speculating)

def iter_rows_raw(self, *args):
   for row in super(self).iter_rows_raw(*args):
       row[0] = row[1] # sequential catalog index not right in this case; overwrite to match finder id
       yield row

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh yeah I think that's a good solution. I can code that up and see. I'll double check the docstrings

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've implemented this and tested (no "self" in the super call, but otherwise exactly as you have it) and it seems to work. I've pushed this over just now!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also double checked and the iter_rows_raw docstrings were accurate but iter_rows one was not. That has been updated.

@@ -56,7 +56,7 @@ def _generate_timestep_pairs(self):
raise NotImplementedError("No implementation found for generating the timestep pairs")

def get_halo_entry(self, ts, halo_number):
h = ts.halos.filter_by(finder_id=halo_number).first()
h = ts.halos.filter_by(halo_number=halo_number).first()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This certainly looks more sensible, but is there any chance it is breaking something? Looks like there was previously a bug, which perhaps was cancelled out by a bug elsewhere?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe, but when I checked (and I just checked again) this function is never called anywhere as far as I can tell

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest deleting the function in that case

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok I've just committed a version with this function deleted. it passes all the nosetests

self._initialise_cache()

def resolve(self, finder_id, typetag):
self._ensure_cache()
def resolve(self, identifier, typetag, type='catalog'):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

type shouldn't be used as a variable name (it is a reserved keyword).

Anyway... wonder whether it would be neater here to have two separate methods, maybe retrieve_from_finder_id and retrieve_from_catalog_index or something like that? The method name resolve is cryptic anyway in this context (my fault, I know) and now with this weird type parameter it is really hard to figure out what is going on.

to provide backwards compatibility but are now deprecated

remove unneeded commented-out code

re-write docstrings for iter_rows_raw functions to actually reflect the
nature of the outputs.
… that

identifies halos by either finder_id or catalog_index separately. Include this change
in both the proxy object classes and property importer where appropriate.
make the AmigaIDLStatFile iter_rows_raw function more streamlined
@mtremmel
Copy link
Contributor Author

hey just a ping @apontzen to have a look and let me know what you think about the variable names (finder_id, catalog_index and whether there are better, more intuitive options). As for the larger issue of backwards compatibility, I think this still runs the risk of breaking in older versions... we might need some script that can update old databases to fit this format. I think that's possible using sqlalchemy?

and change all other variables and function names to be consistent with
this change where appropriate
@apontzen apontzen merged commit 38e73a2 into pynbody:master Mar 20, 2021
@apontzen
Copy link
Member

Thank you Michael! Heroic multi-year effort!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants