[Feature]: Add an object path as a way to uniquely identify an object in the API #1108

Open
h-mayorquin opened this issue Apr 30, 2024 · 8 comments
Labels: category: enhancement (improvements of code or code behavior); priority: low (alternative solution already working and/or relevant to only specific user(s)); topic: PyNWB (Issues related to the use HDMF in PyNWB)

Comments

@h-mayorquin
Contributor

h-mayorquin commented Apr 30, 2024

It would be great to have a way to specify an object within the nwbfile that is both unique and independent of the backend. Paths are one abstraction that could serve this purpose, so I am imagining an API that could look like this:

electrical_series = nwbfile.get_object_by_path("acquisition/ElectricalSeries")
electrical_series.get_api_path() == "acquisition/ElectricalSeries"
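
To make the proposed round trip concrete, here is a minimal sketch on a toy tree. The `Container` class is hypothetical and is not the HDMF or PyNWB implementation; only the two method names are taken from the proposal above.

```python
from __future__ import annotations


class Container:
    """Toy stand-in for an NWB container (hypothetical, not the HDMF class)."""

    def __init__(self, name: str, parent: Container | None = None):
        self.name = name
        self.parent = parent
        self.children: dict[str, Container] = {}
        if parent is not None:
            parent.children[name] = self

    def get_api_path(self) -> str:
        """Build the path by walking up the parent chain; the root is omitted."""
        parts = []
        node = self
        while node.parent is not None:
            parts.append(node.name)
            node = node.parent
        return "/".join(reversed(parts))

    def get_object_by_path(self, path: str) -> Container:
        """Resolve a slash-separated path relative to this node."""
        node = self
        for part in path.strip("/").split("/"):
            node = node.children[part]  # KeyError if the path does not exist
        return node


nwbfile = Container("root")
acquisition = Container("acquisition", parent=nwbfile)
electrical_series = Container("ElectricalSeries", parent=acquisition)

found = nwbfile.get_object_by_path("acquisition/ElectricalSeries")
print(found is electrical_series)
print(found.get_api_path())
```

Computing the path walks the whole parent chain on every call, which is one reason a method may signal the cost better than an attribute.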

Use cases

Unlike the object_id, which uniquely identifies an object within a single NWBFile, the path identifies an object by its location, and that location stays the same across files from different sessions. This can be used for:

  • Building configurations (e.g. chunking, compression, etc) that will apply to the same object in conversions even across different sessions.
  • Quickly accessing specific objects for visualization or analysis in files with a well-known structure.
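
The first use case can be sketched as follows. Each "session" here is a hypothetical dict from path to a freshly generated object_id, and the configuration keys and values (`compression`, `chunks`) are illustrative assumptions rather than a real neuroconv or HDMF configuration format.

```python
import uuid


def make_session() -> dict:
    """Hypothetical session: every path gets a new, session-specific object_id."""
    paths = ["acquisition/ElectricalSeries/data", "acquisition/TimeSeries/data"]
    return {path: str(uuid.uuid4()) for path in paths}


# Written once and keyed by path, so it does not depend on any object_id.
dataset_config = {
    "acquisition/ElectricalSeries/data": {"compression": "gzip", "chunks": (1000, 64)},
}


def resolve_settings(session: dict, config: dict) -> dict:
    """Map each configured object's object_id to the settings stored under its path."""
    return {
        object_id: config[path]
        for path, object_id in session.items()
        if path in config
    }


session_a = make_session()
session_b = make_session()

settings_a = resolve_settings(session_a, dataset_config)
settings_b = resolve_settings(session_b, dataset_config)

# The object_ids differ between sessions, but the same settings are found.
print(set(settings_a) != set(settings_b))
print(list(settings_a.values()) == list(settings_b.values()))
```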

Previous or Similar Art

This function was implemented in neuroconv:

https://github.com/catalystneuro/neuroconv/blob/47a066ca8c58b88064bfecee90cfcfc70409d135/src/neuroconv/tools/nwb_helpers/_configuration_models/_base_dataset_io.py#L28-L44

And it produces output like this:

acquisition/TestDynamicTable/TestColumn/data
acquisition/NewTimeSeries/data
acquisition/TestElectricalSeries/data
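
For illustration only, paths of this shape can be produced by a depth-first walk over a nested structure. The sketch below uses plain dicts as a stand-in for groups and does not reproduce the linked neuroconv function; `iter_dataset_paths` and `toy_file` are hypothetical names.

```python
def iter_dataset_paths(node: dict, prefix: str = ""):
    """Yield the path of every leaf (any non-dict value) in a nested dict of groups."""
    for name, child in node.items():
        path = f"{prefix}/{name}" if prefix else name
        if isinstance(child, dict):
            # A dict plays the role of a group: descend into it.
            yield from iter_dataset_paths(child, path)
        else:
            # Anything else plays the role of a dataset.
            yield path


toy_file = {
    "acquisition": {
        "TestDynamicTable": {"TestColumn": {"data": [1, 2, 3]}},
        "NewTimeSeries": {"data": [0.1, 0.2]},
        "TestElectricalSeries": {"data": [[0, 1], [2, 3]]},
    }
}

paths = list(iter_dataset_paths(toy_file))
for path in paths:
    print(path)
```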

Then the function was ported to pynwb:

https://github.com/NeurodataWithoutBorders/pynwb/blob/2259bede338f2f202229bda0af15d7e3cea47369/src/pynwb/base.py#L290-L324

Complexities

The fact that hdf5 and zarr might use different paths than the pynwb API can be confusing. An example that @rly pointed out is the electrical series.

Other considerations

  • Is there a better abstraction than a path for building unique identifiers?
  • I think it should be a method and not an attribute because it might be costly to compute. I think functions indicate that better.
  • Streaming considerations: can we reduce the portion of the file visited when we are accessing the object by path or calculating paths?
  • How does it play with the idea of tagging instead of having a structure? It seems that flat files with tags could make this redundant.
  • Is hdmf the place for this to live or is it better to have it in pynwb?

I probably missed some subtleties from today's discussion, so I am tagging people here so they can correct my mistakes: @rly @bendichter @CodyCBakerPhD

@bendichter
Contributor

@h-mayorquin I think this was a point of a bit of confusion during the meeting. I believe the way @rly was using the terms is:

nwbfile.acquisition["ElectricalSeries"].data is the "api path"

/acquisition/ElectricalSeries/data would be something else. Maybe a "hierarchy path"?

@h-mayorquin
Contributor Author

I want to differentiate three things:

  1. The set of code that you use to access something in the API: nwbfile.electrodes
  2. The path the object has in the backend, since zarr and hdf5 are file-like (in both, /general/extracellular_ephys/electrodes).
  3. A unique string that looks like a path and characterizes the object. The natural candidate is /general/extracellular_ephys/electrodes.
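
A small sketch of the distinction, using toy stand-ins rather than PyNWB objects: the same object is reached once through attribute access (item 1) and once through a path string resolved against a nested hierarchy (items 2 and 3). The `resolve` helper and the `hierarchy` dict are hypothetical.

```python
from functools import reduce
from types import SimpleNamespace

electrodes_table = object()  # placeholder for the electrodes table

# Item 1: API access, here an attribute on a toy file object.
nwbfile = SimpleNamespace(electrodes=electrodes_table)

# Items 2 and 3: nested groups addressed by a path string.
hierarchy = {"general": {"extracellular_ephys": {"electrodes": electrodes_table}}}


def resolve(path: str, root: dict):
    """Follow each path component down the nested dict."""
    return reduce(lambda group, name: group[name], path.strip("/").split("/"), root)


via_api = nwbfile.electrodes
via_path = resolve("/general/extracellular_ephys/electrodes", hierarchy)
print(via_api is via_path)
```

The API expression has a single component while the path has three, which is why the two cannot be converted into each other by simple string manipulation.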

I don't know of good terminology to differentiate between them. I think we can use 2 for 3. Do zarr and hdf5 currently share the same "file-like" structure? If so, I feel that would be the simplest thing to do.

@oruebel
Contributor

oruebel commented Apr 30, 2024

@h-mayorquin can you clarify what the difference between 3 and 2 is, and why it is needed?

@h-mayorquin
Contributor Author

@oruebel I expect that we don't need a distinction between 2 and 3, but we might have a backend that does not have a file-like structure the way hdf5 and zarr do. The paths within the zarr and hdf5 backends might also differ for some objects. I want to emphasize that it should be a backend-independent concept, hence the distinction.

Does that make sense?

@bendichter
Contributor

@h-mayorquin we may have a backend that does not internally use the "/" syntax for Group membership in its Python API, but any backend must support the HDMF primitives, which means it must have the concept of a Group and so would be mappable to this syntax. Unless there is a good reason not to, I would like to propose we use the HDF5/Zarr path as the unique identifier.

@h-mayorquin
Contributor Author

@bendichter

> Unless there is a good reason not to, I would like to propose we use the HDF5/Zarr path as the unique identifier.

Totally agree with that.

@oruebel
Contributor

oruebel commented Apr 30, 2024

> I would like to propose we use the HDF5/Zarr path as the unique identifier.

The real reference here is the mapping to the schema, i.e., the path the object will have in the Builder structure. For HDF5/Zarr, the path in the file and the path in the Builder hierarchy are identical. All I'm trying to say is that, even for non-hierarchical backend stores, we can determine that path from the schema.
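
As a rough illustration of that last point, the sketch below recovers a hierarchy path from a flat store in which each record only knows its name and its parent. The record layout, the ids, and `hierarchy_path` are hypothetical and do not use HDMF's Builder classes.

```python
# Toy flat (non-hierarchical) store: record id -> name and parent id.
records = {
    "id-root": {"name": "root", "parent": None},
    "id-general": {"name": "general", "parent": "id-root"},
    "id-ephys": {"name": "extracellular_ephys", "parent": "id-general"},
    "id-electrodes": {"name": "electrodes", "parent": "id-ephys"},
}


def hierarchy_path(record_id: str, store: dict) -> str:
    """Rebuild the slash-separated path by following parent references to the root."""
    parts = []
    current = store[record_id]
    while current["parent"] is not None:
        parts.append(current["name"])
        current = store[current["parent"]]
    return "/" + "/".join(reversed(parts))


print(hierarchy_path("id-electrodes", records))
```

The store itself has no notion of nesting, yet the path is fully determined by the parent relationships.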

@h-mayorquin
Copy link
Contributor Author

Ah, got you, thanks for the explanation! That's great to hear.

@rly rly added category: enhancement improvements of code or code behavior priority: low alternative solution already working and/or relevant to only specific user(s) topic: PyNWB Issues related to the use HDMF in PyNWB labels May 2, 2024
@rly rly added this to the Future milestone May 2, 2024