-
Notifications
You must be signed in to change notification settings - Fork 84
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Feature]: Add read_nwb
to simplify reading nwbfiles
#1974
Comments
read_nwb
as a simple and memorable method to open an nwbfileread_nwb
to simplify reading nwbfiles
I like this idea! For me the main downside would be any potential confusion for users about having multiple methods to read a file and when to use @oruebel @rly @bendichter thoughts? |
I would make this part of io/utils.py to also make it clear on import that this is a convenience function. In addition, I think we want to be clear about the scoping of the function . Ie., folks will ask for the function to do everything, i.e, be able to open any NWB file and be able to configurabl all possible options. I think we want to limit scope the function to be: 1) read only and 2) have only the filename as a parameter and not support additional configurations. I.e., have it as a convenience function for opening a file for just read. As such
|
Here an example implementation of a
If I’m understanding the initial proposal, with this convenience method we would not return an IO object but just the NWBFile object? |
Sorry, you are correct,
|
I like this idea, but I would hope that it would also support passing in an h5py object instead of just a file path or an s3 path. That would allow streaming. For what I do, I am almost always streaming from a remote location. Passing in the h5py object provides a lot of flexibility on how this would be done (streaming method, authentication, etc). |
My view is that the intended audience of this function is users without a lot of experience. I would prefer to keep the signature and the input types very simple and defer more complex use cases like yours to their respective backend IOs. I am coming at this from the angle that if the format is successful reading will be way more common than writing, so I just want to have a top level method that is easier to remember, use and covers 95 % of the cases for reading files. |
To make this more concrete, this is what I am envisioning as the signature, input types and docstring: from pathlib import Path
from pynwb import NWBFile
def read_nwb(path: str | Path) -> NWBFile:
"""Read an NWB file from a local path or remote URL.
Provides a simple, high-level interface for reading NWB files in the most
common use cases. Automatically handles both HDF5 and Zarr formats.
For advanced use cases (parallel I/O, custom namespaces), use NWBHDF5IO or NWBZarrIO.
Parameters
----------
path : str or pathlib.Path
Path to the NWB file. Can be either a local filesystem path to an HDF5 (.nwb)
or Zarr (.zarr) file, or a remote URL (e.g., DANDI S3 asset URL).
Returns
-------
pynwb.NWBFile
The loaded NWB file object containing all datasets and metadata.
See Also
--------
pynwb.NWBHDF5IO : Core I/O class for HDF5 files with advanced options.
hdmf_zarr.nwb.NWBZarrIO : Core I/O class for Zarr files with advanced options.
Notes
-----
This function uses the following defaults:
* Always opens in read-only mode
* Automatically loads namespaces
* Detects file format based on extension
* Automatically handles local and remote paths
Advanced features requiring direct use of IO classes include:
* Custom namespace extensions
* Parallel I/O with MPI
* Custom build managers
* Write or append modes
* Pre-opened HDF5 file objects or Zarr stores
* Remote file access configuration
Examples
--------
Read a local NWB file:
>>> from pynwb import read_nwb
>>> nwbfile = read_nwb("path/to/file.nwb")
Read from a remote DANDI asset:
>>> nwbfile = read_nwb("s3://dandiarchive/.../file.nwb")
"""
pass # Implementation details would go here |
I see your point, but I feel like allowing the argument to be an h5py object as well wouldn't complicate things very much. This would really simplify all my neurosift scripts, so I hope this could be considered. Regarding s3 url, I think it's important to also support https://api.dandiarchive.org/... URLs Would be nice to allow any https://... so it's not limited to dandi. |
@magland could you add a brief example to illustrate how |
would become
So it saves one line... but it becomes a lot simpler to read. |
Thanks @magland for the clarification. I think allowing a file as input instead of path would be OK, since it doesn't add any parameters to the interface. We may want to restrict it to h5py.File objects that are in read-only mode (or at least warn if they are not), since otherwise you might get a write-able file instead of a read-only file. Just a thought, now that the
@h-mayorquin @magland what do you think? Would adding |
@oruebel That makes sense to me. I think if we had NWBHDF5IO.read_nwb(f) that supported h5py objects, then I wouldn't need the read_nwb utility, which could be reserved for the simplest case. |
This is great. Another advantage is that the reading code is in its "natural" place: the backend IO object. |
Sounds like we have a plan: PyNWB
HDMF_ZARR
HDMF / HDMF_ZARR / PyNWB
|
I created hdmf-dev/hdmf-zarr#225 for this and can take a stab at the changes needed in HDFM_ZARR. @h-mayorquin would you be interested in taking a stab at the necessary changes in PyNWB? |
Yes, I can take a look at it! |
@h-mayorquin the PR for adding |
Here is what I think is the analogous PR on pynwb (first part): |
@h-mayorquin one item to add to the ToDo list is that we should also update the tutorial on reading NWB files https://pynwb.readthedocs.io/en/latest/tutorials/general/plot_read_basics.html#sphx-glr-tutorials-general-plot-read-basics-py to describe |
Sounds good, I can take care of that in the PR for |
@h-mayorquin hdmf-dev/hdmf-zarr#226 for adding |
OK, now that the PyNWB part has been merged #1979 I think I can work on the second part:
And updating the tutorial. |
I added I would wait for one of our meetings to discuss support for remote_paths in |
What would you like to see added to PyNWB?
This is a comment that I have gotten from various users: reading and nwbfile is not as easy as it could be.
At the moment, when I google "how to read and nwbfile", the results show a suggestion to use NWBHDF5IO. This is also what we have on the tutorial.
While the IO approach is flexible, powerful, and follows good practices (i.e., it can be used as a context manager) it also suffers from what I think are several problems that detract from usability:
What solution would you like?
I think that we could provide users with a simple entry point that covers most cases and leave the IO approach as the power tool behind the scenes:
I think the name is easier to remember and intuitive about what it does.
For simplicity, this function should:
Downsides?
Do you have any interest in helping implement the feature?
Yes.
Code of Conduct
The text was updated successfully, but these errors were encountered: