[Feature]: Add `read_nwb` to simplify reading nwbfiles #1974

h-mayorquin · 2024-10-23T15:48:31Z

What would you like to see added to PyNWB?

This is a comment that I have gotten from various users: reading an nwbfile is not as easy as it could be.

At the moment, when I google "how to read an nwbfile", the results suggest using NWBHDF5IO. This is also what we have in the tutorial.

While the IO approach is flexible, powerful, and follows good practices (i.e., it can be used as a context manager) it also suffers from what I think are several problems that detract from usability:

It has too many options: path, file, region, mode, manager, etc.
It is hard to remember the name to import.
It does not read zarr.
It requires two method calls, first instantiating the IO and then using read on the IO.

What solution would you like?

I think that we could provide users with a simple entry point that covers most cases and leave the IO approach as the power tool behind the scenes:

from pynwb import read_nwb

nwbfile = read_nwb(file_path)

I think the name is easier to remember and intuitive about what it does.

For simplicity, this function should:

Work only with paths
Always open in read mode
Open both zarr and hdf5.
load_namespaces=True
If we are ambitious we can have a default streaming choice if a link to dandi s3 asset is passed.
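
As a rough illustration of the "open both zarr and hdf5" requirement, here is a minimal sketch that tells local HDF5 files apart from Zarr directory stores using only the standard library. The helper name `guess_backend` is hypothetical and not part of PyNWB. It relies on the HDF5 signature bytes at the start of a file and on the `.zgroup` (Zarr v2) or `zarr.json` (Zarr v3) marker at the root of a directory store. HDF5 files with a user block and remote paths are deliberately out of scope.

```python
from pathlib import Path

# First 8 bytes of an HDF5 file that has no user block.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def guess_backend(path):
    """Return "hdf5", "zarr", or None for a local path (hypothetical helper)."""
    path = Path(path)
    if path.is_dir():
        # A Zarr directory store keeps its group metadata at the root.
        if (path / ".zgroup").exists() or (path / "zarr.json").exists():
            return "zarr"
        return None
    if path.is_file():
        with path.open("rb") as f:
            if f.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE:
                return "hdf5"
    return None
```

Checking file content rather than the file name means the result does not depend on which extension the file was saved with.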

Downsides?

Do you have any interest in helping implement the feature?

Yes.


stephprince · 2024-10-24T17:27:58Z

I like this idea! For me the main downside would be any potential confusion for users about having multiple methods to read a file and when to use read_nwb vs. NWBHDF5IO, but I think this could be clarified in the docs.

@oruebel @rly @bendichter thoughts?

oruebel · 2024-10-24T17:49:10Z

potential confusion for users about having multiple methods to read a file and when to use read_nwb vs. NWBHDF5IO

I would make this part of io/utils.py to also make it clear on import that this is a convenience function. In addition, I think we want to be clear about the scope of the function, i.e., folks will ask for the function to do everything: open any NWB file and configure all possible options. I think we want to limit the scope of the function to: 1) read only and 2) have only the filename as a parameter, with no support for additional configuration. I.e., have it as a convenience function for opening a file just for reading. As such, NWBHDF5IO would still be used in most of our tutorials, since many tutorials show how to write data, and read_nwb would be used mainly in read tutorials. Also, just scoping for read will be challenging enough, since the function will need to work with:

both HDF5 and Zarr and return ~~either NWBHDF5IO or NWBZarrIO~~ an NWBFile object, which uses either NWBHDF5IO or NWBZarrIO for read (i.e., NWBFile.get_read_io() may be either NWBHDF5IO or NWBZarrIO)
S3 paths to open files on DANDI
know when to load namespaces (or default to always load namespaces?)
...

stephprince · 2024-10-24T18:58:27Z

Here is an example implementation of a read_nwb method that Cody made for NWBInspector that might be useful as a reference. And if we do incorporate streaming, here is a related proposal on a default function for remote read: #1739

Also, just scoping for read will be challenging enough since the function will need to work with:
both HDF5 and Zarr and return either NWBHDF5IO or NWBZarrIO

If I’m understanding the initial proposal, with this convenience method we would not return an IO object but just the NWBFile object?

oruebel · 2024-10-24T19:20:55Z

If I’m understanding the initial proposal, with this convenience method we would not return an IO object but just the NWBFile object?

Sorry, you are correct, read_nwb should always return the NWBFile (not the IO object). I amended my comment above accordingly. However, the I/O object is still accessible from the NWBFile object via NWBFile.get_read_io(), i.e.:

nwbfile = read_nwb('myfile.nwb')   # read HDF5
nwbfile.get_read_io()   # this returns NWBHDF5IO

nwbfile = read_nwb('myfile.zarr.nwb')  # read Zarr.
nwbfile.get_read_io()   # this returns NWBZarrIO
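
Because `read_nwb` returns only the `NWBFile`, the caller no longer holds the IO object that keeps the file open. Below is a minimal sketch of a cleanup helper, assuming the returned object exposes `get_read_io()` as described above and that the IO object has a `close()` method. The name `closing_nwbfile` is hypothetical and not part of PyNWB.

```python
from contextlib import contextmanager


@contextmanager
def closing_nwbfile(nwbfile):
    """Yield nwbfile, then close the IO object it was read with (hypothetical helper)."""
    try:
        yield nwbfile
    finally:
        io = nwbfile.get_read_io()
        # Objects that were not read from a file may have no IO attached.
        if io is not None:
            io.close()
```

With such a helper, `with closing_nwbfile(read_nwb('myfile.nwb')) as nwbfile: ...` would give the convenience function the same context-manager behavior as the IO classes.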

magland · 2024-11-01T11:33:46Z

I like this idea, but I would hope that it would also support passing in an h5py object instead of just a file path or an s3 path. That would allow streaming. For what I do, I am almost always streaming from a remote location. Passing in the h5py object provides a lot of flexibility on how this would be done (streaming method, authentication, etc).

h-mayorquin · 2024-11-01T14:07:53Z

I like this idea, but I would hope that it would also support passing in an h5py object instead of just a file path or an s3 path. That would allow streaming. For what I do, I am almost always streaming from a remote location. Passing in the h5py object provides a lot of flexibility on how this would be done (streaming method, authentication, etc).

My view is that the intended audience of this function is users without a lot of experience. I would prefer to keep the signature and the input types very simple and defer more complex use cases like yours to their respective backend IOs.

I am coming at this from the angle that, if the format is successful, reading will be far more common than writing, so I just want a top-level method that is easy to remember and use and that covers 95% of the cases for reading files.

h-mayorquin · 2024-11-01T14:19:15Z

To make this more concrete, this is what I am envisioning as the signature, input types and docstring:

from pathlib import Path
from pynwb import NWBFile

def read_nwb(path: str | Path) -> NWBFile:
    """Read an NWB file from a local path or remote URL.

    Provides a simple, high-level interface for reading NWB files in the most 
    common use cases. Automatically handles both HDF5 and Zarr formats.
    For advanced use cases (parallel I/O, custom namespaces), use NWBHDF5IO or NWBZarrIO.

    Parameters
    ----------
    path : str or pathlib.Path
        Path to the NWB file. Can be either a local filesystem path to an HDF5 (.nwb) 
        or Zarr (.zarr) file, or a remote URL (e.g., DANDI S3 asset URL).

    Returns
    -------
    pynwb.NWBFile
        The loaded NWB file object containing all datasets and metadata.

    See Also
    --------
    pynwb.NWBHDF5IO : Core I/O class for HDF5 files with advanced options.
    hdmf_zarr.nwb.NWBZarrIO : Core I/O class for Zarr files with advanced options.

    Notes
    -----
    This function uses the following defaults:
    * Always opens in read-only mode
    * Automatically loads namespaces
    * Detects file format based on extension
    * Automatically handles local and remote paths

    Advanced features requiring direct use of IO classes include:
    * Custom namespace extensions
    * Parallel I/O with MPI
    * Custom build managers
    * Write or append modes
    * Pre-opened HDF5 file objects or Zarr stores
    * Remote file access configuration

    Examples
    --------
    Read a local NWB file:

    >>> from pynwb import read_nwb
    >>> nwbfile = read_nwb("path/to/file.nwb")

    Read from a remote DANDI asset:

    >>> nwbfile = read_nwb("s3://dandiarchive/.../file.nwb")
    """
    pass  # Implementation details would go here

magland · 2024-11-01T14:27:03Z

I see your point, but I feel like allowing the argument to be an h5py object as well wouldn't complicate things very much. This would really simplify all my neurosift scripts, so I hope this could be considered.

Regarding the S3 URL, I think it's important to also support https://api.dandiarchive.org/... URLs. It would be nice to allow any https://... so it's not limited to DANDI.

oruebel · 2024-11-01T20:26:33Z

This would really simplify all my neurosift scripts, so I hope this could be considered.

@magland could you add a brief example to illustrate how read_nwb would simplify your scripts vs. using NWBHDF5IO? If you are passing in an open h5py object, then I think either should look very similar, but I'm probably missing something.

magland · 2024-11-01T21:52:33Z

@oruebel

f = lindi.LindiH5pyFile.from_lindi_file(url)
io = pynwb.NWBHDF5IO(file=f, mode='r')
nwbfile = io.read()

would become

f = lindi.LindiH5pyFile.from_lindi_file(url)
nwbfile = read_nwb(f)

So it saves one line... but it becomes a lot simpler to read.

oruebel · 2024-11-02T00:09:21Z

So it saves one line... but it becomes a lot simpler to read.

Thanks @magland for the clarification.

I think allowing a file as input instead of path would be OK, since it doesn't add any parameters to the interface. We may want to restrict it to h5py.File objects that are in read-only mode (or at least warn if they are not), since otherwise you might get a write-able file instead of a read-only file.
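
The read-only check suggested here could look roughly like the following. This is a sketch, not PyNWB code: `ensure_read_only` and its `strict` parameter are hypothetical, and the check relies only on the `mode` attribute that `h5py.File` objects expose (`'r'` for read-only, `'r+'` for read/write). Objects without a `mode` attribute are treated as not read-only.

```python
import warnings


def ensure_read_only(h5file, strict=False):
    """Warn (or raise when strict=True) if an open file object is not read-only."""
    mode = getattr(h5file, "mode", None)
    if mode == "r":
        return h5file
    message = f"Expected a file opened in mode 'r', got mode {mode!r}."
    if strict:
        raise ValueError(message)
    # Default behavior: let the read proceed but tell the caller about the risk.
    warnings.warn(message, UserWarning, stacklevel=2)
    return h5file
```

Whether to warn or to raise is a design choice: warning keeps existing scripts working, while raising guarantees that the returned file is never writable.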

Just a thought: now that the io object is stored on the NWBFile object, we could add static read_nwb methods on each backend, i.e., nwbfile = NWBHDF5IO.read_nwb(). This would help simplify the logic of the utility method read_nwb, since it would just call NWBHDF5IO.read_nwb() or NWBZarrIO.read_nwb(), and each backend could then add additional logic. E.g., NWBHDF5IO.read_nwb() could accept an h5py.File instead of a path, while the interface of read_nwb stays simple and allows only a path. I.e., in this case the lindi example would change to:

f = lindi.LindiH5pyFile.from_lindi_file(url)
nwbfile = NWBHDF5IO.read_nwb(f)

@h-mayorquin @magland what do you think? Would adding NWBHDF5IO.read_nwb() and NWBZarrIO.read_nwb() in addition to read_nwb make things simpler or worse?

magland · 2024-11-02T09:59:35Z

@oruebel That makes sense to me. I think if we had NWBHDF5IO.read_nwb(f) that supported h5py objects, then I wouldn't need the read_nwb utility, which could be reserved for the simplest case.

h-mayorquin · 2024-11-04T12:33:57Z

This would help simplify the logic of the utility method read_nwb, since it would just call NWBHDF5IO.read_nwb() or NWBZarrIO.read_nwb(), and each backend could then add additional logic. E.g., NWBHDF5IO.read_nwb() could accept an h5py.File instead of a path, while the interface of read_nwb stays simple and allows only a path.

This is great. Another advantage is that the reading code is in its "natural" place: the backend IO object. read_nwb is then just a router over those new methods.
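
The routing idea can be sketched as follows. The backend classes here are stand-ins so the example runs on its own; in the proposal their roles would be played by `NWBHDF5IO` and `NWBZarrIO`, each offering the proposed `read_nwb()`. The name `route_read_nwb`, the `backends` parameter, and the use of a `can_read()` check to pick the backend are assumptions made for illustration.

```python
def route_read_nwb(path, backends):
    """Dispatch to the first backend whose can_read(path) returns True."""
    for backend in backends:
        if backend.can_read(path):
            return backend.read_nwb(path)
    names = ", ".join(b.__name__ for b in backends)
    raise ValueError(f"Unable to read {path!r} with any of: {names}")


# Stand-in backends for illustration only; they perform no real I/O.
class FakeHDF5Backend:
    @staticmethod
    def can_read(path):
        return str(path).endswith(".nwb")

    @staticmethod
    def read_nwb(path):
        return ("hdf5", str(path))


class FakeZarrBackend:
    @staticmethod
    def can_read(path):
        return str(path).endswith(".zarr")

    @staticmethod
    def read_nwb(path):
        return ("zarr", str(path))
```

Keeping the router this thin means any backend-specific input handling, such as accepting an open file object, stays inside the backend's own `read_nwb()`.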

oruebel · 2024-11-04T18:38:57Z

Sounds like we have a plan:

PyNWB

Add NWBHDF5IO.read_nwb() which should accept: 1) local path, 2) S3 path, or 3) an open h5py file.
Add pynwb.read_nwb() which accepts a file path and calls the appropriate backend-specific read_nwb method
Update the tutorial on reading NWB files https://pynwb.readthedocs.io/en/latest/tutorials/general/plot_read_basics.html#sphx-glr-tutorials-general-plot-read-basics-py

HDMF_ZARR

Add NWBZarrIO.read_nwb() which should accept: 1) local path, 2) S3 path, or 3) an open Zarr store.

HDMF / HDMF_ZARR / PyNWB

We may also need to refine the logic in HDF5IO.can_read() and ZarrIO.can_read() (or overwrite the methods in NWBHDF5IO and NWBZarrIO) to handle S3 paths correctly.
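
One way to keep a `can_read()` check from probing a remote location as if it were a local file is to classify the input first. The sketch below uses only `urllib.parse`; `classify_location` is a hypothetical helper, and treating every `http`/`https` URL as a remote file is an assumption rather than something the plan above specifies.

```python
from urllib.parse import urlparse


def classify_location(path):
    """Return "s3", "http", or "local" for a path or URL string (hypothetical helper)."""
    scheme = urlparse(str(path)).scheme.lower()
    if scheme == "s3":
        return "s3"
    if scheme in ("http", "https"):
        return "http"
    # Anything else, including plain paths and Windows drive letters
    # (which urlparse reports as a one-letter scheme), is treated as local.
    return "local"
```

A backend could then run its existing local-file checks only when the result is `"local"` and hand the other cases to whatever remote-access logic it supports.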

oruebel · 2024-11-04T18:39:52Z

I created hdmf-dev/hdmf-zarr#225 for this and can take a stab at the changes needed in HDMF_ZARR.

@h-mayorquin would you be interested in taking a stab at the necessary changes in PyNWB?

h-mayorquin · 2024-11-04T19:17:54Z

Yes, I can take a look at it!

oruebel · 2024-11-07T02:07:43Z

@h-mayorquin the PR for adding NWBZarrIO.read_nwb for the Zarr backend is here hdmf-dev/hdmf-zarr#226

h-mayorquin · 2024-11-07T22:07:17Z

Here is what I think is the analogous PR on pynwb (first part):
#1979

oruebel · 2024-11-07T23:06:20Z

@h-mayorquin one item to add to the ToDo list is that we should also update the tutorial on reading NWB files https://pynwb.readthedocs.io/en/latest/tutorials/general/plot_read_basics.html#sphx-glr-tutorials-general-plot-read-basics-py to describe read_nwb for opening a file for reading.

h-mayorquin · 2024-11-07T23:11:07Z

Sounds good, I can take care of that in the PR for pynwb.read_nwb().

oruebel · 2024-11-12T02:25:22Z

@h-mayorquin hdmf-dev/hdmf-zarr#226 for adding NWBZarrIO.read_nwb has been merged

h-mayorquin · 2024-11-12T03:25:03Z

OK, now that the PyNWB part (#1979) has been merged, I think I can work on the second part:

Add pynwb.read_nwb() which accepts a file path and calls the appropriate backend-specific read_nwb method

And updating the tutorial.

h-mayorquin · 2024-11-16T04:19:55Z

I added pynwb.read_nwb for local files:

#1994

I would wait for one of our meetings to discuss support for remote_paths in pynwb._read_nwb.

h-mayorquin changed the title ~~[Feature]: Add read_nwb as a simple and memorable method to open an nwbfile~~ [Feature]: Add read_nwb to simplify reading nwbfiles Oct 23, 2024

stephprince added category: proposal proposed enhancements or new features priority: low alternative solution already working and/or relevant to only specific user(s) labels Oct 24, 2024

rly assigned rly and stephprince and unassigned rly Oct 31, 2024

rly added this to the Future milestone Oct 31, 2024

oruebel mentioned this issue Nov 4, 2024

[Feature]: Add NWBZarrIO.read_nwb method hdmf-dev/hdmf-zarr#225

Closed

3 tasks

stephprince mentioned this issue Nov 7, 2024

[Feature]: support streaming NWB Zarr files from DANDI NeurodataWithoutBorders/nwbinspector#518

Open

4 tasks

h-mayorquin mentioned this issue Nov 7, 2024

add NWBHDF5IO.read_nwb() method #1979

Merged

6 tasks

h-mayorquin mentioned this issue Nov 16, 2024

Add read_nwb_method for local paths in both hdf5 and zarr #1994

Open

6 tasks
