ENH: Iterate over HDF store hierarchically #10143

Closed
nileracecrew opened this issue May 15, 2015 · 8 comments

@nileracecrew
Contributor

I am relatively new to pandas, but I have been using raw HDF5 files for a long time, so apologies if this feature or something equivalent already exists.

As stated in the documentation, it is possible to create hierarchies of data stores:

store.put('/first_group/df1', df1)
store.put('/first_group/df2', df2)
store.put('/second_group/df3', df3)

However, it seems that store.keys() can only provide a flattened list of all available groups. It would be nice to be able to walk the HDF5 file hierarchically. A couple of examples (borrowing a bit from the h5py API):

# do stuff on each subgroup/substore in /first_group
for s in store['first_group'].values():
    pass  # do stuff on df1 and df2

# visit each group at the root level
for s in store.values():
    if 'df1' in s:
        pass  # do stuff on any store named 'df1'

The main use case I have for something like this is that I have many DataFrames with different schemas, and I would like to group some of them together and then iterate through those groups.

The best way I can think of to emulate this is to parse the strings returned by store.keys(), but that is pretty ugly.
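For reference, here is a minimal sketch of that key-parsing workaround. It assumes the keys are '/'-separated absolute paths such as '/first_group/df1', which is the form store.keys() returns. `children_of` and `keys_named` are hypothetical helper names, not pandas API. They operate on a plain list of key strings, so they work on the output of store.keys() without touching the file.

```python
import posixpath
from typing import Iterable, List


def _normalize(path: str) -> str:
    """Turn 'a/b', '/a/b/' or '/a/b' into the canonical '/a/b' (root stays '/')."""
    return "/" + path.strip("/")


def children_of(keys: Iterable[str], group: str) -> List[str]:
    """Keys whose immediate parent group is `group`, e.g. '/first_group'."""
    target = _normalize(group)
    return sorted(k for k in keys if posixpath.dirname(_normalize(k)) == target)


def keys_named(keys: Iterable[str], name: str) -> List[str]:
    """Keys whose last path component equals `name`, at any depth."""
    return sorted(k for k in keys if posixpath.basename(_normalize(k)) == name)
```

With an open store, `children_of(store.keys(), '/first_group')` returns the keys to pass to `store[...]` one at a time. `keys_named(store.keys(), 'df1')` covers the "any store named 'df1'" case.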

@nileracecrew nileracecrew changed the title Iterate over HDF store hierarchically ENH: Iterate over HDF store hierarchically May 15, 2015
@jreback
Contributor

jreback commented May 15, 2015

I wouldn't be opposed to adding a .walk() method, similar to how PyTables' File.walk_nodes and os.walk work. It would just iterate and yield each node; it's up to the user to do something with them.
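Here is a rough sketch of the os.walk-style interface described above. It is built only from the flat key list, so it runs without PyTables. `walk_keys` is a hypothetical name and this is not the pandas implementation. It yields `(group_path, subgroup_names, leaf_names)` tuples in preorder, and it visits children alphabetically.

```python
import posixpath
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Set, Tuple


def walk_keys(
    keys: Iterable[str], where: str = "/"
) -> Iterator[Tuple[str, List[str], List[str]]]:
    """Walk flat '/'-separated keys like os.walk, yielding (path, subgroups, leaves)."""
    subgroups: Dict[str, Set[str]] = defaultdict(set)
    leaves: Dict[str, Set[str]] = defaultdict(set)

    for key in keys:
        parent, name = posixpath.split("/" + key.strip("/"))
        leaves[parent].add(name)
        # Register every ancestor group between this key and the root.
        while parent != "/":
            grandparent, group_name = posixpath.split(parent)
            subgroups[grandparent].add(group_name)
            parent = grandparent

    # Depth-first, preorder traversal starting from `where`.
    stack = ["/" + where.strip("/")]
    while stack:
        path = stack.pop()
        groups = sorted(subgroups.get(path, ()))
        yield path, groups, sorted(leaves.get(path, ()))
        # Children are pushed only after the caller has seen `groups`, so removing
        # names from that list in place prunes those branches (as with os.walk).
        # Pushing in reverse makes them pop in alphabetical order.
        for name in reversed(groups):
            stack.append(posixpath.join(path, name))
```

As with os.walk's top-down mode, clearing or editing the yielded subgroup list in place skips those branches.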

@jreback jreback added this to the Next Major Release milestone May 15, 2015
@nileracecrew
Contributor Author

That would provide at least a basic hook for walking through the HDFStore.

Thinking more about my particular workflow, it might make more sense to maintain a separate DataFrame holding the path (key) of each stored DataFrame, plus some organizing metadata to query against.
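Here is a minimal sketch of that catalogue idea, assuming one row per stored DataFrame. The column names (`key`, `group`, `schema`) and the `keys_matching` helper are made up for illustration. The catalogue is just an ordinary DataFrame, so it could itself be saved in the same store under a key of your choosing.

```python
import pandas as pd

# Hypothetical catalogue: one row per DataFrame saved in the HDFStore, holding its
# key plus whatever organizing metadata is useful (these columns are made up).
catalog = pd.DataFrame(
    {
        "key": ["/first_group/df1", "/first_group/df2", "/second_group/df3"],
        "group": ["first_group", "first_group", "second_group"],
        "schema": ["prices", "volumes", "prices"],
    }
)


def keys_matching(catalog: pd.DataFrame, **criteria: str) -> list:
    """Return the store keys of all rows whose metadata columns equal the given values."""
    mask = pd.Series(True, index=catalog.index)
    for column, value in criteria.items():
        mask &= catalog[column] == value
    return catalog.loc[mask, "key"].tolist()
```

Each returned key can then be passed to `store[key]` to load the matching DataFrame, e.g. every frame with `schema="prices"` regardless of which group it lives in.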

@stephenpascoe

I'd like to work on this during the EuroSciPy 2015 sprints #10877

@nileracecrew
Contributor Author

In case it's useful, I ended up using this construction to get all the nodes at a particular location instead of walking the whole tree.

# simplified example from working code but not specifically tested so there may be typos

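import pandas as pd

def iter_nodes(self, loc):
    """Yield the names of the nodes (groups, datasets, etc.) directly under `loc`."""
    # self._handle is the underlying (private) PyTables File object
    return (n._v_name for n in self._handle.iter_nodes(loc))

# attach the helper to pd.HDFStore as a method
pd.HDFStore.iter_nodes = iter_nodes

# ...

with pd.HDFStore('mystore.h5') as store:
    for g in store.iter_nodes('/'):
        # do something with each node (group, dataset, etc.) at the root level
        print(g)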

@nileracecrew
Contributor Author

Related #6833

stephenpascoe pushed a commit to stephenpascoe/pandas that referenced this issue Aug 30, 2015
…les HDF5 file.

This implementation is inspired by os.walk and follows the interface as much as possible.
@stephenpascoe

Created PR #10932

@jreback jreback modified the milestones: 0.17.0, Next Major Release Aug 30, 2015
stephenpascoe pushed a commit to stephenpascoe/pandas that referenced this issue Sep 2, 2015
…les HDF5 file.

This implementation is inspired by os.walk and follows the interface as much as possible.
@NumesSanguis

This issue has been open since 2015 and is labeled "Effort Low", but it seems no approach has been decided on yet. What is currently the recommended way to traverse an HDF5 store? I would like to get a list of only the top-level keys/groups.

@jreback jreback modified the milestones: Next Major Release, 0.24.0 Jun 19, 2018
@jreback
Copy link
Contributor

jreback commented Jun 26, 2018

merged via 45e55af
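For later readers: the merged change added an `HDFStore.walk(where='/')` generator. My understanding is that it shipped in pandas 0.24.0, the milestone above. It yields `(path, groups, leaves)` for each group in preorder. `groups` holds subgroup names and `leaves` holds the names of pandas objects; non-pandas PyTables nodes are skipped. The sketch below assumes PyTables is installed. It rebuilds the store from the opening example and also answers the earlier question about listing only top-level names. It does not rely on the exact path string reported for the root group.

```python
import os
import tempfile

import pandas as pd  # HDFStore also requires the PyTables package ("tables")

df = pd.DataFrame({"x": [1, 2, 3]})

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "mystore.h5")
    with pd.HDFStore(path, mode="w") as store:
        store.put("/first_group/df1", df)
        store.put("/first_group/df2", df)
        store.put("/second_group/df3", df)

        # HDFStore.walk yields (group path, subgroup names, pandas object names)
        # for each group in preorder, much like os.walk.
        tree = {
            group_path: (sorted(subgroups), sorted(leaves))
            for group_path, subgroups, leaves in store.walk()
        }

        # The first item describes the `where` group itself (the root by default),
        # so its subgroups and leaves are exactly the top-level names.
        _, top_groups, top_leaves = next(store.walk())
        top_level = sorted(top_groups + top_leaves)
```

Passing a group path, e.g. `store.walk('/first_group')`, restricts the walk to that subtree.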

@jreback jreback closed this as completed Jun 26, 2018
5 participants