Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HDFStore.walk() to iterate on groups #21339

Closed
wants to merge 9 commits into from
Closed

HDFStore.walk() to iterate on groups #21339

wants to merge 9 commits into from

Conversation

CharlesB2
Copy link

This PR adds a walk() method to HDFStore in order to iterate on groups.

This is a revival of the initial PR #10932, rebased on upstream/master and updated tests.

@CharlesB2 CharlesB2 changed the title Issue 10143 HDFStore.walk() to iterate on groups Jun 6, 2018
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some comments

group is used.

The where argument can be a path string
or a Group instance (see :ref:`GroupClassDescr`).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what does this refer to? best to directly link to the pytables docs

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually it doesn't make sense for where to be anything else than str, removing this mention

@@ -1106,6 +1106,46 @@ def groups(self):
g._v_name != u('table'))))
]

def walk(self, where="/"):
""" Walk the pytables group hierarchy yielding the group name and
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be a single sumary line, then a multi-line extended summary

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

def walk(self, where="/"):
""" Walk the pytables group hierarchy yielding the group name and
pandas object names for each group. Any non-pandas PyTables objects
that are not a group will be ignored.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need a Parameters section

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added

}

with ensure_clean_store('walk_groups.hdf', mode='w') as store:
store.put('/first_group/df1', objs['df1'])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a table or 2

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added

@@ -4999,6 +4999,62 @@ def test_read_nokey_empty(self):
store.close()
pytest.raises(ValueError, read_hdf, path)

# GH10143
def test_walk(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

put the comment inside the test function

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -4999,6 +4999,62 @@ def test_read_nokey_empty(self):
store.close()
pytest.raises(ValueError, read_hdf, path)

# GH10143
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

put this near tests for get_node

Copy link
Author

@CharlesB2 CharlesB2 Jun 12, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, but I couldn't spot any test for get_node in the repo? The nearest I could find is TestHDFStore.test_get()

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moved


expect1 = {
'': ({'first_group', 'second_group'}, set()),
'/first_group': (set(), {'df1', 'df2'}),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you parameterize this test rather than making a long test like this

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -3554,6 +3554,22 @@ everything in the sub-store and **below**, so be *careful*.
store.remove('food')
store


You can walk through the group hierarchy using the ``walk`` method which
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a versionadded

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -1106,6 +1106,46 @@ def groups(self):
g._v_name != u('table'))))
]

def walk(self, where="/"):
""" Walk the pytables group hierarchy yielding the group name and
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a versionadded

@jreback jreback added the IO HDF5 read_hdf, HDFStore label Jun 6, 2018
Stephen Pascoe and others added 4 commits June 12, 2018 23:57
…ile.

This implementation is inspired by os.walk and follows the interface as much as possible.
Including small fix to remove redundant '/' from group names.
walk() can be called with where argument to specify the root node.
Tests updated with the enhancement
@codecov
Copy link

codecov bot commented Jun 12, 2018

Codecov Report

Merging #21339 into master will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #21339      +/-   ##
==========================================
+ Coverage    91.9%    91.9%   +<.01%     
==========================================
  Files         154      154              
  Lines       49562    49577      +15     
==========================================
+ Hits        45549    45564      +15     
  Misses       4013     4013
Flag Coverage Δ
#multiple 90.27% <6.66%> (-0.03%) ⬇️
#single 41.87% <100%> (+0.01%) ⬆️
Impacted Files Coverage Δ
pandas/io/pytables.py 92.48% <100%> (+0.04%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 36422a8...9e7d7a0. Read the comment docs.

@CharlesB2
Copy link
Author

I think I addressed your review's comments

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor comment. can you elaborate on why you think the return values are the right thing here? e.g. is this from the PyTables api directly? e.g. what is the point of having the groups and subkeys? wouldn't you just care about the subkeys?

@@ -16,7 +16,8 @@ Other Enhancements
- :func:`Series.mode` and :func:`DataFrame.mode` now support the ``dropna`` parameter which can be used to specify whether NaN/NaT values should be considered (:issue:`17534`)
- :func:`to_csv` now supports ``compression`` keyword when a file handle is passed. (:issue:`21227`)
- :meth:`Index.droplevel` is now implemented also for flat indexes, for compatibility with MultiIndex (:issue:`21115`)

- New method :meth:`HDFStore.walk` will recursively walk the group hierarchy of a HDF5 file (:issue:`10932`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add this to api.rst as well

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

an HDF5 file

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@jreback jreback added this to the 0.24.0 milestone Jun 19, 2018
@CharlesB2
Copy link
Author

minor comment. can you elaborate on why you think the return values are the right thing here? e.g. is this from the PyTables api directly? e.g. what is the point of having the groups and subkeys? wouldn't you just care about the subkeys?

It was made this way in the original PR, I didn't see any reason to change it. It's a refinement from the PyTables api, I guess the original author @stephenpascoe found it convenient.

I made the docs change in the last forced push.

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor comment. looks good. ping on green.

@@ -17,8 +17,9 @@ Other Enhancements
- :func:`to_datetime` now supports the ``%Z`` and ``%z`` directive when passed into ``format`` (:issue:`13486`)
- :func:`Series.mode` and :func:`DataFrame.mode` now support the ``dropna`` parameter which can be used to specify whether NaN/NaT values should be considered (:issue:`17534`)
- :func:`to_csv` now supports ``compression`` keyword when a file handle is passed. (:issue:`21227`)
- :meth:`Index.droplevel` is now implemented also for flat indexes, for compatibility with :class:`MultiIndex` (:issue:`21115`)

- :meth:`Index.droplevel` is now implemented also for flat indexes, for compatibility with MultiIndex (:issue:`21115`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this was updated and the diff is taking the original, can you restore (the droplevel line)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Charles Brossollet and others added 2 commits June 26, 2018 12:40
@jreback
Copy link
Contributor

jreback commented Jun 26, 2018

merged via 45e55af

for some reason couldn't push to your remote, so manually merged.

thanks!

@jreback jreback closed this Jun 26, 2018
@CharlesB2 CharlesB2 deleted the issue-10143 branch June 27, 2018 08:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Enhancement IO HDF5 read_hdf, HDFStore
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ENH: Iterate over HDF store hierarchically
2 participants