HDFStore.walk() to iterate on groups #21339
some comments
pandas/io/pytables.py
Outdated
group is used.

The where argument can be a path string
or a Group instance (see :ref:`GroupClassDescr`).
what does this refer to? best to directly link to the pytables docs
Actually it doesn't make sense for where to be anything other than str, removing this mention
pandas/io/pytables.py
Outdated
@@ -1106,6 +1106,46 @@ def groups(self):
g._v_name != u('table'))))
]

def walk(self, where="/"):
""" Walk the pytables group hierarchy yielding the group name and
should be a single summary line, then a multi-line extended summary
done
pandas/io/pytables.py
Outdated
def walk(self, where="/"):
""" Walk the pytables group hierarchy yielding the group name and
pandas object names for each group. Any non-pandas PyTables objects
that are not a group will be ignored.
need a Parameters section
added
pandas/tests/io/test_pytables.py
Outdated
}

with ensure_clean_store('walk_groups.hdf', mode='w') as store:
store.put('/first_group/df1', objs['df1'])
add a table or 2
added
pandas/tests/io/test_pytables.py
Outdated
@@ -4999,6 +4999,62 @@ def test_read_nokey_empty(self):
store.close()
pytest.raises(ValueError, read_hdf, path)

# GH10143
def test_walk(self):
put the comment inside the test function
done
pandas/tests/io/test_pytables.py
Outdated
@@ -4999,6 +4999,62 @@ def test_read_nokey_empty(self):
store.close()
pytest.raises(ValueError, read_hdf, path)

# GH10143
put this near tests for get_node
Sorry, but I couldn't spot any test for get_node in the repo? The nearest I could find is TestHDFStore.test_get()
moved
pandas/tests/io/test_pytables.py
Outdated
expect1 = {
'': ({'first_group', 'second_group'}, set()),
'/first_group': (set(), {'df1', 'df2'}),
can you parameterize this test rather than making a long test like this
done
@@ -3554,6 +3554,22 @@ everything in the sub-store and **below**, so be *careful*.
store.remove('food')
store

You can walk through the group hierarchy using the ``walk`` method which
add a versionadded
done
pandas/io/pytables.py
Outdated
@@ -1106,6 +1106,46 @@ def groups(self):
g._v_name != u('table'))))
]

def walk(self, where="/"):
""" Walk the pytables group hierarchy yielding the group name and
add a versionadded
…ile. This implementation is inspired by os.walk and follows the interface as much as possible.
Including small fix to remove redundant '/' from group names.
walk() can be called with where argument to specify the root node. Tests updated with the enhancement
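The os.walk-style interface described in these commit messages can be sketched without PyTables. This is only an illustration under stated assumptions, not the code merged in this PR: a nested dict (the hypothetical `store` below) stands in for the HDF5 file, dict values play the role of groups, any other value is a leaf, and the names `walk` and `_walk` are chosen for the sketch.

```python
def walk(tree, where="/"):
    """Yield (path, groups, leaves) for every group at or below `where`.

    `tree` is a hypothetical stand-in for an HDF5 file: nested dicts are
    groups, any other value is a leaf (a stored pandas object).
    """
    # Descend from the top of the tree to the requested root group.
    node = tree
    for part in where.strip("/").split("/"):
        if part:
            node = node[part]

    def _walk(node, path):
        groups = sorted(k for k, v in node.items() if isinstance(v, dict))
        leaves = sorted(k for k, v in node.items() if not isinstance(v, dict))
        # Strip the trailing '/' so the root is reported as '' and its
        # children as '/name' rather than '//name'.
        yield path.rstrip("/"), groups, leaves
        for name in groups:
            yield from _walk(node[name], path.rstrip("/") + "/" + name)

    yield from _walk(node, "/" + where.strip("/"))


# Hypothetical layout, loosely modelled on the names used in the PR's test.
store = {
    "first_group": {"df1": "frame", "df2": "frame"},
    "second_group": {"s1": "series", "third_group": {"df3": "frame"}},
}

result = {path: (set(groups), set(leaves)) for path, groups, leaves in walk(store)}
sub_paths = [path for path, _, _ in walk(store, "/second_group")]
```

With this layout the root is reported under the key `''` and the first group under `'/first_group'`, the same key shape the test's `expect1` dict uses.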
Codecov Report
@@            Coverage Diff            @@
##           master   #21339     +/-   ##
=========================================
+ Coverage    91.9%    91.9%   +<.01%
=========================================
  Files         154      154
  Lines       49562    49577     +15
=========================================
+ Hits        45549    45564     +15
  Misses       4013     4013
I think I addressed your review's comments
minor comment. can you elaborate on why you think the return values are the right thing here? e.g. is this from the PyTables api directly? e.g. what is the point of having the groups and subkeys? wouldn't you just care about the subkeys?
doc/source/whatsnew/v0.24.0.txt
Outdated
@@ -16,7 +16,8 @@ Other Enhancements
- :func:`Series.mode` and :func:`DataFrame.mode` now support the ``dropna`` parameter which can be used to specify whether NaN/NaT values should be considered (:issue:`17534`)
- :func:`to_csv` now supports ``compression`` keyword when a file handle is passed. (:issue:`21227`)
- :meth:`Index.droplevel` is now implemented also for flat indexes, for compatibility with MultiIndex (:issue:`21115`)

- New method :meth:`HDFStore.walk` will recursively walk the group hierarchy of a HDF5 file (:issue:`10932`)
can you add this to api.rst as well
an HDF5 file
Done
It was made this way in the original PR and I didn't see any reason to change it. It's a refinement of the PyTables API; I guess the original author @stephenpascoe found it convenient. I made the docs change in the last force push.
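For comparison, here is the triple that os.walk itself yields on a directory tree shaped like the test store. This is a stdlib-only sketch: the directory and file names are borrowed from the PR's test, the variables `full_keys` and `seen_groups` are made up for the illustration, and the thread does not say whether this was the original author's motivation.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Directories play the role of groups, files the role of stored objects.
    os.makedirs(os.path.join(root, "first_group"))
    os.makedirs(os.path.join(root, "second_group"))  # deliberately left empty
    for name in ("df1", "df2"):
        open(os.path.join(root, "first_group", name), "w").close()

    full_keys = []
    seen_groups = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Turn the absolute directory path into a '/'-separated prefix,
        # using '' for the root itself.
        rel = os.path.relpath(dirpath, root)
        prefix = "" if rel == "." else "/" + rel.replace(os.sep, "/")
        seen_groups.extend(prefix + "/" + d for d in dirnames)
        full_keys.extend(prefix + "/" + f for f in filenames)

full_keys.sort()
seen_groups.sort()
```

Having the sub-directory names alongside the file names is what lets a caller see `second_group` here even though it contains no files; a leaves-only interface would not surface it.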
minor comment. looks good. ping on green.
doc/source/whatsnew/v0.24.0.txt
Outdated
@@ -17,8 +17,9 @@ Other Enhancements
- :func:`to_datetime` now supports the ``%Z`` and ``%z`` directive when passed into ``format`` (:issue:`13486`)
- :func:`Series.mode` and :func:`DataFrame.mode` now support the ``dropna`` parameter which can be used to specify whether NaN/NaT values should be considered (:issue:`17534`)
- :func:`to_csv` now supports ``compression`` keyword when a file handle is passed. (:issue:`21227`)
- :meth:`Index.droplevel` is now implemented also for flat indexes, for compatibility with :class:`MultiIndex` (:issue:`21115`)

- :meth:`Index.droplevel` is now implemented also for flat indexes, for compatibility with MultiIndex (:issue:`21115`)
this was updated and the diff is taking the original, can you restore (the droplevel line)
done
Update droplevel change line
merged via 45e55af. for some reason couldn't push to your remote, so manually merged. thanks!
This PR adds a walk() method to HDFStore in order to iterate on groups.
This is a revival of the initial PR #10932, rebased on upstream/master and updated tests.
git diff upstream/master -u -- "*.py" | flake8 --diff