Add ExtraDataFunctor for integration with pasha #299
base: master
Thanks. I think As we were just discussing,
Do you want to add a test for the pasha integration, BTW?
Yes. How would the tests work dependency-wise? Do we add I do not much like the hack for iterating
# different process than the functor was initially created in,
# close all file handles inherited from the parent collection to
# force re-opening them again in each child process.
if getpid() != self._parent_pid:
Do we still need this at all with the dependencies we require?
Probably not really.
It's a bit tricky to be definitive because the change is in HDF5, and h5py can be built with a wide range of HDF5 versions. EXtra-data supports h5py 2.10, and the pre-built packages of 2.10 on PyPI have HDF5 1.10.4. So I might bump the required version to >= 3.0, just to be a bit cautious. I hope very few people are still stuck on h5py 2.x.
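For readers following the thread, the `getpid()` check quoted above can be illustrated with a minimal, standalone sketch. It uses a plain file rather than h5py, and `ForkAwareFile` and its attributes are hypothetical names made up for this illustration, not part of EXtra-data or pasha:

```python
import os


class ForkAwareFile:
    """Hypothetical wrapper that re-opens its file in a new process."""

    def __init__(self, path):
        self._path = path
        # Remember which process opened the handle.
        self._owner_pid = os.getpid()
        self._handle = open(path, "rb")

    def _ensure_own_handle(self):
        # If we are running in a different process than the one that
        # opened the handle (e.g. after a fork), close the inherited
        # handle and open a fresh one owned by this process.
        if os.getpid() != self._owner_pid:
            self._handle.close()
            self._handle = open(self._path, "rb")
            self._owner_pid = os.getpid()

    def read_all(self):
        self._ensure_own_handle()
        self._handle.seek(0)
        return self._handle.read()

    def close(self):
        self._handle.close()
```

Whether the real functor still needs an equivalent guard depends on the HDF5 version h5py is built against, as discussed above; the sketch only shows the shape of the check.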
I would add pasha to the
As it came up again in European-XFEL/pasha#14, here's a proof-of-concept on how to move the `ExtraDataFunctor` implementation to EXtra-data itself. This would allow it to take advantage of private APIs as well as test properly (TBD). So far it's almost exactly the same implementation, only with unnecessary import checks removed.

As the code is only loaded conditionally, it does not add an actual dependency. The import of `gen_split_slices` could even be removed once `SourceData` has its own `.split_trains()` method.

Any ideas for the module name?
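To illustrate what "loaded conditionally" can mean in general, here is a minimal sketch of one common pattern: importing an integration module only when the package it integrates with has already been imported. The helper `load_if_imported` is a hypothetical name for this example, and this is not necessarily how pasha or EXtra-data wire it up:

```python
import importlib
import sys


def load_if_imported(trigger_module, integration_module):
    """Import `integration_module` only if `trigger_module` is loaded.

    Returns the imported module, or None if the trigger package has
    not been imported in this interpreter, so no hard dependency is
    introduced.
    """
    if trigger_module not in sys.modules:
        return None
    return importlib.import_module(integration_module)
```

With this shape, the integration code never runs its own imports unless the user has already pulled in the other package themselves.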