from_hdf5 function (TEP014) #711

Closed
wants to merge 8 commits into from

Conversation

Contributor

@vg3095 vg3095 commented Feb 24, 2017

@wkerzendorf
This PR addresses the first and bonus objectives of TEP014 (Reading Simulation from file).
I have implemented a from_hdf5 function in the Simulation base class, which returns a Radial1DModel object.
It does not yet extend to the Plasma and MonteCarlo Runner objects.
homogeneous_density and luminosity_requested are set to None for now.

Example usage:

from tardis.simulation import Simulation
model = Simulation.from_hdf5('/path/to/model_output.hdf')
print(model.time_explosion)
print(model.v_boundary_inner)
print(model.v_boundary_outer)

@vg3095 vg3095 changed the title Vs1 from_hdf5 function (TEP014) Feb 24, 2017
@wkerzendorf
Member

@vg3095 show me with syntax ;-)

@vg3095
Contributor Author

vg3095 commented Feb 25, 2017

@wkerzendorf I now use the 'with' statement when handling files through the pd.HDFStore and h5py.File calls.
No logger warnings are raised anymore.

Member

@ftsamis ftsamis left a comment

In general that's quite good work. I left some inline comments which you may find useful in making the design more modular.

@@ -4,10 +4,11 @@
import pandas as pd
from astropy import units as u
from collections import OrderedDict

import h5py
Member

I think pandas should be used instead of h5py.

Contributor Author

@vg3095 vg3095 Mar 3, 2017

As of now, pandas.HDFStore() does not support iterating over a file hierarchically, while h5py does.
The issue is still open there. Link.

So I am using h5py only for traversing the HDF file hierarchically; for reading a particular attribute (like time_explosion), I am using pd.HDFStore().
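One way to stay with pandas alone, offered only as a sketch: HDFStore.keys() returns a flat list of absolute paths, and a hierarchy can be rebuilt from those strings without a second library. The helper name build_tree and the key list below are hypothetical and do not come from this PR or from TARDIS.

```python
def build_tree(keys):
    """Group flat, slash-separated keys into a nested dict."""
    tree = {}
    for key in keys:
        node = tree
        # '/simulation/model/v_inner' -> ['simulation', 'model', 'v_inner']
        for part in key.strip('/').split('/'):
            node = node.setdefault(part, {})
    return tree


# Hypothetical keys, in the shape that HDFStore.keys() returns.
keys = [
    '/simulation/model/v_inner',
    '/simulation/model/v_outer',
    '/simulation/plasma/scalars',
]
tree = build_tree(keys)
print(sorted(tree['simulation']))  # ['model', 'plasma']
```

With a real file, the list would come from store.keys() inside a `with pd.HDFStore(path, 'r') as store:` block, and each leaf could then be read with store[key].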

@@ -410,3 +411,138 @@ def from_config(cls, config, **kwargs):
convergence_strategy=config.montecarlo.convergence_strategy,
nthreads=config.montecarlo.nthreads)

@classmethod
def from_hdf5(cls, file_path):
Member

Please rename this to from_hdf



@classmethod
def read_plasma_data(cls, h5_file, path, sim_dict, file_path):
Member

This should live in the BasePlasma class, named as from_hdf

Contributor Author

@vg3095 vg3095 Mar 3, 2017

I think it will be better if I move it to the BasePlasma class without renaming it (for the same reasons given for the Radial1DModel class).

return model

@classmethod
def read_model_data(cls, h5_file, path, sim_dict, file_path):
Member

This should live in the Radial1DModel class and be named from_hdf.

Contributor Author

@vg3095 vg3095 Mar 3, 2017

Regarding the renaming: read_model_data just stores attributes in a nested dictionary and does not return anything. Renaming it to from_hdf under the Radial1DModel class would be misleading, as the name suggests that it returns a Radial1DModel object.
So I think it will be better if I move it to Radial1DModel without renaming it.

Also, I cannot return a Radial1DModel just by reading the model attributes in the HDF file,
because some attributes required for generating this object also reside under the plasma and runner paths (like plasma/scalars/time_explosion).

Member

What I am suggesting is to move the work of reading a Radial1DModel from an HDF file out of Simulation and into Radial1DModel. If you do that, you may find that you don't need a separate read_model_data function; combining it with from_hdf could be fine.

To better understand what I'd want it to look like, take a look at to_hdf. You'll see that simulation.to_hdf basically does nothing more than call model.to_hdf, runner.to_hdf and plasma.to_hdf, delegating the responsibility to the smaller structures.

Regarding your second point, there are two solutions:

  • Either change the to_hdf methods to store the missing attributes, or
  • Just assume access to the entire HDF file in each class (Model, Plasma, Runner, Simulation, ...) and read attributes from other sections.

For now, doing the second is just fine.
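A minimal sketch of that delegation, taking the second option (every class may read from any section of the file). The classes here are simplified stand-ins, not the real TARDIS Radial1DModel, BasePlasma or Simulation; the key layout and values are invented for illustration; and a plain dict stands in for an open pd.HDFStore, which supports the same store[key] lookup.

```python
class Model(object):
    def __init__(self, v_inner, v_outer, time_explosion):
        self.v_inner = v_inner
        self.v_outer = v_outer
        self.time_explosion = time_explosion

    @classmethod
    def from_hdf(cls, store, path):
        # Reads its own section plus one value from the plasma section.
        return cls(
            v_inner=store[path + '/model/v_inner'],
            v_outer=store[path + '/model/v_outer'],
            time_explosion=store[path + '/plasma/scalars']['time_explosion'],
        )


class Plasma(object):
    def __init__(self, scalars):
        self.scalars = scalars

    @classmethod
    def from_hdf(cls, store, path):
        return cls(scalars=store[path + '/plasma/scalars'])


class Simulation(object):
    def __init__(self, model, plasma):
        self.model = model
        self.plasma = plasma

    @classmethod
    def from_hdf(cls, store, path='simulation'):
        # Mirrors to_hdf: delegate to the smaller structures.
        return cls(model=Model.from_hdf(store, path),
                   plasma=Plasma.from_hdf(store, path))


# Hypothetical contents; a dict stands in for the opened HDF store.
store = {
    'simulation/model/v_inner': [1.0e9, 1.1e9],
    'simulation/model/v_outer': [1.1e9, 1.2e9],
    'simulation/plasma/scalars': {'time_explosion': 13.0},
}
sim = Simulation.from_hdf(store)
print(type(sim).__name__, sim.model.time_explosion)
```

Each from_hdf returns a populated instance of its own class, so Simulation.from_hdf stays a thin wrapper and a new component only needs its own classmethod.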

sim_dict[key] = {}
if 'model' in key:
    cls.read_model_data(
        h5_file, simulation + '/model/', sim_dict[key], file_path)
Member

If you create a classmethod under Radial1DModel named from_hdf you could call this here like Radial1DModel.from_hdf(hdf_file, ...) and you'd get a fully populated Model. This way the from_hdf concept would be more modular and easier to extend.

        h5_file, simulation + '/model/', sim_dict[key], file_path)
if 'plasma' in key:
    cls.read_plasma_data(
        h5_file, simulation + '/plasma/', sim_dict[key], file_path)
Member

Similar here, for BasePlasma.

v_boundary_outer)
# TODO : Extend it to plasma and montecarlo objects and return Simulation object

return model
Member

I know it's just a demo, but make sure the Simulation.from_hdf returns a Simulation object, not a model. Essentially, each class with a to_hdf should probably have a from_hdf classmethod, which returns a populated instance of the class it refers to.

@vg3095
Contributor Author

vg3095 commented Mar 3, 2017

@ftsamis Thanks for the detailed review. I have replied to some of the comments; please take a look. I will update the rest accordingly.

    raise IOError("Supplied HDF5 File %s does not exists" % file_path)
if not h5_file:
    raise ValueError("h5file Parameter can`t be None")
with pd.HDFStore(file_path, 'r') as data:
Member

At least two of these checks are unnecessary, since when you reach the pd.HDFStore() call, it would also raise an exception itself.
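The same idea in a small sketch: skip the pre-checks and let the open call report the problem. The built-in open() is used here instead of pd.HDFStore so the example runs without extra dependencies; the thread does not say which exception type pd.HDFStore raises, so none is assumed. The function name and file name are hypothetical.

```python
import os
import tempfile


def read_first_line(path):
    # No os.path.exists() pre-check: open() raises on its own for a bad path.
    with open(path) as handle:
        return handle.readline()


# A path inside a fresh temporary directory, so it cannot exist yet.
missing = os.path.join(tempfile.mkdtemp(), 'no_such_file.hdf')
try:
    read_first_line(missing)
except IOError as error:  # IOError is an alias of OSError on Python 3
    print('open() raised:', type(error).__name__)
```

A manual check placed before the call can only repeat what the call already reports, and it can go stale if the file is removed between the check and the open.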

@vg3095
Contributor Author

vg3095 commented Mar 6, 2017

@ftsamis
I have made some structural changes as you suggested.
Reposting the example usage from above with a minor change.

Example usage (renamed from_hdf5 -> from_hdf):

from tardis.simulation import Simulation
model = Simulation.from_hdf('/path/to/model_output.hdf')
print(model.time_explosion)
print(model.v_boundary_inner)
print(model.v_boundary_outer)

As of now, it cannot return a Simulation object, because convergence_strategy cannot be set to None at initialization.

Changes:

  • Removed the read_model_data and read_plasma_data functions
  • Added a from_hdf method to the Radial1DModel class
  • Renamed from_hdf5 to from_hdf (in the Simulation class)
  • Removed some unnecessary checks

@vg3095 vg3095 closed this Jun 19, 2017
3 participants