Refactoring the way data is returned in panedr #33
Changes from 4 commits
c484bef
48085f3
e700706
18e68c9
967807e
d69ae9e
09b8e9e
3b708d1
20d5e39
52857ad
ce812c5
692ccad
6ef2f84
2659211
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -45,7 +45,7 @@ | |
import sys | ||
import itertools | ||
import time | ||
import pandas | ||
|
||
|
||
#Index for the IDs of additional blocks in the energy file. | ||
#Blocks can be added without sacrificing backward and forward | ||
|
@@ -75,7 +75,7 @@ | |
Enxnm = collections.namedtuple('Enxnm', 'name unit') | ||
ENX_VERSION = 5 | ||
|
||
__all__ = ['edr_to_df'] | ||
__all__ = ['edr_to_df', 'edr_to_dict'] | ||
|
||
|
||
class EDRFile(object): | ||
|
@@ -395,21 +395,21 @@ def edr_strings(data, file_version, n): | |
|
||
def is_frame_magic(data): | ||
"""Unpacks an int and checks whether it matches the EDR frame magic number | ||
|
||
Does not roll the reading position back. | ||
""" | ||
magic = data.unpack_int() | ||
return magic == -7777777 | ||
|
||
|
||
def edr_to_df(path, verbose=False): | ||
def read_edr(path, verbose_set=False): | ||
Why the change from

I felt weird about
||
begin = time.time() | ||
edr_file = EDRFile(str(path)) | ||
all_energies = [] | ||
all_names = [u'Time'] + [nm.name for nm in edr_file.nms] | ||
times = [] | ||
for ifr, frame in enumerate(edr_file): | ||
if verbose: | ||
if verbose_set: | ||
if ((ifr < 20 or ifr % 10 == 0) and | ||
(ifr < 200 or ifr % 100 == 0) and | ||
(ifr < 2000 or ifr % 1000 == 0)): | ||
|
@@ -421,11 +421,28 @@ def edr_to_df(path, verbose=False): | |
all_energies.append([frame.t] + [ener.e for ener in frame.ener]) | ||
|
||
end = time.time() | ||
if verbose: | ||
if verbose_set: | ||
print('\rLast Frame read : {}, time : {} ps' | ||
.format(ifr, frame.t), | ||
end='', file=sys.stderr) | ||
print('\n{} frame read in {:.2f} seconds'.format(ifr, end - begin), | ||
file=sys.stderr) | ||
|
||
return all_energies, all_names, times | ||
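The three-part condition guarding the verbose output in `read_edr` thins out progress reports as the frame index grows. A minimal sketch of that logic in isolation follows; the helper name `should_report` is hypothetical and not part of panedr.

```python
def should_report(ifr):
    """Return True when frame index `ifr` would trigger a progress line.

    Mirrors the condition in the diff: every frame below 20, then every
    10th frame below 200, every 100th below 2000, and every 1000th after.
    """
    return ((ifr < 20 or ifr % 10 == 0) and
            (ifr < 200 or ifr % 100 == 0) and
            (ifr < 2000 or ifr % 1000 == 0))


# Frame indices out of the first 5000 that would be reported.
reported = [ifr for ifr in range(5000) if should_report(ifr)]
```

With this condition the number of progress lines grows roughly logarithmically with the number of frames rather than linearly.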
|
||
|
||
def edr_to_df(path: str, verbose: bool = False): | ||
import pandas | ||
Is pandas now an optional dependency in requirements.txt etc? If so I would guard this with a
try:
    import pandas
except ImportError:
    raise ImportError("""ERROR --- pandas was not found!
    pandas is required to use the `.edr_to_df()` functionality.
    try installing it using pip eg:
    pip install pandas """)

Good point. I'll make a note to add a test for raising this error as well.

I wasn't sure what the best way to make pandas optional is. I have now done this by removing pandas from requirements.txt and adding a section under [extras] in setup.cfg. panedr can now be installed with pandas by running
Please let me know if this is not the best way to do these things.

I would rather do things a little bit differently. The way you do it here will break the user experience for the standalone case. A user who does not use mdanalysis will install panedr and won't have pandas to use the main function out of the box. Instead, I would create 2 packages: the default one that depends on pandas and a "lite" one for downstream integrators who want to minimise dependencies. I don't know how to do that, though...
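One possible way to test the guarded import discussed above, without uninstalling pandas, is to place `None` in `sys.modules` for the duration of the test, which makes `import pandas` raise `ImportError`. This is only a sketch: `edr_to_df_stub` is a hypothetical stand-in that reproduces the guard, not the real `edr_to_df`, and the error message is shortened.

```python
import sys
from unittest import mock


def edr_to_df_stub():
    """Hypothetical stand-in for edr_to_df; only the import guard is shown."""
    try:
        import pandas
    except ImportError:
        raise ImportError("pandas is required to use edr_to_df(); "
                          "try `pip install pandas`")
    return pandas


def guard_message_without_pandas():
    """Call the stub with pandas hidden and return the error message."""
    # A None entry in sys.modules makes `import pandas` raise ImportError,
    # simulating an environment where pandas is not installed.
    with mock.patch.dict(sys.modules, {"pandas": None}):
        try:
            edr_to_df_stub()
        except ImportError as err:
            return str(err)
    return None
```

`mock.patch.dict` restores `sys.modules` on exit, so other tests that rely on pandas are unaffected.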
||
all_energies, all_names, times = read_edr(path, verbose_set=verbose) | ||
df = pandas.DataFrame(all_energies, columns=all_names, index=times) | ||
return df | ||
|
||
|
||
def edr_to_dict(path: str, verbose: bool = False): | ||
import numpy as np | ||
Is this an optional dependency? I think it's probably safe to make Thoughts @jbarnoud?

I think it is reasonable to make numpy a compulsory dependency. The current users have to install it already because of pandas anyway; the new users will likely use it as well.
||
all_energies, all_names, times = read_edr(path, verbose_set=verbose) | ||
energy_dict = {} | ||
for idx, name in enumerate(all_names): | ||
energy_dict[name] = np.array( | ||
[all_energies[frame][idx] for frame in range(len(times))]) | ||
return energy_dict | ||
Make sure that the "Time" key is in. I expect it to be, but I do not remember exactly how I treated it.

"Time" is part of
all_names = [u'Time'] + [nm.name for nm in edr_file.nms]
[...]
all_energies.append([frame.t] + [ener.e for ener in frame.ener])
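To illustrate why "Time" ends up as a key: each row starts with `frame.t` and `all_names` starts with `'Time'`, so regrouping rows into columns pairs them up. The sketch below uses made-up values and plain lists instead of NumPy arrays to stay dependency-free; `rows_to_columns` is a hypothetical helper, not the function in the diff.

```python
# Made-up values; in panedr these would come from read_edr(path).
all_names = ["Time", "Potential", "Temperature"]
all_energies = [
    [0.0, -10.5, 300.0],
    [1.0, -10.7, 301.5],
    [2.0, -10.6, 299.8],
]


def rows_to_columns(rows, names):
    """Regroup per-frame rows into one list per column name."""
    # zip(*rows) transposes the row-major data into columns, and the
    # outer zip pairs each column with the name at the same position.
    return {name: list(column) for name, column in zip(names, zip(*rows))}


columns = rows_to_columns(all_energies, all_names)
```

In the diff, `edr_to_dict` does the equivalent regrouping with an index-based list comprehension and wraps each column in `np.array`.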
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -163,6 +163,14 @@ def _assert_progress_range(self, progress, dt, start, stop, step): | |
assert ref_line == progress_line | ||
|
||
|
||
def test_edr_to_dict(): | ||
This only really tests that it returns the same as
||
array_dict = panedr.edr_to_dict(EDR) | ||
ref_df = panedr.edr_to_df(EDR) | ||
array_df = pandas.DataFrame.from_dict(array_dict).set_index( | ||
"Time", drop=False) | ||
assert array_df.equals(ref_df) | ||
|
||
|
||
def read_xvg(path): | ||
""" | ||
Reads XVG file, returning the data, names, and precision. | ||
|
I don't see a reason not to add `read_edr` here as well.

I saw `read_edr` as merely providing data for the two user-exposed functions `edr_to_df` and `edr_to_dict`, so that the user should never need to call `read_edr` itself directly. I am not sure if the return values of this function are of use to a user, but yeah, that's not really a reason to add it here.
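For context on what listing a name in `__all__` changes: it only controls what `from module import *` exports, and unlisted names remain reachable by explicit access. A small sketch follows, assuming a throwaway module named `panedr_demo` built in memory with trivial function bodies; none of this is panedr's real code.

```python
import sys
import types

# Source of a throwaway module that mimics the layout under discussion.
source = '''
__all__ = ['edr_to_df', 'edr_to_dict']

def read_edr(path):
    return path

def edr_to_df(path):
    return read_edr(path)

def edr_to_dict(path):
    return read_edr(path)
'''

demo = types.ModuleType("panedr_demo")
exec(source, demo.__dict__)
sys.modules["panedr_demo"] = demo

# A star import only picks up the names listed in __all__.
star_ns = {}
exec("from panedr_demo import *", star_ns)
exported = sorted(name for name in star_ns if not name.startswith("__"))

# read_edr is absent from the star import but still accessible explicitly.
direct = demo.read_edr("energy.edr")
```

So leaving `read_edr` out of `__all__` does not hide it from users who import it by name; it only keeps it out of star imports.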