eclab.mpt: Parsing of header is prone to crashes. #155

Closed
PeterKraus opened this issue May 22, 2024 · 0 comments · Fixed by #157

PeterKraus commented May 22, 2024

The eclab.mpt header parser in:

def process_header(
    lines: list[str],
    timezone: str,
    locale: str,
) -> tuple[dict, list, dict]:
    """Processes the header lines.
    Parameters
    ----------
    lines
        The header lines, starting at line 3 (which is an empty line),
        right after the `"Nb header lines : "` line.
    Returns
    -------
    tuple[dict, dict]
        A dictionary containing the settings (and the technique
        parameters) and a dictionary containing the loop indexes.
    """
    sections = "\n".join(lines).split("\n\n")
    # Can happen that no settings are present but just a loops section.
    assert not sections[1].startswith("Number of loops : "), "no settings present"
    # Again, we need the acquisition time to get timestamped data.
    assert len(sections) >= 3, "no settings present"
    technique = sections[1].strip()
    settings_lines = sections[2].split("\n")
    technique, params_keys = technique_params(technique, settings_lines)
    params = settings_lines[-len(params_keys) :]
    # The sequence param columns are always allocated 20 characters.
    n_sequences = int(len(params[0]) / 20)
    params_values = []
    for seq in range(1, n_sequences):
        values = []
        for param in params:
            val = param[seq * 20 : (seq + 1) * 20]
            try:
                val = float(parse_decimal(val, locale=locale))
            except ValueError:
                val = val.strip()
            values.append(val)
        params_values.append(values)
    params = [dict(zip(params_keys, values)) for values in params_values]
    settings_lines = [line.strip() for line in settings_lines[: -len(params_keys)]]
    # Parse the acquisition timestamp.
    timestamp_re = re.compile(r"Acquisition started on : (?P<val>.+)")
    timestamp_match = timestamp_re.search("\n".join(settings_lines))
    timestamp = timestamp_match["val"]
    for format in ("%m/%d/%Y %H:%M:%S", "%m.%d.%Y %H:%M:%S", "%m/%d/%Y %H:%M:%S.%f"):
        uts = dgutils.str_to_uts(
            timestamp=timestamp, format=format, timezone=timezone, strict=False
        )
        if uts is not None:
            break
    if uts is None:
        raise NotImplementedError(f"Time format for {timestamp} not implemented.")
    loops = None
    if len(sections) >= 4 and sections[-1].startswith("Number of loops : "):
        # The header contains a loops section.
        loops_lines = sections[-1].split("\n")
        n_loops = int(loops_lines[0].split(":")[-1])
        indexes = []
        for n in range(n_loops):
            index = loops_lines[n + 1].split("to")[0].split()[-1]
            indexes.append(int(index))
        loops = {"n_loops": n_loops, "indexes": indexes}
    settings = {
        "posix_timestamp": uts,
        "technique": technique,
        "raw": "\n".join(lines),
    }
    return settings, params, loops

has seen better days and is very quickly becoming hard to maintain.

The big issue here is that the params section (i.e. everything after Cycle Definition until a blank line) can have a variable number of lines even for the same technique. This is particularly frustrating as it can happen when mpt files are generated from the same mpr file, depending on the EC-Lab version used to export the file. Note that only the EC-Lab version used to create the file is recorded in the mpt file, not the export version.
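
One possible direction, as a rough sketch only and not the fix that was eventually merged: instead of slicing off the last `len(params_keys)` lines, locate the params block by its `Cycle Definition` marker and read the parameter name from the first 20-character column of each line. The helper names `split_params_block`, `parse_params` and `_to_number` are hypothetical, the assumption that the marker line itself belongs to the settings is mine, and plain `float()` stands in for the locale-aware `parse_decimal` used above.

```python
COLUMN_WIDTH = 20  # column width taken from the comment in process_header


def _to_number(text):
    """Return text as a float if possible, else the stripped string."""
    try:
        return float(text)
    except ValueError:
        return text.strip()


def split_params_block(settings_lines, marker="Cycle Definition"):
    """Split one header section into (settings, params) at the marker line.

    Assumption: every line after the marker belongs to the params block,
    however many lines that happens to be for a given export.
    """
    for i, line in enumerate(settings_lines):
        if line.startswith(marker):
            return settings_lines[: i + 1], settings_lines[i + 1 :]
    # No marker found: treat the whole section as settings.
    return settings_lines, []


def parse_params(param_lines, width=COLUMN_WIDTH):
    """Turn fixed-width param lines into one dict per sequence."""
    columns = {}
    for line in param_lines:
        line = line.rstrip()
        if not line:
            continue
        # The first column holds the parameter name, the rest the values.
        name = line[:width].strip()
        cells = [line[i : i + width] for i in range(width, len(line), width)]
        columns[name] = [_to_number(cell.strip()) for cell in cells]
    n_sequences = max((len(v) for v in columns.values()), default=0)
    return [
        {name: vals[i] for name, vals in columns.items() if i < len(vals)}
        for i in range(n_sequences)
    ]
```

With this shape, an extra or missing parameter line changes only the keys in the resulting dicts; it no longer shifts which lines are treated as settings and which as params.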

Fixing this would pave the way towards resolving #149 and is a good opportunity to close #12.
