`eclab.mpt`: Parsing of header is prone to crashes. #155

PeterKraus · 2024-05-22T18:40:34Z

The eclab.mpt header parser in:

Lines 70 to 144 in c6379f7

    
    def process_header(
        lines: list[str],
        timezone: str,
        locale: str,
    ) -> tuple[dict, list, dict]:
        """Processes the header lines.

        Parameters
        ----------
        lines
            The header lines, starting at line 3 (which is an empty line),
            right after the `"Nb header lines : "` line.

        Returns
        -------
        tuple[dict, dict]
            A dictionary containing the settings (and the technique
            parameters) and a dictionary containing the loop indexes.

        """
        sections = "\n".join(lines).split("\n\n")
        # Can happen that no settings are present but just a loops section.
        assert not sections[1].startswith("Number of loops : "), "no settings present"
        # Again, we need the acquisition time to get timestamped data.
        assert len(sections) >= 3, "no settings present"
        technique = sections[1].strip()
        settings_lines = sections[2].split("\n")
        technique, params_keys = technique_params(technique, settings_lines)
        params = settings_lines[-len(params_keys) :]
        # The sequence param columns are always allocated 20 characters.
        n_sequences = int(len(params[0]) / 20)
        params_values = []
        for seq in range(1, n_sequences):
            values = []
            for param in params:
                val = param[seq * 20 : (seq + 1) * 20]
                try:
                    val = float(parse_decimal(val, locale=locale))
                except ValueError:
                    val = val.strip()
                values.append(val)
            params_values.append(values)
        params = [dict(zip(params_keys, values)) for values in params_values]
        settings_lines = [line.strip() for line in settings_lines[: -len(params_keys)]]
        # Parse the acquisition timestamp.
        timestamp_re = re.compile(r"Acquisition started on : (?P<val>.+)")
        timestamp_match = timestamp_re.search("\n".join(settings_lines))
        timestamp = timestamp_match["val"]
        for format in ("%m/%d/%Y %H:%M:%S", "%m.%d.%Y %H:%M:%S", "%m/%d/%Y %H:%M:%S.%f"):
            uts = dgutils.str_to_uts(
                timestamp=timestamp, format=format, timezone=timezone, strict=False
            )
            if uts is not None:
                break
        if uts is None:
            raise NotImplementedError(f"Time format for {timestamp} not implemented.")
        loops = None
        if len(sections) >= 4 and sections[-1].startswith("Number of loops : "):
            # The header contains a loops section.
            loops_lines = sections[-1].split("\n")
            n_loops = int(loops_lines[0].split(":")[-1])
            indexes = []
            for n in range(n_loops):
                index = loops_lines[n + 1].split("to")[0].split()[-1]
                indexes.append(int(index))
            loops = {"n_loops": n_loops, "indexes": indexes}
        settings = {
            "posix_timestamp": uts,
            "technique": technique,
            "raw": "\n".join(lines),
        }
        return settings, params, loops

has seen better days and is very quickly becoming hard to maintain.

The big issue here is that the params section (i.e. everything after Cycle Definition until a blank line) can have a variable number of lines even for the same technique. This is particularly frustrating as it can happen when mpt files are generated from the same mpr file, depending on the EC-Lab version used to export the file. Note that only the EC-Lab version used to create the file is recorded in the mpt file, not the export version.
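One possible direction, as a rough sketch only: instead of slicing a fixed number of lines off the end of the settings section (`settings_lines[-len(params_keys) :]`), the params block could be located by its `Cycle Definition` marker and read until the first blank line, taking the parameter names from the first 20-character column of each row. The helper name `split_params` and the example header lines are hypothetical and not taken from a real mpt file; values are left as raw strings, so the locale-aware number parsing would still have to be applied afterwards.

```python
def split_params(settings_lines: list[str], width: int = 20) -> list[dict[str, str]]:
    """Hypothetical helper: return one dict of raw param strings per sequence."""
    # Find the marker line rather than assuming a fixed number of param rows.
    start = None
    for i, line in enumerate(settings_lines):
        if line.startswith("Cycle Definition"):
            start = i + 1
            break
    if start is None:
        raise ValueError("no 'Cycle Definition' line found in the settings")

    # Collect rows until a blank line (or the end of the section).
    rows = []
    for line in settings_lines[start:]:
        if line.strip() == "":
            break
        # First column holds the param name, the rest hold one value per sequence.
        key = line[:width].strip()
        values = [line[i : i + width].strip() for i in range(width, len(line), width)]
        rows.append((key, values))

    n_sequences = max((len(values) for _, values in rows), default=0)
    return [
        {key: values[seq] for key, values in rows if seq < len(values)}
        for seq in range(n_sequences)
    ]


# Made-up header fragment with two sequences and two param rows.
example = [
    "Acquisition started on : 05/22/2024 18:40:34",
    "Cycle Definition : Charge/Discharge alternance",
    "Ns".ljust(20) + "0".ljust(20) + "1".ljust(20),
    "Is".ljust(20) + "1.5".ljust(20) + "-1.5".ljust(20),
    "",
    "Number of loops : 0",
]
params = split_params(example)
```

With this layout the number of param rows no longer needs to be known per technique, which is the part that differs between export versions.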

Fixing this would pave the way towards resolving #149 and is a good opportunity to close #12.


PeterKraus mentioned this issue May 22, 2024

eclab: tests and minor fixes #154

Merged

PeterKraus added this to the 5.1 milestone May 22, 2024

PeterKraus mentioned this issue May 23, 2024

eclab.mpt: Rework header parse #157

Merged

PeterKraus closed this as completed in #157 May 24, 2024
