Input files having >1 extra empty rows at the end breaks CSVReader #1023

astjoephysics · 2024-09-26T14:07:12Z

If an input file has 2 or more blank lines after the data rows, CSVReader breaks: it can't parse the file with the extra skiprows function. Do we possibly need to make users aware that input files must be formatted correctly?
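
For anyone who wants to check a file before running, here is a minimal sketch of a pre-check that counts blank lines at the end of an input. The helper name `count_trailing_blank_lines` is hypothetical and not part of Sorcha; it only treats whitespace-only lines as blank.

```python
def count_trailing_blank_lines(lines):
    """Count whitespace-only lines at the end of an iterable of text lines.

    Hypothetical helper for illustration; not part of Sorcha.
    """
    count = 0
    # Walk backwards from the last line until a non-blank line is found.
    for line in reversed(list(lines)):
        if line.strip():
            break
        count += 1
    return count


# Usage on in-memory text; a real file could be passed as an open file handle.
sample = "ObjID,a,e\nobj1,1.0,0.1\n\n\n"
n_blank = count_trailing_blank_lines(sample.splitlines(keepends=True))
if n_blank >= 2:
    print(f"Input ends with {n_blank} blank lines; consider removing them.")
```

This reads the whole input into memory, so for very large files it may be better to inspect only the last few lines.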

astronomerritt · 2024-10-07T13:26:35Z

Saving the traceback here:

Traceback (most recent call last):
  File "/opt/miniconda3/envs/sorcha/bin/sorcha-run", line 8, in <module>
    sys.exit(main())
  File "/Users/stephaniemerritt/Projects/sorcha/src/sorcha_cmdline/run.py", line 121, in main
    return execute(args)
  File "/Users/stephaniemerritt/Projects/sorcha/src/sorcha_cmdline/run.py", line 183, in execute
    runLSSTSimulation(args, configs)
  File "/Users/stephaniemerritt/Projects/sorcha/src/sorcha/sorcha.py", line 190, in runLSSTSimulation
    orbits_df = reader.read_aux_block(block_size=configs["size_serial_chunk"])
  File "/Users/stephaniemerritt/Projects/sorcha/src/sorcha/readers/CombinedDataReader.py", line 241, in read_aux_block
    current_df = reader.read_objects(obj_ids)
  File "/Users/stephaniemerritt/Projects/sorcha/src/sorcha/readers/ObjectDataReader.py", line 125, in read_objects
    res_df = self._read_objects_internal(obj_ids, **kwargs)
  File "/Users/stephaniemerritt/Projects/sorcha/src/sorcha/readers/CSVReader.py", line 229, in _read_objects_internal
    res_df = pd.read_csv(
  File "/opt/miniconda3/envs/sorcha/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 1026, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/opt/miniconda3/envs/sorcha/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 626, in _read
    return parser.read(nrows)
  File "/opt/miniconda3/envs/sorcha/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 1923, in read
    ) = self._engine.read(  # type: ignore[attr-defined]
  File "/opt/miniconda3/envs/sorcha/lib/python3.10/site-packages/pandas/io/parsers/c_parser_wrapper.py", line 234, in read
    chunks = self._reader.read_low_memory(nrows)
  File "parsers.pyx", line 838, in pandas._libs.parsers.TextReader.read_low_memory
  File "parsers.pyx", line 905, in pandas._libs.parsers.TextReader._read_rows
  File "parsers.pyx", line 874, in pandas._libs.parsers.TextReader._tokenize_rows
  File "parsers.pyx", line 891, in pandas._libs.parsers.TextReader._check_tokenize_status
  File "parsers.pyx", line 2050, in pandas._libs.parsers.raise_parser_error
IndexError: list index out of range
Error: Command 'sorcha-run' failed with exit code 1.

astronomerritt · 2024-10-07T13:34:16Z

This is a bit of a pain, because the code is breaking BEFORE it does the validation checks on the input table. See lines 125-126 of ObjectDataReader.py:

res_df = self._read_objects_internal(obj_ids, **kwargs)  # code breaks here
res_df = self._process_and_validate_input_table(res_df, **kwargs)

My only suggestion for fixing this is to wrap one of the failing lines in a try/except IndexError statement and throw an error message that suggests the user might want to check for empty lines at the end of their input files.

I'd suggest doing this to the line I highlighted above, as it's in the parent class so all sub-classes would inherit the behaviour.
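
A rough sketch of what I mean, using a stand-in class rather than the real ObjectDataReader. The class names `ReaderSketch` and `FailingReader`, the choice of `ValueError`, and the message wording are all assumptions for illustration; the actual fix may prefer to log the error and exit instead.

```python
class ReaderSketch:
    """Hypothetical stand-in for the parent reader class."""

    def _read_objects_internal(self, obj_ids, **kwargs):
        # Sub-classes (e.g. a CSV reader) implement the actual file read.
        raise NotImplementedError

    def _process_and_validate_input_table(self, res_df, **kwargs):
        # Placeholder: the real method validates the table.
        return res_df

    def read_objects(self, obj_ids, **kwargs):
        try:
            res_df = self._read_objects_internal(obj_ids, **kwargs)
        except IndexError as err:
            # Re-raise with a hint, keeping the original error as the cause.
            raise ValueError(
                "Could not read the input file. Check for blank lines "
                "at the end of your input files."
            ) from err
        return self._process_and_validate_input_table(res_df, **kwargs)


class FailingReader(ReaderSketch):
    """Simulates the parser failure seen in the traceback."""

    def _read_objects_internal(self, obj_ids, **kwargs):
        raise IndexError("list index out of range")
```

Because the try/except lives in the parent's `read_objects`, any sub-class that overrides only `_read_objects_internal` gets the clearer message without further changes.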
