Input files having >1 extra empty rows at the end breaks CSVReader #1023

Open
astjoephysics opened this issue Sep 26, 2024 · 2 comments

Comments

@astjoephysics
Collaborator

If an input file has two or more blank lines after the data rows, CSVReader breaks: the skiprows function can't handle the extra rows. Possibly we need to make users aware that input files must be formatted correctly?
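
For anyone wanting to check their files before a run, here is a minimal sketch of a pre-flight check. `count_trailing_blank_lines` is a hypothetical helper written for this comment, not part of Sorcha, and it assumes a "blank" line is one that is empty or contains only whitespace.

```python
def count_trailing_blank_lines(text):
    """Return how many blank (empty or whitespace-only) lines end `text`.

    Hypothetical helper for illustration only; not part of Sorcha.
    """
    count = 0
    # Walk backwards from the last line until a line with content is found.
    for line in reversed(text.splitlines()):
        if line.strip():
            break
        count += 1
    return count


def has_problem_trailing_lines(text, threshold=2):
    """True if the text ends with `threshold` or more blank lines.

    The default of 2 follows the report above (two or more blank lines).
    """
    return count_trailing_blank_lines(text) >= threshold
```

To use it on a file, read the contents first, e.g. `has_problem_trailing_lines(open(path).read())`, where `path` is the input file.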

@astronomerritt
Collaborator

Saving the traceback here:

Traceback (most recent call last):
  File "/opt/miniconda3/envs/sorcha/bin/sorcha-run", line 8, in <module>
    sys.exit(main())
  File "/Users/stephaniemerritt/Projects/sorcha/src/sorcha_cmdline/run.py", line 121, in main
    return execute(args)
  File "/Users/stephaniemerritt/Projects/sorcha/src/sorcha_cmdline/run.py", line 183, in execute
    runLSSTSimulation(args, configs)
  File "/Users/stephaniemerritt/Projects/sorcha/src/sorcha/sorcha.py", line 190, in runLSSTSimulation
    orbits_df = reader.read_aux_block(block_size=configs["size_serial_chunk"])
  File "/Users/stephaniemerritt/Projects/sorcha/src/sorcha/readers/CombinedDataReader.py", line 241, in read_aux_block
    current_df = reader.read_objects(obj_ids)
  File "/Users/stephaniemerritt/Projects/sorcha/src/sorcha/readers/ObjectDataReader.py", line 125, in read_objects
    res_df = self._read_objects_internal(obj_ids, **kwargs)
  File "/Users/stephaniemerritt/Projects/sorcha/src/sorcha/readers/CSVReader.py", line 229, in _read_objects_internal
    res_df = pd.read_csv(
  File "/opt/miniconda3/envs/sorcha/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 1026, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/opt/miniconda3/envs/sorcha/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 626, in _read
    return parser.read(nrows)
  File "/opt/miniconda3/envs/sorcha/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 1923, in read
    ) = self._engine.read(  # type: ignore[attr-defined]
  File "/opt/miniconda3/envs/sorcha/lib/python3.10/site-packages/pandas/io/parsers/c_parser_wrapper.py", line 234, in read
    chunks = self._reader.read_low_memory(nrows)
  File "parsers.pyx", line 838, in pandas._libs.parsers.TextReader.read_low_memory
  File "parsers.pyx", line 905, in pandas._libs.parsers.TextReader._read_rows
  File "parsers.pyx", line 874, in pandas._libs.parsers.TextReader._tokenize_rows
  File "parsers.pyx", line 891, in pandas._libs.parsers.TextReader._check_tokenize_status
  File "parsers.pyx", line 2050, in pandas._libs.parsers.raise_parser_error
IndexError: list index out of range
Error: Command 'sorcha-run' failed with exit code 1.

@astronomerritt
Collaborator

This is a bit of a pain, because the code is breaking BEFORE it does the validation checks on the input table. See lines 125-126 of ObjectDataReader.py:

res_df = self._read_objects_internal(obj_ids, **kwargs)  # code breaks here
res_df = self._process_and_validate_input_table(res_df, **kwargs)

My only suggestion for fixing this is to wrap one of the failing lines in a try/except IndexError statement and throw an error message that suggests the user might want to check for empty lines at the end of their input files.

I'd suggest doing this to the line I highlighted above, as it's in the parent class so all sub-classes would inherit the behaviour.
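
A rough sketch of what that could look like, using stand-in classes rather than the real Sorcha code. The class names `ObjectDataReaderSketch` and `BrokenReaderSketch`, the choice of `ValueError`, and the message wording are all assumptions for illustration; only the two method names come from the lines quoted above.

```python
class ObjectDataReaderSketch:
    """Stand-in for the parent reader class; not the real Sorcha code."""

    def _read_objects_internal(self, obj_ids, **kwargs):
        # Sub-classes provide the actual file reading.
        raise NotImplementedError

    def _process_and_validate_input_table(self, input_table, **kwargs):
        # Placeholder: the real method performs the validation checks.
        return input_table

    def read_objects(self, obj_ids, **kwargs):
        try:
            res_df = self._read_objects_internal(obj_ids, **kwargs)
        except IndexError as err:
            # Error type and wording are assumptions, not Sorcha's behaviour.
            raise ValueError(
                "Could not read the input file. Check for empty lines "
                "at the end of your input files."
            ) from err
        res_df = self._process_and_validate_input_table(res_df, **kwargs)
        return res_df


class BrokenReaderSketch(ObjectDataReaderSketch):
    """Hypothetical sub-class that mimics the failure in the traceback."""

    def _read_objects_internal(self, obj_ids, **kwargs):
        raise IndexError("list index out of range")


def demo():
    """Return the message a user would see when the read fails."""
    try:
        BrokenReaderSketch().read_objects(["obj1"])
    except ValueError as err:
        return str(err)
    return None
```

Because the try/except sits in the parent class's `read_objects`, any sub-class whose `_read_objects_internal` raises an `IndexError` gets the same message, and `from err` keeps the original traceback attached for debugging.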
