
All 3 examples in the tutorial do not work, for different reasons #8

Open
brian-arnold opened this issue Nov 7, 2023 · 0 comments

@brian-arnold

Hello! As a first test of whether your software works, I went through each example in your tutorial, and each failed for a different reason; one of these is the same error previously reported in another issue. These failures could be due to errors unique to the data processing in the tutorial examples or to issues with the software itself, but I didn't probe further. Have you run these examples on your end? I copied and pasted your commands from the tutorial and double-checked that everything was right, but I suppose I could have missed something.

Errors:

In example 1, during the train/test split, I get the same error that was previously posted:

File "../../scripts/parsers/fasta2explainn.py", line 147, in _to_ExplaiNN
df2 = pd.DataFrame(data, columns=list(range(len(data[0]))))
IndexError: list index out of range
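
For what it's worth, this error means `data` is empty by the time `fasta2explainn.py` builds the DataFrame, so `data[0]` has nothing to index; it looks like no sequences survive the upstream parsing/splitting. Here is a minimal sketch of that failure mode (the function name is mine for illustration, not the script's actual code, and it just wraps the DataFrame call from the traceback with a guard):

```python
# Minimal sketch, not the ExplaiNN code itself: if the parsed record list is
# empty, indexing data[0] raises the IndexError shown above. A guard like this
# would surface the real problem (no sequences parsed) instead.
import pandas as pd

def records_to_dataframe(data):
    """data is assumed to be a list of equal-length rows parsed from the FASTA."""
    if not data:
        raise ValueError("No sequences were parsed; check the input FASTA and "
                         "any upstream filtering/splitting steps.")
    return pd.DataFrame(data, columns=list(range(len(data[0]))))

# An empty parse now fails with a clear message instead of an IndexError.
try:
    records_to_dataframe([])
except ValueError as e:
    print(e)
```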

In example 2, during the step to subsample 100k sequences:

File "/Users/bjarnold/Princeton_EEB/Kocher/test/ExplaiNN/scripts/utils/subsample-seqs-by-gc.py", line 95, in _subsample_seqs_by_GC
norm_factor = subsample / sum([len(v) for v in gc_regroups.values()])
ZeroDivisionError: division by zero
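
This one looks like the same kind of problem: `gc_regroups` ends up empty, so the sum of group sizes is 0 and the normalization divides by zero. A small sketch of what I suspect is happening (my own helper name, assuming `gc_regroups` maps a GC bin to a list of sequences):

```python
# Minimal sketch, not the ExplaiNN script itself: if no sequences fall into any
# GC bin, the total group size is zero and the normalization factor blows up.
def gc_norm_factor(subsample, gc_regroups):
    total = sum(len(v) for v in gc_regroups.values())
    if total == 0:
        raise ValueError("No sequences were grouped by GC content; check that "
                         "the input from the previous step is non-empty.")
    return subsample / total

# With an empty grouping, this now points at the likely upstream cause.
try:
    gc_norm_factor(100_000, {})
except ValueError as e:
    print(e)
```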

In example 3, during model training:

File "/Users/bjarnold/miniforge3/envs/explainn/lib/python3.10/site-packages/pandas/io/parsers/c_parser_wrapper.py", line 234, in read
chunks = self._reader.read_low_memory(nrows)
File "parsers.pyx", line 843, in pandas._libs.parsers.TextReader.read_low_memory
File "parsers.pyx", line 904, in pandas._libs.parsers.TextReader._read_rows
File "parsers.pyx", line 879, in pandas._libs.parsers.TextReader._tokenize_rows
File "parsers.pyx", line 890, in pandas._libs.parsers.TextReader._check_tokenize_status
File "parsers.pyx", line 2058, in pandas._libs.parsers.raise_parser_error
pandas.errors.ParserError: Error tokenizing data. C error: Expected 3 fields in line 3, saw 4
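
This last one suggests the training-data file passed to pandas is ragged: the C parser infers the number of columns from the first rows, so a line with an extra field raises exactly this error. A small self-contained reproduction (the data here is made up, not your file):

```python
# Minimal sketch reproducing the failure mode: line 3 has 4 tab-separated
# fields where the first rows have 3, which triggers the ParserError above.
import io
import pandas as pd

ragged_tsv = "seq1\tACGT\t1\nseq2\tTTGA\t0\nseq3\tGGCC\t1\textra\n"

try:
    pd.read_csv(io.StringIO(ragged_tsv), sep="\t", header=None)
except pd.errors.ParserError as e:
    print(e)  # Error tokenizing data. C error: Expected 3 fields in line 3, saw 4

# Counting fields per line helps locate the offending row in the real input:
for i, line in enumerate(ragged_tsv.splitlines(), start=1):
    print(i, len(line.split("\t")))
```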

On top of these errors, there are several discrepancies between the tutorial PDF you uploaded and the slides you make available on Google Docs, including slides that are completely missing (e.g. for example 3) and typos in commands (where the input and output files for a script are the same).
