add crabs/dbimport from readsimulator pipeline #6584
CRABS changed much of its functionality in the update to version 1.0.0. The module needs to be adapted accordingly!
Unfortunately, I get `java.lang.OutOfMemoryError: Required array size too large` when I try to use the downloadtaxonomy module (see PR #7423). I asked whether there is a way to download only a fraction of the data here: gjeunen/reference_database_creator#83. We can then either do that or downsample the data, load it into test_datasets, and continue from there. In any case, the downloadtaxonomy module is needed to run crabs properly.
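The downsampling option could look roughly like the following. This is only a minimal sketch, assuming the usual NCBI tab-separated `accession2taxid` layout (one header row, then `accession`, `accession.version`, `taxid`, `gi` columns). The function name `downsample_acc2taxid` and any file names are hypothetical and not part of CRABS or nf-core.

```python
def downsample_acc2taxid(src, dst, keep):
    """Copy the header plus only the rows whose accession or
    accession.version is listed in `keep`.

    Assumes a tab-separated file with one header line (assumption,
    based on the NCBI accession2taxid layout).
    Returns the number of data rows written.
    """
    keep = set(keep)
    written = 0
    # Stream line by line so the full file never has to fit in memory.
    with open(src) as fin, open(dst, "w") as fout:
        header = fin.readline()
        fout.write(header)
        for line in fin:
            fields = line.rstrip("\n").split("\t")
            if len(fields) >= 2 and (fields[0] in keep or fields[1] in keep):
                fout.write(line)
                written += 1
    return written
```

Calling it with the set of accessions present in the test FASTA would yield a small mapping file suitable for test_datasets. The `names.dmp` and `nodes.dmp` files would need a separate reduction, which this sketch does not cover.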
There is downsampled test data available in the test-datasets repository. Unfortunately, I run into the following error:
│ Command executed: │
│ │
│ if [ "false" == "true" ]; then │
│ gzip -c -d genome.fasta > genome.fasta │
│ fi │
│ │
│ crabs --import \ │
│ --input genome.fasta \ │
│ --output test.crabsdb.fa \ │
│ --acc2tax nucl_gb.accession2taxid \ │
│ --names names.dmp \ │
│ --nodes nodes.dmp \ │
│ --import-format embl --ranks 'superkingdom;phylum;class;order;family;genus;species' \ │
│ │
│ rm genome.fasta │
│ │
│ cat <<-END_VERSIONS > versions.yml │
│ "CRABS_DBIMPORT": │
│ crabs: $(crabs --help | grep 'CRABS |' | sed 's/.*CRABS | \(v[0-9.]*\).*/\1/') │
│ END_VERSIONS │
│ │
│ Command exit status: │
│ 1 │
│ │
│ Command output: │
│ | Read data to memory | 0% -:--:-- 0:00:00 │
│ │
│ Command error: │
│ /usr/local/lib/python3.12/site-packages/function/crabs_functions.py:775: SyntaxWarning: invalid escape sequence '\.' │
│ for item in ['_sp\.','_SP\.','_indet.', '_sp.', '_SP.']: │
│ /usr/local/lib/python3.12/site-packages/function/crabs_functions.py:775: SyntaxWarning: invalid escape sequence '\.' │
│ for item in ['_sp\.','_SP\.','_indet.', '_sp.', '_SP.']: │
│ Matplotlib created a temporary cache directory at /tmp/matplotlib-am3lbtwt because the default path (/.config/matplotlib) is not a writable directory; it is │
│ highly recommended to set the MPLCONFIGDIR environment variable to a writable directory, in particular to speed up the import of Matplotlib and to better support │
│ multiprocessing. │
│ │
│ /// CRABS | v1.0.7 │
│ │
│ | Function | Import sequence data into CRABS format │
│ | Read data to memory | 0% -:--:-- 0:00:00 │
│ Traceback (most recent call last): │
│ File "/usr/local/bin/crabs", line 847, in <module> │
│ crabs() │
│ File "/usr/local/lib/python3.12/site-packages/click/core.py", line 1157, in __call__ │
│ return self.main(*args, **kwargs) │
│ ^^^^^^^^^^^^^^^^^^^^^^^^^^ │
│ File "/usr/local/lib/python3.12/site-packages/rich_click/rich_command.py", line 152, in main │
│ rv = self.invoke(ctx) │
│ ^^^^^^^^^^^^^^^^ │
│ File "/usr/local/lib/python3.12/site-packages/click/core.py", line 1434, in invoke │
│ return ctx.invoke(self.callback, **ctx.params) │
│ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ │
│ File "/usr/local/lib/python3.12/site-packages/click/core.py", line 783, in invoke │
│ return __callback(*args, **kwargs) │
│ ^^^^^^^^^^^^^^^^^^^^^^^^^^^ │
│ File "/usr/local/bin/crabs", line 561, in crabs │
│ seq_input_dict, initial_seq_number = input_to_memory(task, progress_bar, input_) │
│ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ │
│ File "/usr/local/lib/python3.12/site-packages/function/crabs_functions.py", line 393, in embl_to_memory │
│ seq_name = line.split('|')[1] │
│ ~~~~~~~~~~~~~~~^^^ │
│ IndexError: list index out of range │
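The traceback ends in `line.split('|')[1]`, which fails whenever the line being parsed contains no `|`. One plausible reading, not confirmed in this thread, is that the headers in `genome.fasta` carry no pipe-delimited fields while `--import-format embl` expects them. The sketch below reproduces the failing expression on made-up headers and adds a quick pre-check. Both helper functions and the example headers are illustrative assumptions, not CRABS code or the actual test data.

```python
def second_pipe_field(line):
    """Mimic the expression from the traceback: line.split('|')[1]."""
    return line.split('|')[1]


def headers_missing_pipe(fasta_lines):
    """Return FASTA header lines that contain no '|' and would
    therefore raise IndexError in the expression above."""
    return [
        line.rstrip("\n")
        for line in fasta_lines
        if line.startswith(">") and "|" not in line
    ]


# Hypothetical example records (not taken from the test data).
lines = [
    ">ENA|AB000001|AB000001.1 some description\n",
    "ACGT\n",
    ">seq1 example genome\n",
    "ACGT\n",
]

print(second_pipe_field(lines[0]))   # AB000001
print(headers_missing_pipe(lines))   # ['>seq1 example genome']

try:
    second_pipe_field(lines[2])
except IndexError as err:
    print("IndexError:", err)        # IndexError: list index out of range
```

If the pre-check reports headers for the real input, the fix is more likely a different `--import-format` value or different test data than a change to the module script itself.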
PR checklist
Closes #5532
nf-core modules test <MODULE> --profile docker
nf-core modules test <MODULE> --profile singularity
nf-core modules test <MODULE> --profile conda
nf-core subworkflows test <SUBWORKFLOW> --profile docker
nf-core subworkflows test <SUBWORKFLOW> --profile singularity
nf-core subworkflows test <SUBWORKFLOW> --profile conda