How to use Custom input directory? #349
Comments
Hi, thanks for reaching out and using BGCFlow :)
"I just wish to know how to use Custom input directory"
To use a custom input directory, you will need to set up two things:
Please find the example here: config.zip
This is what the project structure will look like:
config/
├── Lactobacillus_delbrueckii
│   ├── input_files                  # your custom input directory
│   │   └── my_custom_genome.gbk
│   ├── gtdbtk.bac120.summary.tsv    # an optional GTDB-tk style taxonomic assignment
│   ├── project_config.yaml          # project level configuration, the rule set here will override the global parameter
│   └── samples.csv
└── config.yaml                      # global parameter configuration
And this is how the project configuration looks:
name: Lactobacillus_delbrueckii_custom_input
pep_version: 2.1.0
description: "An example of using custom input files in BGCFlow projects."
input_folder: input_files # This is the folder where the input files are located, relative to this file.
input_type: gbk # This is the default type of input files. It can be gbk or fna. Note that samples from NCBI will default to fna format.
gtdb-tax: gtdbtk.bac120.summary.tsv # you can also provide a custom GTDB-tk output style taxonomy information
sample_table: samples.csv
#### RULE CONFIGURATION ####
# rules: set value to TRUE if you want to run the analysis or FALSE if you don't
rules:
  seqfu: TRUE
... and finally, you need to add the custom sample in the
and you should get this message:
Step 2.1 Getting sample information from: config/Lactobacillus_delbrueckii/project_config.yaml
- Processing project [config/Lactobacillus_delbrueckii/project_config.yaml]
- Custom input directory: True
- Getting input files from: /datadrive/bgcflow/config/Lactobacillus_delbrueckii/input_files
- Custom input format: True
- Default input file type: gbk
- ! WARNING: GCA_000056065.1 is from ncbi. Enforcing format to `fna`.
- ! WARNING: GCA_000182835.1 is from ncbi. Enforcing format to `fna`.
- ! WARNING: GCA_000191165.1 is from ncbi. Enforcing format to `fna`.
- ! WARNING: GCA_000014405.1 is from ncbi. Enforcing format to `fna`.
- Found user-provided taxonomic information
"Why can this workflow still run after I delete all the input files?"
I would assume that you didn't change anything in the example template configuration and only deleted the default input files located in
Thank you again for the question. We will be sure to add this to the FAQ section and improve the WIKI.
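For anyone who wants to check a custom input setup before starting a run, here is a minimal Python sketch. It is not BGCFlow's own code: the helper names (`read_top_level_keys`, `list_custom_inputs`, `effective_format`) are made up for illustration, the YAML reader only handles flat `key: value` lines like the ones in the project configuration above, and the source value `custom` in the usage note is an assumption. It resolves `input_folder` relative to `project_config.yaml`, lists the files matching `input_type`, and mirrors the logged rule that NCBI samples are forced to `fna`.

```python
from pathlib import Path


def read_top_level_keys(config_path, keys):
    """Read flat `key: value` pairs from a YAML file.

    Not a full YAML parser: indented lines and comment lines are skipped,
    and inline `# ...` comments are stripped from the value.
    """
    found = {}
    for line in Path(config_path).read_text().splitlines():
        if line.startswith((" ", "\t", "#")) or ":" not in line:
            continue
        key, _, value = line.partition(":")
        if key.strip() in keys:
            found[key.strip()] = value.split("#", 1)[0].strip().strip('"')
    return found


def list_custom_inputs(project_dir):
    """List files in the custom input folder that match the default input type."""
    project_dir = Path(project_dir)
    cfg = read_top_level_keys(
        project_dir / "project_config.yaml", {"input_folder", "input_type"}
    )
    # input_folder is relative to the location of project_config.yaml
    input_dir = project_dir / cfg["input_folder"]
    return sorted(p.name for p in input_dir.glob(f"*.{cfg['input_type']}"))


def effective_format(source, default_type):
    """Mirror the logged rule: samples from NCBI are always treated as fna."""
    return "fna" if source == "ncbi" else default_type
```

Calling `list_custom_inputs("config/Lactobacillus_delbrueckii")` on the layout above should return the names of the `.gbk` files in `input_files`, and an empty list means the custom directory has nothing for the workflow to pick up. `effective_format("ncbi", "gbk")` gives `fna`, which matches the warnings in the log, while any other source keeps the project default.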
Hi WJ, Glad to hear it works :) The CLI for If you prefer to use the
$ bgcflow run --help
Usage: bgcflow run [OPTIONS]

  A snakemake CLI wrapper to run BGCFlow. Automatically run panoptes.

Options:
  -d, --bgcflow_dir TEXT  Location of BGCFlow directory. (DEFAULT: Current
                          working directory.)
  --workflow TEXT         Select which snakefile to run. Available
                          subworkflows: {BGC | Database | Report | Metabase |
                          lsagbc | ppanggolin}. (DEFAULT: workflow/Snakefile)
  --monitor-off           Turn off Panoptes monitoring workflow. (DEFAULT:
                          False)
  --wms-monitor TEXT      Panoptes address. (DEFAULT: http://127.0.0.1:5000)
  -c, --cores INTEGER     Use at most N CPU cores/jobs in parallel. (DEFAULT:
                          8)
  -n, --dryrun            Test run.
  --unlock                Remove a lock on the snakemake working directory.
  --until TEXT            Runs the pipeline until it reaches the specified
                          rules or files.
  --profile TEXT          Path to a directory containing snakemake profile.
  -t, --touch             Touch output files (mark them up to date without
                          really changing them).
  -h, --help              Show this message and exit.
Note that the current
I hope this means the command By
You can actually reuse the conda environment built by snakemake by checking the snakemake log. They can be found in the
PS: If you find any misleading or wrong instructions in the WIKI, please let us know so we can correct them.
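If you want to script the run, a small Python sketch like the one below can assemble the command line from the options listed in the help text above. The function name `bgcflow_run_command` is hypothetical and not part of BGCFlow; only flags that appear in the help output are used, and nothing is executed until you pass the list to `subprocess.run` yourself.

```python
import shlex


def bgcflow_run_command(cores=8, dryrun=False, until=None, bgcflow_dir=None):
    """Build an argument list for `bgcflow run` using flags from its --help text."""
    cmd = ["bgcflow", "run", "--cores", str(cores)]
    if bgcflow_dir is not None:
        cmd += ["--bgcflow_dir", str(bgcflow_dir)]
    if dryrun:
        # -n / --dryrun is described as a test run in the help text
        cmd.append("--dryrun")
    if until is not None:
        cmd += ["--until", until]
    return cmd


# Show the command for a dry run on 4 cores without executing it.
print(shlex.join(bgcflow_run_command(cores=4, dryrun=True)))
```

Starting with a dry run is a cheap way to see which jobs would be scheduled before committing to a full run.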
Hi, Thanks very much for the help. I ran into another problem when the workflow installs the environment for roary. The error is shown below:
LibMambaUnsatisfiableError: Encountered problems while solving:
I can use roary in a separate conda environment, but when I export that environment's yaml to replace the yaml in the workflow, it generates new errors. May I ask how to modify the yaml to solve this? Thanks for the help! Best Regards,
Hi Jay, Unfortunately I cannot reproduce the error when creating the roary environment, and the test seems to work fine. From the message:
Can you check your conda channel priorities and set it to
After setting the priority to flexible, while running the snakemake jobs, you should see this:
Assuming unrestricted shared filesystem usage.
Building DAG of jobs...
Your conda installation is not configured to use strict channel priorities. This is however crucial for having robust and correct environments (for details, see https://conda-forge.org/docs/user/tipsandtricks.html). Please consider to configure strict priorities by executing 'conda config --set channel_priority strict'.
Creating conda environment workflow/envs/roary.yaml...
Downloading and installing remote packages.
Environment for /datadrive_cemist/test/workflow/rules/../envs/roary.yaml created (location: .snakemake/conda/b39a961a250810ddef5ab2698703b6ab_)
Using shell: /usr/bin/bash
Provided cores: 4
Rules claiming more threads will be scaled down.
Singularity containers: ignored
We will probably replace roary with a newer pangenome builder. Hopefully there will be support for singularity containers in the future for better reproducibility.
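To confirm which channel priority conda is actually using, a short Python sketch along these lines may help. The helper names are made up for illustration. It assumes `conda` is on your `PATH` and that `conda config --show channel_priority` prints a line of the form `channel_priority: <value>`.

```python
import subprocess


def parse_channel_priority(config_output):
    """Extract the value from a `channel_priority: <value>` line, or None."""
    for line in config_output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "channel_priority":
            return value.strip()
    return None


def current_channel_priority():
    """Ask conda for its channel priority (requires conda on PATH)."""
    result = subprocess.run(
        ["conda", "config", "--show", "channel_priority"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_channel_priority(result.stdout)
```

If `current_channel_priority()` does not return `flexible`, running `conda config --set channel_priority flexible` in the shell changes it. The snakemake message in the log above uses the same command with `strict`.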
Hi Matinnuhamunada, Thanks for your help. I fixed roary with your guidance, but there is another problem. After generating the figure of the automlst tree-roary matrix, the whole workflow stops: no error, no crash, it just stops working there. I wanted to find out what happened, but the log showed no errors. Also, the automlst tree figures show nan for every label instead of the strain names. Thanks very much for the help. Best Regards,
Hi Jay, I'm currently on summer vacation, so forgive me in advance if I can't reply to your issues swiftly. We do encounter some issues with Roary, and there are plans to replace it with newer alternatives. Building a pangenome can be tricky, as it depends on the sample set that is given. If the genomes are complex (say Streptomyces)
Can you elaborate on which step this error happens at? If you can provide the log file here, that would be great :)
It's true that I haven't put much effort into pangenome visualization, as there is another project going on at our center (see https://pankb.org/). I don't think I can work much on this as I need to finish my PhD in August 😬. But maybe @JackSun1997 can help or suggest how to process the BGCFlow roary output for further visualization?
Hi Matinnuhamunada, Congratulations on your PhD! I attached the log below, the last 60 lines before it stopped. The workflow hangs after that until I terminate it the next morning. Thanks for your help!
[Tue Jul 16 20:15:45 2024]
Activating conda environment: .snakemake/conda/99b85a6c79ba929c0f380ce8472bd644_
[Tue Jul 16 20:30:31 2024]
[Tue Jul 16 20:30:31 2024]
Activating conda environment: .snakemake/conda/fdc4fd4e3e5776a2b25e63e13fba7ea5_
[Tue Jul 16 20:31:28 2024]
Activating conda environment: .snakemake/conda/fdc4fd4e3e5776a2b25e63e13fba7ea5_