Add multiqc #14

Merged
merged 33 commits into from
Oct 31, 2023

rroutsong
Collaborator

Add MultiQC to the ngsqc pipeline.

Addresses #12 and #7.

@rroutsong
Collaborator Author

@skchronicles, an example report is at /data/RTB_GRS/dev/Dmux/test_ngsqc2/GRS_0212_Bhasym/230907_NS500353_0215_AHLTNVBGXM/multiqc/Run-230907_NS500353_0215_AHLTNVBGXM-Project-GRS_0212_Bhasym_multiqc_report.html

@@ -42,7 +41,6 @@ rule fastq_screen:
subset = 1000000,
aligner = "bowtie2",
output_dir = lambda w: config['out_to'] + "/" + w.project + "/" + config['run_ids'] + "/" + w.sid + "/fastq_screen/",
# container: "docker://rroutsong/dmux_ngsqc:0.0.1",
containerized: "/data/OpenOmics/SIFs/dmux_ngsqc_0.0.1.sif"
Contributor

We need to add an option to point to a SIF cache and dynamically resolve one of the following: a local SIF on the file system or a URI to pull an image from Docker Hub.

Collaborator Author

I have a solution to this issue in the upcoming PR. I have serialized the server-specific SIF directories, and the specific server configuration is added dynamically at initialization time.

Ends up like:

containerized: server_config["sif"] + "dmux_ngsqc_0.0.1.sif"

The SIF cache is always specified at execution time through environment variables and subprocess.
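The local-SIF-or-Docker-Hub resolution requested above could be sketched roughly as follows. This is only an illustration, not the code from the follow-up PR: the function name `resolve_container` and its parameters are hypothetical, and the image name and URI are taken from the diff in this thread.

```python
from pathlib import Path
from typing import Optional


def resolve_container(sif_name: str, docker_uri: str, sif_cache: Optional[str]) -> str:
    """Prefer a local SIF from the cache directory; otherwise fall back to a URI.

    Hypothetical helper: name and signature are assumptions for illustration.
    """
    if sif_cache:
        candidate = Path(sif_cache) / sif_name
        # Use the local image only if the file is actually present in the cache.
        if candidate.is_file():
            return str(candidate)
    # No cache configured or image not cached: let the runtime pull from the registry.
    return docker_uri


# Example call using the names that appear in this PR's diff.
image = resolve_container(
    "dmux_ngsqc_0.0.1.sif",
    "docker://rroutsong/dmux_ngsqc:0.0.1",
    "/data/OpenOmics/SIFs/",
)
```

The returned string could then be used wherever the rule currently hard-codes the SIF path.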

@@ -98,7 +100,6 @@ rule kraken_annotation:
kraken_log = config['out_to'] + "/{project}/" + config['run_ids'] + "/{sid}/kraken/{sid}.log",
params:
kraken_db = "/data/OpenOmics/references/Dmux/kraken2/k2_pluspfp_20230605"
Contributor

We need a method to dynamically resolve the reference files.

Collaborator Author

This is also addressed in the next PR. I saved all of the server resolution methods until I moved on to bigsky.
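One way such per-server resolution of reference files might look is a lookup table keyed by host, selected at initialization. This is a sketch under assumptions, not the implementation in the next PR: the table layout, the `resolve_server_config` name, the hostname-substring matching, and the bigsky paths are all hypothetical placeholders. Only the default paths come from this diff.

```python
import socket
from typing import Dict, Optional

# Hypothetical per-server table. The "default" paths are the ones hard-coded in
# this PR; the "bigsky" entries are placeholders, not real locations.
SERVER_CONFIG: Dict[str, Dict[str, str]] = {
    "default": {
        "sif": "/data/OpenOmics/SIFs/",
        "kraken_db": "/data/OpenOmics/references/Dmux/kraken2/k2_pluspfp_20230605",
    },
    "bigsky": {
        "sif": "/placeholder/bigsky/SIFs/",
        "kraken_db": "/placeholder/bigsky/kraken2/k2_pluspfp_20230605",
    },
}


def resolve_server_config(hostname: Optional[str] = None) -> Dict[str, str]:
    """Pick the config whose key appears in the hostname, else the default."""
    host = (hostname or socket.gethostname()).lower()
    for key, cfg in SERVER_CONFIG.items():
        if key != "default" and key in host:
            return cfg
    return SERVER_CONFIG["default"]


server_config = resolve_server_config()
kraken_db = server_config["kraken_db"]
```

A rule's `params` could then read `kraken_db` from `server_config` instead of a literal path.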

log: config['out_to'] + "/.logs/" + config['projects'] + "/" + config['run_ids'] + "/multiqc/multiqc.log"
shell:
"""
multiqc -q -ip \
Contributor

At some point, we may want to point to a MultiQC config file to clean up the general statistics table, create two sections for fastqc, and set a preferred module order in the final report.

Collaborator Author

This is outlined in #15
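As a rough idea of what such a config might contain, the sketch below builds the settings as a Python dictionary. It assumes MultiQC's `module_order` option with per-entry `name`, `anchor`, `path_filters`, and `path_filters_exclude` keys, as I understand them from the MultiQC documentation. The helper name, section titles, and the `*_trimmed_fastqc.zip` pattern are hypothetical and would need to match the pipeline's real file names.

```python
from typing import Any, Dict


def build_multiqc_config(trimmed_glob: str = "*_trimmed_fastqc.zip") -> Dict[str, Any]:
    """Return a MultiQC-style config with two FastQC sections and a module order.

    Hypothetical helper; the glob pattern is an assumed naming convention.
    """
    return {
        "module_order": [
            # FastQC on demultiplexed reads: everything except trimmed outputs.
            {"fastqc": {
                "name": "FastQC (raw)",
                "anchor": "fastqc_raw",
                "path_filters_exclude": [trimmed_glob],
            }},
            # FastQC on trimmed reads: only files matching the trimmed pattern.
            {"fastqc": {
                "name": "FastQC (trimmed)",
                "anchor": "fastqc_trimmed",
                "path_filters": [trimmed_glob],
            }},
            "fastp",
            "fastq_screen",
            "kraken",
        ]
    }


config = build_multiqc_config()
```

The dictionary would still need to be written out as YAML (for example with a YAML library) and passed to `multiqc` with its config option.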

Contributor

We need to change how adapter sequences are being removed. Currently, there is a bug where the barcode sequences (i7/i5) from Illumina's sample sheet are being passed to fastqc and fastp. These barcode sequences should be removed after the bcl2fastq step and do not represent traditional library-prep-kit-specific adapter sequences that need to be removed. With that being said, let's make use of fastp's auto-detect-adapter-sequences feature to remove adapters. We can also make use of fastqc's internal contaminants/adapters list to identify sequencing adapters.

Collaborator Author

Here's the fastp rule in the new branch master_job_and_bigsky:

    shell:
        """
        fastp \
        --detect_adapter_for_pe \
        --in1 {input.in_read1} --in2 {input.in_read2} \
        --out1 {output.out_read1} \
        --out2 {output.out_read2} \
        --html {output.html} \
        --json {output.json}
        """

Fastqc:

    shell:
        """
        mkdir -p {params.output_dir}
        fastqc -o {params.output_dir} -t {threads} {input.samples}
        """

FastQC before trimming depends on the demultiplexed reads; FastQC after trimming depends on the trimmed reads file.
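To make the point explicit, here is a small sketch that builds the same fastp invocation as an argument list, using only the flags shown in the rule above. The function name and parameters are hypothetical. The key property is that no barcode or adapter sequences from the sample sheet are passed in; adapter handling is left to `--detect_adapter_for_pe`.

```python
from typing import List


def fastp_pe_command(in1: str, in2: str, out1: str, out2: str,
                     html: str, json_report: str) -> List[str]:
    """Build a paired-end fastp command that relies on adapter auto-detection.

    Hypothetical helper mirroring the shell block in the rule above.
    """
    return [
        "fastp",
        "--detect_adapter_for_pe",  # no explicit adapter sequences are supplied
        "--in1", in1, "--in2", in2,
        "--out1", out1,
        "--out2", out2,
        "--html", html,
        "--json", json_report,
    ]


cmd = fastp_pe_command("s_R1.fastq.gz", "s_R2.fastq.gz",
                       "s_trimmed_R1.fastq.gz", "s_trimmed_R2.fastq.gz",
                       "s.html", "s.json")
```

The list form can be handed to `subprocess.run(cmd, check=True)` if the command is ever launched outside Snakemake.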

@skchronicles
Contributor

Will address some of these comments/issues in the next PR.

@skchronicles skchronicles merged commit 2ff4353 into main Oct 31, 2023
@rroutsong rroutsong deleted the add_multiqc branch November 17, 2023 16:39