medvir/NanoporeMet

NanoporeMet

This repository contains the scripts to analyze (nanoporemet.py) and visualize (app.R, coverage.py) metagenomic sequencing data generated by Oxford Nanopore Technologies sequencing devices. Both viral and bacterial analyses are possible.

nanoporemet.py

nanoporemet.py analyzes metagenomic sequencing reads with kraken2. Whether only viral reads or also bacterial reads are analyzed is determined by the selected kraken2 database.

nanoporemet.py first concatenates all .fastq.gz files of each barcode within /fastq_pass, then runs kraken2 on each barcode individually, and finally combines the kraken2 output files of all barcodes into one file, either virus.kraken.txt or virus_bacteria.kraken.txt (depending on the selected database). If nanoporemet.py is run after the sequencing run has finished and the sequencing_summary_*.txt file is available, it also creates a sequencing_summary.pdf file with histograms of the mean Q scores and read lengths, both for all reads and for the reads passing the quality filter.
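The first two steps described above can be sketched as follows. This is a minimal illustration, not the code of nanoporemet.py: the function names, the output directory, and the per-barcode file naming are assumptions made for the example. It relies on the fact that gzip files can be concatenated byte-wise, and it only builds the kraken2 command line (using the standard `--db`, `--threads`, `--gzip-compressed`, `--report`, and `--output` options) without running it.

```python
import shutil
from pathlib import Path


def concatenate_barcodes(fastq_pass: Path, out_dir: Path) -> dict:
    """Merge all .fastq.gz files of each barcode subdirectory into one file.

    Hypothetical helper: returns a mapping of barcode name to merged file.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    merged = {}
    for barcode_dir in sorted(p for p in fastq_pass.iterdir() if p.is_dir()):
        parts = sorted(barcode_dir.glob("*.fastq.gz"))
        if not parts:
            continue  # nothing sequenced for this barcode
        target = out_dir / f"{barcode_dir.name}.fastq.gz"
        with open(target, "wb") as out:
            for part in parts:
                # gzip members may simply be appended to each other
                with open(part, "rb") as src:
                    shutil.copyfileobj(src, out)
        merged[barcode_dir.name] = target
    return merged


def kraken2_command(db, reads, report, output, threads=4):
    """Build (but do not run) a kraken2 call for one merged barcode file."""
    return [
        "kraken2",
        "--db", str(db),
        "--threads", str(threads),
        "--gzip-compressed",
        "--report", str(report),
        "--output", str(output),
        str(reads),
    ]
```

The returned list can be passed to `subprocess.run(..., check=True)` once per barcode; the thread count of 4 is an arbitrary placeholder.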

How to run

  1. Enter timavo.

ssh timavo

  2. Activate the kraken2 conda environment.

conda activate kraken

  3. Move into the sequencing output directory, i.e., the one that contains the fastq_pass subdirectory and, at the end of the sequencing run, the sequencing_summary_*.txt file.

cd /data/GridION/GridIONOutput/<experiment>/<sample>/<flowcell>/

  4. Run the python script.

python <path to script>/nanoporemet.py

  5. The script asks whether you want to analyze bacterial reads in addition to viral reads.

Reply with either yes/y or no/n.
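How such a yes/no reply could be interpreted is sketched below. The function name and the case-insensitive matching are assumptions for illustration; the actual prompt handling in nanoporemet.py may differ.

```python
def parse_yes_no(answer: str) -> bool:
    """Return True for yes/y and False for no/n (hypothetical helper)."""
    normalized = answer.strip().lower()
    if normalized in {"yes", "y"}:
        return True
    if normalized in {"no", "n"}:
        return False
    raise ValueError(f"expected yes/y or no/n, got {answer!r}")
```

A caller could wrap this as `parse_yes_no(input("Analyze bacterial reads as well? "))` and ask again on `ValueError`.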

Input

Metagenomic sequencing data

Within the sequencing output directory, the script looks for the /fastq_pass subdirectory and analyzes all .fastq.gz files.

kraken2 databases

nanoporemet.py uses one of two kraken2 databases to analyze the reads. The paths to these databases are defined within the script and can easily be adjusted. The current databases are as follows:

  • viral database: k2_human-viral_20240111

  • viral + bacterial database: k2_human-viral_20240111

Run statistics

For the creation of the histogram plots, the script looks for sequencing_summary_*.txt within the sequencing output directory. If it is not available yet, this step is simply skipped.
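The "skip if not available" behaviour can be sketched with a simple glob lookup. The function name is hypothetical, and picking the first match when several files exist is an assumption of this example.

```python
from pathlib import Path
from typing import Optional


def find_sequencing_summary(run_dir: Path) -> Optional[Path]:
    """Return the sequencing summary file in run_dir, or None if absent."""
    matches = sorted(run_dir.glob("sequencing_summary_*.txt"))
    # Assumption: if several files match, the first one (sorted by name) is used
    return matches[0] if matches else None
```

A caller would create the histogram plots only when the return value is not `None`.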

Output

kraken2 analysis

The kraken2 report with the analysis of all barcodes is saved in the sequencing output directory. Depending on the selection of the kraken database, the report is saved as virus.kraken.txt or virus_bacteria.kraken.txt.
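The exact layout of the combined report is not described here. One plausible way to merge per-barcode reports, shown purely as an assumption, is to prefix every report line with the barcode it came from; the function name and the extra leading column are hypothetical.

```python
from pathlib import Path


def combine_reports(reports: dict, combined: Path) -> int:
    """Write all per-barcode report lines into one file.

    Assumed layout: each line is prefixed with the barcode name and a tab.
    Returns the number of lines written.
    """
    written = 0
    with open(combined, "w") as out:
        for barcode in sorted(reports):
            with open(reports[barcode]) as src:
                for line in src:
                    if not line.strip():
                        continue  # skip empty lines
                    out.write(barcode + "\t" + line.rstrip("\n") + "\n")
                    written += 1
    return written
```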

Run statistics

The histogram plots of the mean Q scores and read lengths of all reads as well as the reads passing the quality filter are all saved in sequencing_summary.pdf, which is also found within the sequencing output directory.
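The data behind such a histogram can be derived from the summary file with the standard library alone. The sketch below bins mean Q scores into integer bins; the column names `mean_qscore_template` and `passes_filtering` are assumed to match the tab-separated summary written by the sequencing software, and the function name is hypothetical. Read lengths could be binned the same way from the corresponding length column.

```python
import csv
from collections import Counter


def qscore_histogram(summary_path, passed_only=False):
    """Count reads per integer mean Q score bin.

    Assumes a tab-separated file with the columns
    'mean_qscore_template' and 'passes_filtering'.
    """
    counts = Counter()
    with open(summary_path, newline="") as fh:
        for row in csv.DictReader(fh, delimiter="\t"):
            passed = row["passes_filtering"].strip().upper() == "TRUE"
            if passed_only and not passed:
                continue
            # Truncate the Q score to its integer bin, e.g. 12.9 -> 12
            counts[int(float(row["mean_qscore_template"]))] += 1
    return dict(sorted(counts.items()))
```

Calling the function once with `passed_only=False` and once with `passed_only=True` yields the two distributions that are plotted side by side.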

Shiny app

The app.R script is a Shiny app that visualizes the kraken2 report generated by nanoporemet.py. Upload virus.kraken.txt or virus_bacteria.kraken.txt to the app, select a barcode, and choose whether you want to analyze viral or bacterial reads, at either the species or the genus level. Endogenous retroviruses and phages as well as blocklisted viruses can be hidden from the output (the blocklist can be updated within app.R).

The Shiny app shows the taxonomic distribution of the reads in a barplot as well as a list of all viral or bacterial species or genera found in the sample (per barcode).
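The species/genus selection corresponds to filtering on the rank code of the kraken2 report. The following Python sketch is not the R code of app.R; it assumes standard six-column kraken2 report lines (percentage, clade reads, direct reads, rank code, taxonomy ID, indented name), and the function name is hypothetical.

```python
def taxa_at_rank(report_lines, rank="S"):
    """Return (name, clade_reads) pairs for one rank code, most reads first.

    Rank codes follow the kraken2 report convention,
    e.g. 'S' for species and 'G' for genus.
    """
    hits = []
    for line in report_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue  # not a standard report line
        _, clade_reads, _, rank_code, _, name = fields[:6]
        if rank_code == rank:
            # Names are indented by taxonomic depth, so strip the padding
            hits.append((name.strip(), int(clade_reads)))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)
```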

coverage.py

coverage.py automates coverage plot generation for Oxford Nanopore Technologies reads. First, it concatenates all reads within /fastq_pass and then maps those reads to a desired reference sequence (indexed .fasta file) using minimap2.
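The mapping step can be sketched as a pair of command lines. This is an illustration under assumptions rather than the code of coverage.py: the `map-ont` preset, the use of samtools for sorting, the output file naming, and the function name are all choices made for the example. The commands are only built, not executed.

```python
from pathlib import Path


def mapping_commands(reference: Path, reads: Path, out_dir: Path):
    """Build minimap2 and samtools calls for one reference (hypothetical)."""
    name = reference.stem
    sam = out_dir / f"{name}.sam"
    bam = out_dir / f"{name}.bam"
    return [
        # -a: SAM output, -x map-ont: preset for Nanopore reads
        ["minimap2", "-a", "-x", "map-ont", "-o", str(sam),
         str(reference), str(reads)],
        # Assumption: the .sam file is sorted into a .bam file with samtools
        ["samtools", "sort", "-o", str(bam), str(sam)],
    ]
```

Each list can be handed to `subprocess.run(..., check=True)` in order.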

How to run

  1. Enter timavo.

ssh timavo

  2. Activate minimap2.

conda activate minimap2

  3. Move into the sequencing output directory, i.e., the one where you find, e.g., the fastq_pass subdirectory.

cd /data/GridION/GridIONOutput/<experiment>/<sample>/<flowcell>/

  4. Run the python script.

python <path to script>/coverage.py

  5. You will be asked to enter the path to the indexed reference sequence.

Input

Metagenomic sequencing data

Within the sequencing output directory, the script looks for the /fastq_pass subdirectory and analyzes all .fastq.gz files.

Reference sequence

The path to the reference sequence is provided by the user upon running the script. Make sure the reference sequence is indexed and stored in

/analyses/ONT_analyses/bwa/references/<virus/bacteria>/<name>/.

To index the reference .fasta file, move into /analyses/ONT_analyses/bwa/ and run:

./bwa index ./references/<virus/bacteria>/<name>/*.fasta
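Whether a reference has already been indexed can be checked by looking for the files that `bwa index` writes next to the .fasta file (suffixes .amb, .ann, .bwt, .pac, and .sa). The helper below is a hypothetical sketch and is not part of coverage.py.

```python
from pathlib import Path

# Files created by `bwa index` next to the reference
BWA_INDEX_SUFFIXES = (".amb", ".ann", ".bwt", ".pac", ".sa")


def missing_index_files(fasta: Path) -> list:
    """Return the expected bwa index files that do not exist yet."""
    expected = [Path(str(fasta) + suffix) for suffix in BWA_INDEX_SUFFIXES]
    return [path for path in expected if not path.exists()]
```

An empty list means the reference is fully indexed; otherwise the indexing command needs to be run first.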

Output

Coverage plot

Within the sequencing output directory, you will find a new subdirectory with the name of the reference sequence. Next to the coverage plot (PDF), it also contains the .sam, .bam, and .coverage files.
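The format of the .coverage file is not specified here. Assuming it holds per-position depth as three tab-separated columns (reference name, position, depth), as written by `samtools depth`, a quick summary could be computed as follows; the function name and the returned keys are hypothetical.

```python
def summarize_depth(lines):
    """Summarize per-position depth lines of the form 'ref<TAB>pos<TAB>depth'."""
    depths = [int(line.split("\t")[2]) for line in lines if line.strip()]
    if not depths:
        return {"positions": 0, "mean_depth": 0.0, "covered_fraction": 0.0}
    covered = sum(1 for depth in depths if depth > 0)
    return {
        "positions": len(depths),
        "mean_depth": sum(depths) / len(depths),
        "covered_fraction": covered / len(depths),
    }
```

The covered fraction is only meaningful if the file also lists positions with zero depth.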
