Skip to content

Commit

Permalink
version update for LRSDAY: v1.0.0 -> v1.1.0
Browse files Browse the repository at this point in the history
  • Loading branch information
yjx1217 committed Jul 11, 2018
1 parent 09d78f3 commit 819b84e
Show file tree
Hide file tree
Showing 39 changed files with 26,249 additions and 23,927 deletions.
31 changes: 31 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,31 @@
# Changelog

All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/)
and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.html).

## [Unreleased]

## [1.1.0] - 2018-07-11
### Added
- This change log file: Changelog.md
- Support for customized parameter settings for Canu.
- Support for alternative assemblers: Flye and smartdenovo.
- The script subsampling_seqeunces.pl for downsampling input reads.
- Support for the report of exact numbers of SNP and INDEL corrections made based on the Illumina data.
### Changed
- Better resource management for the Illumina-based polishing step.
- Better robustness for the reference-based scaffolding step.
- Better performance and robustness for the mitochondrial genome assembly improvement step.
- Better robustness for the TE annotation step.
- Better robustness for the reference-based gene orthology identification step.
- Software version or downloading URL updates for a number of dependencies.
- Preparation for future incorporation of long-read-based polishing step.
### Fixed
- A bug in the script identify_contigs_for_RefChr_by_mummer.pl that ignores user-specified options.
- Typos in the manual.

## [1.0.0] - 2018-02-04
### Added
- Initial version of LRSDAY.
Binary file modified Example_Outputs/SK1.final.cds.fa.gz
Binary file not shown.
Binary file modified Example_Outputs/SK1.final.fa.gz
Binary file not shown.
Binary file modified Example_Outputs/SK1.final.filter.mummer2vcf.INDEL.vcf.gz
Binary file not shown.
Binary file modified Example_Outputs/SK1.final.filter.mummer2vcf.SNP.vcf.gz
Binary file not shown.
Binary file modified Example_Outputs/SK1.final.filter.pdf
Binary file not shown.
47,441 changes: 23,946 additions & 23,495 deletions Example_Outputs/SK1.final.gff3

Large diffs are not rendered by default.

360 changes: 189 additions & 171 deletions Example_Outputs/SK1.final.manual_check.list

Large diffs are not rendered by default.

Binary file modified Example_Outputs/SK1.final.pep.fa.gz
Binary file not shown.
30 changes: 15 additions & 15 deletions Example_Outputs/SK1.final.stats.txt
Original file line number Diff line number Diff line change
@@ -1,17 +1,17 @@
total sequence count: 20
total sequence length: 12261860
min sequence length: 24410
max sequence length: 1480198
mean sequence length: 613093.00
median sequence length: 637681.50
N50: 923665
total sequence count: 33
total sequence length: 12489291
min sequence length: 1248
max sequence length: 1480203
mean sequence length: 378463.36
median sequence length: 84648.00
N50: 923633
L50: 6
N90: 462493
L90: 13
A%: 30.91
T%: 30.81
G%: 19.12
C%: 19.12
AT%: 61.72
GC%: 38.24
N90: 341463
L90: 14
A%: 30.88
T%: 30.80
G%: 19.13
C%: 19.16
AT%: 61.67
GC%: 38.29
N%: 0.04
Binary file modified Example_Outputs/SK1.final.trimmed_cds.fa.gz
Binary file not shown.
33 changes: 16 additions & 17 deletions Example_Outputs/SK1.final.trimmed_cds.log
Original file line number Diff line number Diff line change
@@ -1,18 +1,17 @@
incorrect_length: SK1_G0000010|SK1_G0000010.mRNA.1, phase_guess: 0
incorrect_length: SK1_G0000060|SK1_G0000060.mRNA.1, phase_guess: 2
incorrect_length: SK1_G0001750|SK1_G0001750.mRNA.1, phase_guess: 1
incorrect_length: SK1_G0008130|SK1_G0008130.mRNA.1, phase_guess: 0
incorrect_length: SK1_G0010370|SK1_G0010370.mRNA.1, phase_guess: 1
incorrect_length: SK1_G0013740|SK1_G0013740.mRNA.1, phase_guess: 0
incorrect_length: SK1_G0013750|SK1_G0013750.mRNA.1, phase_guess: 2
incorrect_length: SK1_G0013800|SK1_G0013800.mRNA.1, phase_guess: 0
incorrect_length: SK1_G0017760|SK1_G0017760.mRNA.1, phase_guess: 0
incorrect_length: SK1_G0032340|SK1_G0032340.mRNA.1, phase_guess: 0
incorrect_length: SK1_G0034190|SK1_G0034190.mRNA.1, phase_guess: 1
incorrect_length: SK1_G0043660|SK1_G0043660.mRNA.1, phase_guess: 0
incorrect_length: SK1_G0052490|SK1_G0052490.mRNA.1, phase_guess: 0
incorrect_length: SK1_G0052500|SK1_G0052500.mRNA.1, phase_guess: 2
incorrect_length: SK1_G0057520|SK1_G0057520.mRNA.1, phase_guess: 0
incorrect_length: SK1_G0057540|SK1_G0057540.mRNA.1, phase_guess: 0
incorrect_length: SK1_G0057560|SK1_G0057560.mRNA.1, phase_guess: 0
incorrect_length: SK1_G0057580|SK1_G0057580.mRNA.1, phase_guess: 0
incorrect_length: SK1_G0008090|SK1_G0008090.mRNA.1, phase_guess: 0
incorrect_length: SK1_G0010330|SK1_G0010330.mRNA.1, phase_guess: 1
incorrect_length: SK1_G0013700|SK1_G0013700.mRNA.1, phase_guess: 0
incorrect_length: SK1_G0013710|SK1_G0013710.mRNA.1, phase_guess: 2
incorrect_length: SK1_G0017750|SK1_G0017750.mRNA.1, phase_guess: 0
incorrect_length: SK1_G0032330|SK1_G0032330.mRNA.1, phase_guess: 0
incorrect_length: SK1_G0034180|SK1_G0034180.mRNA.1, phase_guess: 1
incorrect_length: SK1_G0037710|SK1_G0037710.mRNA.1, phase_guess: 2
incorrect_length: SK1_G0043630|SK1_G0043630.mRNA.1, phase_guess: 0
incorrect_length: SK1_G0052440|SK1_G0052440.mRNA.1, phase_guess: 0
incorrect_length: SK1_G0052450|SK1_G0052450.mRNA.1, phase_guess: 2
incorrect_length: SK1_G0057700|SK1_G0057700.mRNA.1, phase_guess: 0
incorrect_length: SK1_G0057730|SK1_G0057730.mRNA.1, phase_guess: 0
incorrect_length: SK1_G0057750|SK1_G0057750.mRNA.1, phase_guess: 0
incorrect_length: SK1_G0057770|SK1_G0057770.mRNA.1, phase_guess: 0
incorrect_length: SK1_G0057810|SK1_G0057810.mRNA.1, phase_guess: 0
Binary file modified Manual.pdf
Binary file not shown.
Original file line number Diff line number Diff line change
Expand Up @@ -11,6 +11,8 @@ prefix="SK1" # file name prefix for the output files
reads="./../00.Long_Reads/SK1.filtered_subreads.fastq.gz" # path of the long reads file (in fastq or fastq.gz format)
reads_type="pacbio-raw" # long reads data type: "pacbio-raw" or "pacbio-corrected" or "nanopore-raw" or "nanopore-corrected"
genome_size="12.5m" # estimated genome size with the format of <number>[g|m|k], e.g. 12.5m for 12.5 Mb
assembler="canu" # long-read assembler: "canu" (default) or "flye" or "smartdenovo". Based on our test, canu gives better results but takes longer to finish.
customized_canu_parameters="-correctedErrorRate=0.04" # users can set customized Canu assembly parameters here or simply leave it empty like "" to use Canu's default assembly parameter. For example you could set "-correctedErrorRate=0.04" for high coverage (>60X) PacBio data and "-correctedErrorRate=0.12 -overlapper=mhap -utgReAlign=true" for high coverage (>60X) Nanopore data to improve the assembly speed. More than one customized parameters can be set here as long as they are separeted by space (e.g. "-option1=XXX -option2=YYY -option3=ZZZ"). Please consult Canu's manual "http://canu.readthedocs.io/en/latest/faq.html#what-parameters-can-i-tweak" for advanced customization settings.
threads=1 # number of threads to use
vcf="yes" # use "yes" if prefer to have vcf file generated to show SNP and INDEL differences between the assembled genome and the reference genome.
dotplot="yes" # use "yes" if prefer to plot genome-wide dotplot based on the comparison with the reference genome below, otherwise use "no"
Expand Down Expand Up @@ -43,49 +45,83 @@ then
fi
fi

out_dir=${prefix}_canu_out

$canu_dir/canu -p $prefix -d $out_dir \
useGrid=false \
maxThreads=$threads \
genomeSize=$genome_size \
gnuplot=$gnuplot_dir/gnuplot \
-${reads_type} $reads

# ln -s $out_dir/$prefix.correctedReads.fasta.gz .
perl $LRSDAY_HOME/scripts/simplify_seq_name.pl -i ./$out_dir/$prefix.contigs.fasta -o $prefix.canu.fa
out_dir=${prefix}_${assembler}_out

if [[ "$assembler" == "canu" ]]
then
$canu_dir/canu -p $prefix -d $out_dir \
useGrid=false \
maxThreads=$threads \
genomeSize=$genome_size \
gnuplot=$gnuplot_dir/gnuplot \
-${reads_type} $reads \
$customized_canu_parameters

perl $LRSDAY_HOME/scripts/simplify_seq_name.pl -i ./$out_dir/$prefix.contigs.fasta -o $prefix.$assembler.fa
elif [[ "$assembler" == "flye" ]]
then
if [[ "$reads_type" == "pacbio-corrected" ]]
then
reads_type="pacbio-corr"
elif [[ "$reads_type" == "nanopore-raw" ]]
then
reads_type="nano-raw"
elif [[ "$reads_type" == "nanopore-corrected" ]]
then
reads_type="nano-corr"
fi
$flye_dir/flye -o $out_dir \
-t $threads \
-g $genome_size \
--${reads_type} $reads \
--min-overlap 7000 \
-i 2
perl $LRSDAY_HOME/scripts/simplify_seq_name.pl -i ./$out_dir/contigs.fasta -o $prefix.$assembler.fa
elif [[ "$assembler" == "smartdenovo" ]]
then
mkdir $out_dir
cd $out_dir
$smartdenovo_dir/smartdenovo_customized.pl -p $prefix -t $threads -c 1 ./../$reads > $prefix.mak
make -f $prefix.mak
perl $LRSDAY_HOME/scripts/simplify_seq_name.pl -i $prefix.dmo.cns -o ./../$prefix.$assembler.fa
cd ..
fi

# generate assembly statistics
perl $LRSDAY_HOME/scripts/cal_assembly_stats.pl -i $prefix.canu.fa -o $prefix.canu.stats.txt
perl $LRSDAY_HOME/scripts/cal_assembly_stats.pl -i $prefix.$assembler.fa -o $prefix.$assembler.stats.txt

# make the comparison between the assembled genome and the reference genome
$mummer_dir/nucmer -t $threads --maxmatch --nosimplify -p $prefix.canu $ref_genome_raw $prefix.canu.fa
$mummer_dir/delta-filter -m $prefix.canu.delta > $prefix.canu.delta_filter
$mummer_dir/nucmer -t $threads --maxmatch --nosimplify -p $prefix.$assembler $ref_genome_raw $prefix.$assembler.fa
$mummer_dir/delta-filter -m $prefix.$assembler.delta > $prefix.$assembler.delta_filter

# generate the vcf output
if [[ $vcf == "yes" ]]
then
$mummer_dir/show-coords -b -T -r -c -l -d $prefix.canu.delta_filter > $prefix.canu.filter.coords
$mummer_dir/show-snps -C -T -l -r $prefix.canu.delta_filter > $prefix.canu.filter.snps
perl $LRSDAY_HOME/scripts/mummer2vcf.pl -r ref_genome.fa -i $prefix.canu.filter.snps -t SNP -p $prefix.canu.filter
perl $LRSDAY_HOME/scripts/mummer2vcf.pl -r ref_genome.fa -i $prefix.canu.filter.snps -t INDEL -p $prefix.canu.filter
$mummer_dir/show-coords -b -T -r -c -l -d $prefix.$assembler.delta_filter > $prefix.$assembler.filter.coords
$mummer_dir/show-snps -C -T -l -r $prefix.$assembler.delta_filter > $prefix.$assembler.filter.snps
perl $LRSDAY_HOME/scripts/mummer2vcf.pl -r ref_genome.fa -i $prefix.$assembler.filter.snps -t SNP -p $prefix.$assembler.filter
perl $LRSDAY_HOME/scripts/mummer2vcf.pl -r ref_genome.fa -i $prefix.$assembler.filter.snps -t INDEL -p $prefix.$assembler.filter
$samtools_dir/samtools faidx ref_genome.fa
awk '{printf("##contig=<ID=%s,length=%d>\n",$1,$2);}' ref_genome.fa.fai > $prefix.vcf_header.txt
sed -i -e "/##reference/r $prefix.vcf_header.txt" $prefix.canu.filter.mummer2vcf.SNP.vcf
sed -i -e "/##reference/r $prefix.vcf_header.txt" $prefix.canu.filter.mummer2vcf.INDEL.vcf
sed -i -e "/##reference/r $prefix.vcf_header.txt" $prefix.$assembler.filter.mummer2vcf.SNP.vcf
sed -i -e "/##reference/r $prefix.vcf_header.txt" $prefix.$assembler.filter.mummer2vcf.INDEL.vcf
fi

# generate genome-wide dotplot
if [[ $dotplot == "yes" ]]
then
$mummer_dir/mummerplot --large --postscript $prefix.canu.delta_filter -p $prefix.canu.filter
perl $LRSDAY_HOME/scripts/fine_tune_gnuplot.pl -i $prefix.canu.filter.gp -o $prefix.canu.filter_adjust.gp -r ref_genome.fa -q ${prefix}.canu.fa
$gnuplot_dir/gnuplot < $prefix.canu.filter_adjust.gp
$mummer_dir/mummerplot --large --postscript $prefix.$assembler.delta_filter -p $prefix.$assembler.filter
perl $LRSDAY_HOME/scripts/fine_tune_gnuplot.pl -i $prefix.$assembler.filter.gp -o $prefix.$assembler.filter_adjust.gp -r ref_genome.fa -q ${prefix}.$assembler.fa
$gnuplot_dir/gnuplot < $prefix.$assembler.filter_adjust.gp
fi

# clean up intermediate files
if [[ $debug == "no" ]]
then
rm ref_genome.fa
rm ref_genome.fa.fai
rm *.delta
rm *.delta_filter
# rm ref_genome.fa
Expand All @@ -94,7 +130,7 @@ then
then
rm *.filter.coords
rm $prefix.vcf_header.txt
rm $prefix.canu.filter.snps
rm $prefix.$assembler.filter.snps
fi
if [[ $dotplot == "yes" ]]
then
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -36,13 +36,14 @@ ln -s $raw_assembly refseq.fa

cp $adapter adapter.fa

mkdir tmp
if [[ $mode == "PE" ]]
then
java -jar $trimmomatic_dir/trimmomatic.jar PE -threads $threads -phred33 raw.R1.fq.gz raw.R2.fq.gz trimmed.R1.fq.gz trimmed.unpaired.R1.fq.gz trimmed.R2.fq.gz trimmed.unpaired.R2.fq.gz ILLUMINACLIP:adapter.fa:2:30:10 SLIDINGWINDOW:5:20 MINLEN:36
java -Djava.io.tmpdir=./tmp -XX:ParallelGCThreads=$threads -jar $trimmomatic_dir/trimmomatic.jar PE -threads $threads -phred33 raw.R1.fq.gz raw.R2.fq.gz trimmed.R1.fq.gz trimmed.unpaired.R1.fq.gz trimmed.R2.fq.gz trimmed.unpaired.R2.fq.gz ILLUMINACLIP:adapter.fa:2:30:10 SLIDINGWINDOW:5:20 MINLEN:36
rm trimmed.unpaired.R1.fq.gz
rm trimmed.unpaired.R2.fq.gz
else
java -jar $trimmomatic_dir/trimmomatic.jar SE -threads $threads -phred33 raw.fq.gz trimmed.fq.gz ILLUMINACLIP:adapter.fa:2:30:10 SLIDINGWINDOW:5:20 MINLEN:36
java -Djava.io.tmpdir=./tmp -XX:ParallelGCThreads=$threads -jar $trimmomatic_dir/trimmomatic.jar SE -threads $threads -phred33 raw.fq.gz trimmed.fq.gz ILLUMINACLIP:adapter.fa:2:30:10 SLIDINGWINDOW:5:20 MINLEN:36
fi

# bwa mapping
Expand All @@ -57,48 +58,54 @@ fi

# index reference sequence
$samtools_dir/samtools faidx refseq.fa
java -jar $picard_dir/picard.jar CreateSequenceDictionary REFERENCE=refseq.fa OUTPUT=refseq.dict
java -Djava.io.tmpdir=./tmp -XX:ParallelGCThreads=$threads -jar $picard_dir/picard.jar CreateSequenceDictionary REFERENCE=refseq.fa OUTPUT=refseq.dict

# sort bam file by picard-tools SortSam
java -jar $picard_dir/picard.jar SortSam INPUT=$prefix.sam OUTPUT=$prefix.sort.bam SORT_ORDER=coordinate
java -Djava.io.tmpdir=./tmp -XX:ParallelGCThreads=$threads -jar $picard_dir/picard.jar SortSam INPUT=$prefix.sam OUTPUT=$prefix.sort.bam SORT_ORDER=coordinate

# fixmate
java -jar $picard_dir/picard.jar FixMateInformation INPUT=$prefix.sort.bam OUTPUT=$prefix.fixmate.bam
java -Djava.io.tmpdir=./tmp -XX:ParallelGCThreads=$threads -jar $picard_dir/picard.jar FixMateInformation INPUT=$prefix.sort.bam OUTPUT=$prefix.fixmate.bam

# add or replace read groups and sort
java -jar $picard_dir/picard.jar AddOrReplaceReadGroups INPUT=$prefix.fixmate.bam OUTPUT=$prefix.rdgrp.bam \
java -Djava.io.tmpdir=./tmp -XX:ParallelGCThreads=$threads -jar $picard_dir/picard.jar AddOrReplaceReadGroups INPUT=$prefix.fixmate.bam OUTPUT=$prefix.rdgrp.bam \
SORT_ORDER=coordinate RGID=$prefix RGLB=$prefix RGPL="Illumina" RGPU=$prefix RGSM=$prefix RGCN="RGCN"

# remove duplicates
java -jar $picard_dir/picard.jar MarkDuplicates INPUT=$prefix.rdgrp.bam REMOVE_DUPLICATES=true \
java -Djava.io.tmpdir=./tmp -XX:ParallelGCThreads=$threads -jar $picard_dir/picard.jar MarkDuplicates INPUT=$prefix.rdgrp.bam REMOVE_DUPLICATES=true \
METRICS_FILE=$prefix.dedup.matrics OUTPUT=$prefix.dedup.bam

# index the dedup.bam file
$samtools_dir/samtools index $prefix.dedup.bam

# GATK local realign
# find realigner targets
java -jar $gatk_dir/GenomeAnalysisTK.jar -R refseq.fa -T RealignerTargetCreator -I $prefix.dedup.bam -o $prefix.realn.intervals
java -Djava.io.tmpdir=./tmp -XX:ParallelGCThreads=$threads -jar $gatk_dir/GenomeAnalysisTK.jar -R refseq.fa -T RealignerTargetCreator -I $prefix.dedup.bam -o $prefix.realn.intervals
# run realigner
java -jar $gatk_dir/GenomeAnalysisTK.jar -R refseq.fa -T IndelRealigner -I $prefix.dedup.bam -targetIntervals $prefix.realn.intervals -o $prefix.realn.bam
java -Djava.io.tmpdir=./tmp -XX:ParallelGCThreads=$threads -jar $gatk_dir/GenomeAnalysisTK.jar -R refseq.fa -T IndelRealigner -I $prefix.dedup.bam -targetIntervals $prefix.realn.intervals -o $prefix.realn.bam

# index final bam file
$samtools_dir/samtools index $prefix.realn.bam

# for PE sequencing
if [[ $mode == "PE" ]]
then
java -Xmx16G -jar $pilon_dir/pilon.jar --genome $raw_assembly --frags $prefix.realn.bam --fix $fixlist --vcf --changes --output $prefix.pilon >$prefix.log
java -Djava.io.tmpdir=./tmp -Xmx16G -XX:ParallelGCThreads=$threads -jar $pilon_dir/pilon.jar --genome $raw_assembly --frags $prefix.realn.bam --fix $fixlist --vcf --changes --output $prefix.pilon >$prefix.log
else
java -Xmx16G -jar $pilon_dir/pilon.jar --genome $raw_assembly --unpaired $prefix.realn.bam --fix $fixlist --vcf --changes --output $prefix.pilon >$prefix.log
java -Djava.io.tmpdir=./tmp -Xmx16G -XX:ParallelGCThreads=$threads -jar $pilon_dir/pilon.jar --genome $raw_assembly --unpaired $prefix.realn.bam --fix $fixlist --vcf --changes --output $prefix.pilon >$prefix.log
fi

perl $LRSDAY_HOME/scripts/summarize_pilon_correction.pl -i $prefix.pilon.changes

mv $prefix.pilon.fasta $prefix.pilon.fa
rm -r tmp

# clean up intermediate files
if [[ $debug == "no" ]]
then
rm adapter.fa
rm refseq.fa
rm refseq.fa.fai
rm refseq.dict
rm refseq.fa.bwt
rm refseq.fa.pac
rm refseq.fa.ann
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -11,7 +11,7 @@ input_assembly="./../02.Illumina-read-based_Assembly_Polishing/SK1.pilon.fa" # p
prefix="SK1" # file name prefix for the output files
ref_genome_raw="./../00.Ref_Genome/S288C.ASM205763v1.fa" # path of the raw reference genome
ref_genome_noncore_masked="./../00.Ref_Genome/S288C.ASM205763v1.noncore_masked.fa" # path of the specially masked reference genome where subtelomeres and chromosome-ends were hard masked. When the subtelomere/chromosome-end information is unavailable for the organism that you are interested in, you can just put the path of the raw reference genome assembly here
chrMT_tag="chrMT" # sequence name for the mitochondrial genome in the raw reference genome file
chrMT_tag="chrMT" # sequence name for the mitochondrial genome in the raw reference genome file, if there are multiple reference mitochondrial genomes that you want to check, use a single ';' to separate them. e.g. "Sc_chrMT;Sp_chrMT".
gap_size=5000 # number of Ns to insert between adjacent contigs during scaffolding; You can put "auto" here (with the quotes) if you do not want to resize the gap introduced by ragout.
threads=1 # number of threads to use
debug="no" # use "yes" if prefer to keep intermediate files, otherwise use "no".
Expand All @@ -20,13 +20,15 @@ debug="no" # use "yes" if prefer to keep intermediate files, otherwise use "no".
# process the pipeline
#####################################
# run Ragout for scaffolding based on the reference genome

sed -e '/^[^>]/s/[^ATGCatgc]/N/g' $ref_genome_noncore_masked > ref_genome_noncore_masked.fa
echo ".references = ref_genome" > ragout.recipe.txt
echo ".target = $prefix" >> ragout.recipe.txt
echo "ref_genome.fasta = $ref_genome_noncore_masked" >> ragout.recipe.txt
echo "ref_genome.fasta = ./ref_genome_noncore_masked.fa" >> ragout.recipe.txt
echo "$prefix.fasta = $input_assembly" >> ragout.recipe.txt
echo ".naming_ref = ref_genome" >> ragout.recipe.txt

python $ragout_dir/ragout.py -o ${prefix}_ragout_out --solid-scaffolds -t $threads ragout.recipe.txt
python2 $ragout_dir/ragout.py -o ${prefix}_ragout_out --solid-scaffolds -t $threads ragout.recipe.txt
cat ./${prefix}_ragout_out/${prefix}_scaffolds.fasta | sed "s/^>chr_/>/g" > ./${prefix}_ragout_out/${prefix}_scaffolds.renamed.fasta
cat ./${prefix}_ragout_out/${prefix}_scaffolds.renamed.fasta ./${prefix}_ragout_out/${prefix}_unplaced.fasta > ./${prefix}_ragout_out/${prefix}.ragout.raw.fa

Expand All @@ -45,13 +47,13 @@ perl $LRSDAY_HOME/scripts/cal_assembly_stats.pl -i $prefix.ragout.fa -o $prefix.
$mummer_dir/nucmer -t $threads --maxmatch --nosimplify -p $prefix.ragout $ref_genome_raw $prefix.ragout.fa
$mummer_dir/delta-filter -m $prefix.ragout.delta > $prefix.ragout.delta_filter
$mummer_dir/show-coords -b -T -r -c -l -d $prefix.ragout.delta_filter > $prefix.ragout.filter.coords
perl $LRSDAY_HOME/scripts/identify_contigs_for_RefChr_by_mummer.pl -i $prefix.ragout.filter.coords -chr chrMT -cov 90 -o $prefix.mt_contig.list
echo $chrMT_tag | sed -e "s/;/\n/g" > ref.chrMT.list
perl $LRSDAY_HOME/scripts/identify_contigs_for_RefChr_by_mummer.pl -i $prefix.ragout.filter.coords -query_chr_list ref.chrMT.list -cov 75 -o $prefix.mt_contig.list
$mummer_dir/mummerplot --large --postscript $prefix.ragout.delta_filter -p $prefix.ragout.filter
perl $LRSDAY_HOME/scripts/fine_tune_gnuplot.pl -i $prefix.ragout.filter.gp -o $prefix.ragout.filter_adjust.gp -r $ref_genome_raw -q ${prefix}.ragout.fa
$gnuplot_dir/gnuplot < $prefix.ragout.filter_adjust.gp

# generate dotplot for the mitochondrial genome only
echo $chrMT_tag > ref.chrMT.list
perl $LRSDAY_HOME/scripts/select_fasta_by_list.pl -i $ref_genome_raw -l ref.chrMT.list -m normal -o ref.chrMT.fa
perl $LRSDAY_HOME/scripts/select_fasta_by_list.pl -i $prefix.ragout.fa -l $prefix.mt_contig.list -m normal -o $prefix.mt_contig.fa
$mummer_dir/nucmer --maxmatch --nosimplify -p $prefix.ragout.chrMT ref.chrMT.fa $prefix.mt_contig.fa
Expand All @@ -64,6 +66,7 @@ $gnuplot_dir/gnuplot < $prefix.ragout.chrMT.filter_adjust.gp
if [[ $debug == "no" ]]
then
rm ragout.recipe.txt
rm ref_genome_noncore_masked.fa
rm *.ragout.filter.coords
rm *.filter.fplot
rm *.filter.rplot
Expand Down
Loading

0 comments on commit 819b84e

Please sign in to comment.