add option for FASTA file
Serghei Mangul committed Jul 25, 2018
1 parent d2229d0 commit f549be4
Showing 4 changed files with 83 additions and 5 deletions.
4 changes: 4 additions & 0 deletions NOTES
@@ -0,0 +1,4 @@
We use a script to convert FASTA to FASTQ, downloaded from here:
https://storage.googleapis.com/google-code-archive-downloads/v2/code.google.com/fasta-to-fastq/fasta_to_fastq.pl


47 changes: 47 additions & 0 deletions fasta_to_fastq.pl
@@ -0,0 +1,47 @@
#Copyright (c) 2010 LUQMAN HAKIM BIN ABDUL HADI ([email protected])
#
#Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files
#(the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify,
#merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is
#furnished to do so, subject to the following conditions:

#The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

#THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
#OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
#LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR
#IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

#!/usr/bin/perl
use strict;

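# Input FASTA path is the first command-line argument.
my $file = $ARGV[0] or die "Usage: $0 <input.fasta>\n";
# Fail loudly if the file cannot be opened instead of silently reading nothing.
open(FILE, '<', $file) or die "Cannot open $file: $!\n";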

my ($header, $sequence, $sequence_length, $sequence_quality);
while(<FILE>) {
chomp $_;
if ($_ =~ /^>(.+)/) {
if($header ne "") {
print "\@".$header."\n";
print $sequence."\n";
print "+"."\n";
print $sequence_quality."\n";
}
$header = $1;
$sequence = "";
$sequence_length = "";
$sequence_quality = "";
}
else {
$sequence .= $_;
$sequence_length = length($_);
for(my $i=0; $i<$sequence_length; $i++) {$sequence_quality .= "I"}
}
}
close FILE;
print "\@".$header."\n";
print $sequence."\n";
print "+"."\n";
print $sequence_quality."\n";
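
The conversion performed by the script above can be sketched in a few lines of shell with awk. This is an illustrative equivalent, not the script used by the commit: the function name `fasta_to_fastq` and the two-record input are hypothetical, and the sketch assumes the same convention as the Perl code, namely that every base gets the placeholder quality character `I` and that multi-line sequences are joined into one line.

```shell
#!/usr/bin/env bash
# Sketch: FASTA -> FASTQ with a constant placeholder quality of "I".
# Multi-line sequences are joined, as in the Perl script above.
# Difference from the Perl script: empty input produces no output here.
set -euo pipefail

fasta_to_fastq() {
    # Reads FASTA on stdin, writes FASTQ on stdout.
    awk '
        function flush_record(   qual) {
            if (header == "") return
            qual = seq
            gsub(/./, "I", qual)          # one "I" per base
            printf "@%s\n%s\n+\n%s\n", header, seq, qual
        }
        /^>/ { flush_record(); header = substr($0, 2); seq = ""; next }
             { seq = seq $0 }
        END  { flush_record() }
    '
}

# Hypothetical two-record input; the second sequence spans two lines.
printf '>r1\nACGT\n>r2\nAC\nGT\n' | fasta_to_fastq
```

Each input record becomes four output lines (header, sequence, `+`, quality), so the FASTQ line count divided by four should equal the number of `>` headers in the FASTA input.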

23 changes: 18 additions & 5 deletions needle.sh
@@ -4,6 +4,8 @@ source $(dirname $0)/argparse.bash || exit 1
argparse "$@" <<EOF || exit 1
parser.add_argument('bam')
parser.add_argument('outdir')
parser.add_argument('-fasta', '--fasta', action='store_true', default=False,
help='Treat the input as a FASTA file instead of a BAM file [default %(default)s]')
parser.add_argument('-f', '--force', action='store_true', default=False,
help='Force [default %(default)s]')
EOF
@@ -59,14 +61,25 @@ SAMPLE=${OUTDIR}"/"${prefix}


echo "Extract unmapped reads from " $BAM
if [[ $FASTA ]]
then
perl ${DIR_CODE}/fasta_to_fastq.pl $BAM > ${SAMPLE}.cat.unmapped.fastq
else

samtools view -f 0x4 -bh $BAM | samtools bam2fq - >${SAMPLE}.unmapped.fastq
#samtools view -bh $BAM NC_007605 | samtools fastq - > ${SAMPLE}.NC_007605.fastq
#rm -fr ${SAMPLE}.NC_007605.fastq
#cat ${SAMPLE}.unmapped.fastq ${SAMPLE}.NC_007605.fastq>${SAMPLE}.cat.unmapped.fastq
#rm -fr ${SAMPLE}.unmapped.fastq
UNMAPPED=${SAMPLE}.unmapped.fastq
samtools view -bh $BAM NC_007605 | samtools fastq - > ${SAMPLE}.NC_007605.fastq
rm -fr ${SAMPLE}.NC_007605.fastq
cat ${SAMPLE}.unmapped.fastq ${SAMPLE}.NC_007605.fastq>${SAMPLE}.cat.unmapped.fastq
rm -fr ${SAMPLE}.unmapped.fastq

fi


UNMAPPED=${SAMPLE}.cat.unmapped.fastq

wc -l $UNMAPPED

exit 1

bwa mem -a ${DB}/viral.vipr/NONFLU_All.fastq $UNMAPPED | samtools view -S -b -F 4 - | samtools sort - >${SAMPLE}.virus.bam
bwa mem -a ${DB}/fungi/fungi.ncbi.february.3.2018.fasta $UNMAPPED | samtools view -S -b -F 4 - | samtools sort - >${SAMPLE}.fungi.bam
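
The `if [[ $FASTA ]]` test in the hunk above succeeds for any non-empty string. The sketch below isolates that behavior. It assumes that the argument-parsing helper delivers a set flag as a non-empty string and an unset flag as an empty string. That is an assumption about argparse.bash and is not confirmed by this diff. The function name `choose_input_mode` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of the branch selection used above. Assumption: the parsed flag
# arrives as a non-empty string when set and as an empty string otherwise.
set -euo pipefail

choose_input_mode() {
    local fasta_flag="${1:-}"
    if [[ $fasta_flag ]]; then   # true only for a non-empty string
        echo "fasta"
    else
        echo "bam"
    fi
}

choose_input_mode "yes"   # prints: fasta
choose_input_mode ""      # prints: bam
choose_input_mode "no"    # prints: fasta (any non-empty value counts as set)
```

The last call shows the caveat: if the flag were ever delivered as a literal `False` or `no`, the FASTA branch would still run, so an explicit comparison would be safer in that case.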
14 changes: 14 additions & 0 deletions toy.example.fasta
@@ -0,0 +1,14 @@
>1
TCGCACCGGGGATCCTAGGCTTATTATATTGCGCTTTGTATTGAAGTCTGTTCTGTGTGGACTGACCTCAGCACAAGTGTTATGGGGCAGATCATAACAATGTTTGAGGCCCTGCCTCACATTATCGATGAGGTCATCAACATTGTTATAATAGTGCTTATAATAATAACAAGCATAAAGGCTGTGTACAACTTTGCTACCTGTGGCATCATTGCATTGATCAGCTTCTGCTTCTTGGCTGGAAGGTCTTGTGGCTTGTATGGTGTCTCTGGCTCTGACATTTACAAGGGACTCTACCAGTTCCAGTCCGTAGAGTTCAACATGTCACAATTGAATTTAACAATGCCCAATGCGTGCTCAGCCAACAATTCCCACCATTACATCAGCATGGGAAAATCTGGCCTGGAACTAACCTTTACAAATGACTCCATCATTCAACACAACTTCTGCAACCTAACTGATGGGTTCAAGAAAAAAACCTTTGATCATACACTTATGAGCATAGTGTCAAGCCTGCACCTGAGCATTAGAGGAAATACCATCTACAAAGCTGTGTCCTGTGACTTCAACAATGGGATTACAATCCAGTACAACCTAACCTTCTCTGATGCACAAGGTGCCATCAATCAATGTGGAACCTTCAGAGGTAGAGTTTTAGATATGTTTAGAACAGCTTTTGGGGGGAAATACATGAGGTCTGGCTATGGTTGGAAAGACTCCAATGGGAAGACAACCTGGTGCAGTCAAACCAACTATCAATACCTAATCATACAGAACAGGACATGGGAAAATCACTGTGAGTATGCCGGTCCTTTTGGTCTCTCAAGAATTCTTTTTGCTCAGGAGAAAACAAAGTTTCTCACTAGAAGATTGGCAGGGACTTTTACCTGGACATTGTCGGATTCTTCGGGAACTGAAACCCCAGGTGGGTATTGTCTGACAAGGTGGATGCTCATAGCTGCTGATCTCAAGTGTTTCGGGAACACAGCAGTTGCCAAATGCAACATCAACCATGATGAAGAATTTTGTGACATGTTGAGGTTAATTGACTATAACAAAGCCGCTCTAAAGAAATTCAAAGAAGACGTAGAGTCTGCCCTTCACTTGTTCAAAACAACTG
>2
CCGGGGATCCTAGGCTTATTATATTGCGCTTTGTATTGAAGTCTGTTCTGTGTGGACTGACCTCAGCACAAGTGTTATGGGGCAGATCATAACAATGTTTGAGGCCCTGCCTCACATTATCGATGAGGTCATCAACATTGTTATAATAGTGCTTATAATAATAACAAGCATAAAGGCTGTGTACAACTTTGCTACCTGTGGCATCATTGCATTGATCAGCTTCTGCTTCTTGGCTGGAAGGTCTTGTGGCTTGTATGGTGTCTCTGGCTCTGACATTTACAAGGGACTCTACCAGTTCCAGTCCGTAGAGTTCAACATGTCACAATTGAATTTAACAATGCCCAATGCGTGCTCAGCCAACAATTCCCACCATTACATCAGCATGGGAAAATCTGGCCTGGAACTAACCTTTACAAATGAC
>3
TTTACAAGGGACTCTACCAGTTCCAGTCCGTAGAGTTCAACATGTCACAATTGAATTTAACAATGCCCAATGCGTGCTCAGCCAACAATTCCCACCATTACATCAGCATGGGAAAATCTGGCCTGGAACTAACCTTTACAAATGACTCCATCATTCAACACAACTTCTGCAACCTAACTGATGGGTTCAAGAAAAAAACCTTTGATCATACACTTATGAGCATAGTGTCAAGCCTGCACCTGAGCATTAGAGGAAATACCATCTACAAAGCTGTGTCCTGTGACTTCAACAATGGGATTACAATCCAGTACAACCTAACCTTCTCTGATGCACAAGGTGCCATCAATCAATGTGGAACCTTCAGAGGTAGAGTTTTAGATATGTTTAGAACAGCTTTTGGGGGGAAATACATGAGGTCTGGCTATGGTTGGAAAGACTCCAATGGGAAGACAACCTGGTGCAGTCAAACCAACTATCAATACCTAATCATACAGAACAGGACATGGGAAAATCACTGTGAGTATGCCGGTCCTTTTGGTCTCTCAAGAATTCTTTTTGCTCAGGAGAAAACAAAGTTTCTCACTAGAAGATTGGCAGGGACTTTTACCTGGACATTGTCGGATTCTTCGGGAACTGAAACC
>4
AACCGGCGCCAGTGTGCTGGGACCACATACCTAATCAAATCCGTACACCACATACCTAATCAAATCCGTACACCACATACCTAATCAAATCCGTACACCACATACCTAATCAAATCCGTACACCACATACCTAATCAAATCCGTACACCACATACCTAATCAAATCCGTACACCACATACCTAATCAAATCCGTACACCACATACCTAATCAAATCCGTACACCACATACCTAATCAAATCCGTACACCACATACCTAATCAAATCCGTACACCACATACCTAATCAAATCCGTACACCACATACCTAATCAAATCCGTACACCACATACCTAATCAAATCCGTACACCACATACCTAATCAAATCCGTACACCACATACCTAATCAAATCCGTACACCACATACCTAATCAAAACCAAATCCGTACACCACATACCTAATCAAATCCGTACACCACATACCTAATCAAATCCGTACACCACATACCTAATCAAATCCGTACACCACATACCTAATCAAATCCGTACACCACATACCTAATCAAATCCGTACACCACATACCTAATCAAATCCGTACAC
>5
CATACCTAATCAAAACCAAATCCGTACACCACATACCTAATCAAATCCGTACACCACATACCTAATCAAATCCGTACACCACATACCTAATCAAATCCGTACACCACATACCTAATCAAATCCGTACACCACATACCTAATCAAATCCGTACACCACATACCTAATCAAATCCGTACACCACATACCTAATCAAATCCGTACACCACATACCTAATCAAATCCGTACACCCCGCTCCTCCCTTTCTGGTAATTTTACTTTTATTTTTTTCATTTTTTTATTTTTTCATTTTTTCATTTTTTCATTTTCTCATTTTCTCTTTTTTAATCACTCTAGACGGATTTCCTCTTCTGGTAATTTTCTTTTTCTCATTTTCTCATTTTCTCATTTTCTCTTTTCTAATCACTCGAAACGGATTTCCTCTTTCGGTAATTTTGCTTCTTTGTTTTTTACTTTAATTTCTTTTTCCTCTTCTCCTCTTTCGCTTTCTCCTTTCACTCTCATCCATTCTCATTCCTCAATTTATACTTTTTTTGGCAGTATTCTTGACTGTTTTTCACCATTTCCCCTTACCCGCACTTCCATCACATTTTCTTTGTCAATCCACCTTCTTGCCATGGCCAATGCTTGGTCCTGCTGGGCCTGGTCCTTGCATAGGATACACACGTTACCCGCACAATTATCACCCATTTTCAACTTCTTCCAACTTTTGCTCCTCTTGCCATGAAACTTCATTTCAAATTACTTCACTTCATCACTTCCTTAGTTACCCTGCTTTATTATAAATCACGTGCTTCCCATCTCCCGCTTCCCATCTCCCGCTTCCCATCACCTGCTTCTCTTCACCTGCTTCTCTTCACTTTCATTTTTCATCCCTTCTCATTCTACCTTTCTTTCCCTCTCTCCCTCTGAATATAAGAATTCTCCACTGTTTTTCACAATTACTACTTACCCGCACTTAAACTACACTTCTCCATGTTAATCAACCT
>6
TAACCCTAACCCTAACCCTGACCCTAACCCTAACCCTAACCCTAACCCTAACCAGTACACGCGTACACGTACAAGCACCCGTACCCCCAGTATACCTGGACACCCGTACTCAGTTATCCTTTTTATTAGTGTACCCGCCTCTTGCACGCATGCCACAGTTCTTCAGCAGAAGAACACGCACAATGCTCTTTGATAAACGTGCGGACATGAAAAAAAGGGAAAAACGCAGCTACGTGTGCTGTCGTTGGTTTCACAGCGTCAAGCCGCGTCGGTGTACCAAAGAGGAGGTGACCCATCGAGTACTCGCACCCTCTAGCTCTCCTTTTCTGCCTCGTATTATACACGTTGATCGGAAAACAGGGTAGGCACTAGCCACCGATAATCTTCAATCGTACATCTGTCTGCGTAAGCGCGTGCCCCGGATGGAGGGCATGGAACTGCATCGACCGCCCACGGCGATCGCCGATCAGCCAGCGATGTGACTGCAACGCTGTTTGTTTCCACAACGAGGGCTGAAGGCTTTCTGATAGATTGTGCGCTATAGAACAAGGAGGGAGAGCCCACCCCTTTTTATGCGAAAACTCCTCACCCAAAGCAAGGAGGGCGGCGGGTGGGAAGCGGAAAGCCAACGCCCACGCGGACGCAATTAGCACCGACCGAAAACGAGCAGTGAGAAAAAGGGAAGTCTCTCAGACTGGGAAGAGATGAGCCGAGGAGATAAATGCACCAGATCCGAGGTACCGCGGCACAAGAGGAGCCGGGTGATATTTTTTGTTGTTTTCAGTGTTTCCTCGTGAGACGGCAAAACACGAGGCAGAAAAGGTG
>7
CTTCAGCAGAAGAACACGCACAATGCTCTTTGATAAACGTGCGGACATGAAAAAAAGGGAAAAACGCAGCTACGTGTGCTGTCGTTGGTTTCACAGCGTCAAGCCGCGTCGGTGTACCAAAGAGGAGGTGACCCATCGAGTACTCGCACCCTCTAGCTCTCCTTTTCTGCCTCGTATTATACACGTTGATCGGAAAACAGGGTAGGCACTAGCCACCGATAATCTTCAATCGTACATCTGTCTGCGTAAGCGCGTGCCCCGGATGGAGGGCATGGAACTGCATCGACCGCCCACGGCGATCGCCGATCAGCCAGCGATGTGACTGCAACGCTGTTTGTTTCCACAACGAGGGCTGAAGGCTTTCTGATAGATTGTGCGCTATAGAACAAGGAGGGAGAGCCCACCCCTTTTTATGCGAAAACTCCTCACCCAAAGCAAGGAGGGCGGCGGGTGGGAAGCGGAAAGCCAACGCCCACGCGGACGCAATTAGCACCGACCGAAAACGAGCAGTGAGAAAAAGGGAAGTCTCTCAGACTGGGAAGAGATGAGCCGAGGAGATAAATGCACCAGATCCGAGGTACCGCGGCACAAGAGGAGCCGGGTGATATTTTTTGTTGTTTTCAGTGTTTCCTCGTGAGACGGCAAAACACGAGGCAGAAAAGGTGCAAGAGATCCAGGTGGCTGGCGAAGAGGAGGAACATGAGAAGAGAGACAGTCAACATTGGCGGGGAGTCGAACTTTGTGCAGCTCATGTGTGCAGGTGCAGGTCGATGGATAGAAGGCTAAGAGGCGATAGGACAGGGTCCCTTCACACCACAAGCGTGAGTGATGGAGTTATATGCGCATGGTCGAATAGGTATGCACATGTACGGCAGACAGGAAAGTAGAAGAGAGGAATTCGGAGTTGTGGAGAACGGGAAGTCGATGGGGCAGCAGCAGCAGTCAGAGCAGCAGACGAAATGCTACACGGAACGGCTTCACGGAGAGAGCATATCAGAGAAGCAGGGGAGCTGAGAAGTGCAGTCGATGTGTCACGCTTTGAAGTGTGTGACAT
