Extraction of Paired End Reads #3
Comments
Dear (imagine appropriate title here) L.B. Harrison,
Thank you for your interest in SLAG. It has been a while since I looked in there, and I will have to go through the Unicycler section to see what it is doing to the accession numbers. However, unlike a paired-end-aware aligner such as bwa mem, blast takes fasta input and would align the forward and reverse reads separately. To put both in the same blastable database, it would be necessary to concatenate the forward and reverse fasta files and run makeblastdb on the combined file. Alternatively, blast can search two or more databases at once when they are given as a space-separated list, as in -db "db1 db2". Give me a bit to check if SLAG is appending contig lengths to the query names.
Charles Crane
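The two approaches described in this reply (one combined database, or one search across several databases) can be sketched in Python as follows. This is only an illustration and not part of SLAG: the function names and the FASTA file names in the usage note are hypothetical, and the sketch only builds the command lines rather than running them. It assumes nucleotide reads and uses the standard BLAST+ options -in, -dbtype, -parse_seqids, -out, -query, -db, -evalue, and -outfmt.

```python
from pathlib import Path


def combine_fasta(forward, reverse, combined):
    """Concatenate two FASTA files (forward first, then reverse) into one file."""
    with open(combined, "w") as out:
        for src in (forward, reverse):
            last = "\n"
            with open(src) as handle:
                for line in handle:
                    out.write(line)
                    last = line
            # Guard against a missing final newline so records do not merge.
            if not last.endswith("\n"):
                out.write("\n")
    return Path(combined)


def makeblastdb_cmd(fasta, db_name):
    """Build the makeblastdb command for a nucleotide database."""
    return ["makeblastdb", "-in", str(fasta), "-dbtype", "nucl",
            "-parse_seqids", "-out", db_name]


def multi_db_blastn_cmd(query, databases, evalue=1e-10):
    """Build a blastn command that searches several databases at once.

    BLAST+ takes multiple databases as one space-separated -db argument.
    """
    return ["blastn", "-query", str(query), "-db", " ".join(databases),
            "-evalue", str(evalue), "-outfmt", "6"]
```

For example, `multi_db_blastn_cmd("single_segment.fasta", ["SEQ0006F", "SEQ0006R"])` yields a command that searches both read databases from the configuration above without rebuilding anything, while `makeblastdb_cmd(combine_fasta("fwd.fasta", "rev.fasta", "both.fasta"), "SEQ0006_both")` covers the combined-database route. Either list can be passed to `subprocess.run` once the paths are real.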
…________________________________
From: LBHarrison ***@***.***>
Sent: Friday, November 1, 2024 11:56 AM
To: cfcrane/SLAG ***@***.***>
Cc: Subscribed ***@***.***>
Subject: [cfcrane/SLAG] Extraction of Paired End Reads (Issue #3)
Hello,
I am running into an issue where the fwd & reverse read databases are being searched with the modified accession IDs. This refers to the ".1" and ".2" added to the "*forwards.txt" and "*reverses.txt" files. (shown below)
Configuration settings:
$seqfile = "/home/current_user/Desktop/SLAG/single_segment.fasta";
$unicycleroutstem = "/home/current_user/Desktop/SLAG/working";
$unicyclerworkstem = "/home/current_user/Desktop/SLAG/out";
$restartflag = 0;
$maxcycle = 5;
$extractionoption = "increment";
$extincrement = 15;
$longread = 0;
$pairedend = 1;
$querytype = "nucleotide";
$blastdir = "/home/current_user/anaconda3/envs/unicycler/bin";
$forwardblastdb = "/home/current_user/Desktop/SLAG/SEQ0006F";
$reverseblastdb = "/home/current_user/Desktop/SLAG/SEQ0006R";
$nthreads = 18;
$evalue = 1e-10;
$secevalue = 1e-20;
$runalign = 10000;
$carryforwardevalue = 1e-20;
$stem = "node19_";
$tempdbname = "node19_";
$tempdboutstem = "node_19_temp";
$assembler = "unicycler";
$unicyclerexe = "/home/current_user/anaconda3/envs/unicycler/bin/unicycler";
Accession IDs in the forward reference database
(unicycler) ***@***.***:~/Desktop/SLAG$ blastdbcmd -db SEQ0006F -entry all -outfmt "%f" | awk '/>(.+) / {print substr($1,2);}' | head
SEQ0006F_trim_pair-1
SEQ0006F_trim_pair-2
SEQ0006F_trim_pair-3
SEQ0006F_trim_pair-4
SEQ0006F_trim_pair-5
SEQ0006F_trim_pair-6
SEQ0006F_trim_pair-7
SEQ0006F_trim_pair-8
SEQ0006F_trim_pair-9
SEQ0006F_trim_pair-10
First 10 entries of the *forwards.txt file.
(unicycler) ***@***.***:~/Desktop/SLAG$ head node19_accessions0.txtforwards.txt
SEQ0006R_trim_pair-148133.1
SEQ0006R_trim_pair-95798.1
SEQ0006F_trim_pair-106826.1
SEQ0006R_trim_pair-376470.1
SEQ0006R_trim_pair-239371.1
SEQ0006F_trim_pair-381483.1
SEQ0006F_trim_pair-345197.1
SEQ0006F_trim_pair-156162.1
SEQ0006F_trim_pair-167538.1
SEQ0006R_trim_pair-328526.1
Sample of 6 Error message(s) from running SLAG.pl w/ the above configuration settings
Error: [blastdbcmd] Entry not found: SEQ0006R_trim_pair-145913.2
Error: [blastdbcmd] Skipped SEQ0006R_trim_pair-145913.2
Error: [blastdbcmd] Entry not found: SEQ0006F_trim_pair-129748.2
Error: [blastdbcmd] Skipped SEQ0006F_trim_pair-129748.2
Error: [blastdbcmd] Entry not found: SEQ0006F_trim_pair-66745.2
Error: [blastdbcmd] Skipped SEQ0006F_trim_pair-66745.2
Error: [blastdbcmd] Entry not found: SEQ0006F_trim_pair-327597.2
Error: [blastdbcmd] Skipped SEQ0006F_trim_pair-327597.2
Error: [blastdbcmd] Entry not found: SEQ0006F_trim_pair-226852.2
Error: [blastdbcmd] Skipped SEQ0006F_trim_pair-226852.2
Error: [blastdbcmd] Entry not found: SEQ0006F_trim_pair-177250.2
Error: [blastdbcmd] Skipped SEQ0006F_trim_pair-177250.2
For reference, when I modify the configuration file to only use one of the read databases (i.e. not paired end), the pipeline proceeds as normal.
Could this be an issue arising from the blast suite version used? (blastdbcmd = 2.16.0+). This looks like a very useful program!
Thank you
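The mismatch reported above (IDs in the database carry no suffix, while the lookups carry ".1" or ".2") can be checked with a small diagnostic sketch. It is not SLAG code: the function names are hypothetical, and it assumes that the database IDs have no mate suffix, as the blastdbcmd listing shows, and that the "SEQ0006F" and "SEQ0006R" prefixes identify the forward and reverse databases. Whether stripping the suffix is the right fix inside SLAG is not established here.

```python
import re

# A trailing ".1" or ".2", as seen in the *forwards.txt / *reverses.txt files.
MATE_SUFFIX = re.compile(r"\.[12]$")
# The accession named in a blastdbcmd "Entry not found" message.
NOT_FOUND = re.compile(r"Entry not found: (\S+)")


def strip_mate_suffix(accession):
    """Return the accession without a trailing .1 or .2, if one is present."""
    return MATE_SUFFIX.sub("", accession)


def missing_accessions(log_text):
    """Collect the accessions that blastdbcmd reported as not found."""
    return NOT_FOUND.findall(log_text)


def split_by_database(accessions, forward_tag="SEQ0006F", reverse_tag="SEQ0006R"):
    """Strip mate suffixes and group the IDs by the database prefix they carry."""
    forward, reverse = [], []
    for acc in accessions:
        base = strip_mate_suffix(acc)
        if base.startswith(forward_tag):
            forward.append(base)
        elif base.startswith(reverse_tag):
            reverse.append(base)
    return forward, reverse
```

Writing each returned list to a file and passing it to `blastdbcmd -db SEQ0006F -entry_batch <file>` (or the reverse database) would show whether the suffix alone explains the "Entry not found" errors.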
Dear L.B. Harrison,
SLAG uses Unicycler only for long reads. For paired-end Illumina reads, it uses SPAdes directly; see section 2.1.3 of the Molecular Ecology Resources paper. SLAG has not been set up to use Unicycler as a wrapper for SPAdes. Your options are therefore either to do what you did and coax assemblies from Unicycler with one database, or to use the SPAdes option and possibly optimize k, etc., yourself. The containerized version of SLAG contains SPAdes and its dependencies, so you should be able to run it in your environment. Let me know if you have problems adapting the SPAdes configuration file. Good luck.
Charles Crane