Extraction of Paired End Reads #3

Open
LBHarrison opened this issue Nov 1, 2024 · 2 comments
Comments

@LBHarrison

Hello,

I am running into an issue where the forward and reverse read databases are searched using modified accession IDs, that is, IDs with the ".1" and ".2" suffixes that are appended to the entries in the "*forwards.txt" and "*reverses.txt" files (shown below).

Configuration settings:

$seqfile = "/home/current_user/Desktop/SLAG/single_segment.fasta";
$unicycleroutstem = "/home/current_user/Desktop/SLAG/working";
$unicyclerworkstem = "/home/current_user/Desktop/SLAG/out";
$restartflag = 0;
$maxcycle = 5;
$extractionoption = "increment";
$extincrement = 15;
$longread = 0;
$pairedend = 1;
$querytype = "nucleotide";
$blastdir = "/home/current_user/anaconda3/envs/unicycler/bin";
$forwardblastdb = "/home/current_user/Desktop/SLAG/SEQ0006F";
$reverseblastdb = "/home/current_user/Desktop/SLAG/SEQ0006R";
$nthreads = 18;
$evalue = 1e-10;
$secevalue = 1e-20;
$runalign = 10000;
$carryforwardevalue = 1e-20;
$stem = "node19_";
$tempdbname = "node19_";
$tempdboutstem = "node_19_temp";
$assembler = "unicycler";
$unicyclerexe = "/home/current_user/anaconda3/envs/unicycler/bin/unicycler";

Accession IDs in the forward reference database

(unicycler) current_user@XXX:~/Desktop/SLAG$ blastdbcmd -db SEQ0006F -entry all -outfmt "%f" | awk '/>(.+) / {print substr($1,2);}' | head

SEQ0006F_trim_pair-1
SEQ0006F_trim_pair-2
SEQ0006F_trim_pair-3
SEQ0006F_trim_pair-4
SEQ0006F_trim_pair-5
SEQ0006F_trim_pair-6
SEQ0006F_trim_pair-7
SEQ0006F_trim_pair-8
SEQ0006F_trim_pair-9
SEQ0006F_trim_pair-10

First 10 entries of the *forwards.txt file.

(unicycler) current_user@XXX:~/Desktop/SLAG$ head node19_accessions0.txtforwards.txt

SEQ0006R_trim_pair-148133.1
SEQ0006R_trim_pair-95798.1
SEQ0006F_trim_pair-106826.1
SEQ0006R_trim_pair-376470.1
SEQ0006R_trim_pair-239371.1
SEQ0006F_trim_pair-381483.1
SEQ0006F_trim_pair-345197.1
SEQ0006F_trim_pair-156162.1
SEQ0006F_trim_pair-167538.1
SEQ0006R_trim_pair-328526.1
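
The listing above mixes SEQ0006F and SEQ0006R prefixes in a single *forwards.txt file. Whether that is expected is not stated here, but a quick tally can make the split visible. The following is a minimal sketch, assuming the prefix is the text before the first underscore; `count_prefixes` and the temporary sample file are hypothetical names used only for illustration, and the ten IDs are copied from the listing above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count accession IDs per read-set prefix,
# where the prefix is the text before the first underscore.
count_prefixes() {
    cut -d_ -f1 "$1" | sort | uniq -c | awk '{print $2, $1}'
}

# Sample input: the ten IDs shown in the *forwards.txt listing above.
sample=$(mktemp)
cat > "$sample" <<'EOF'
SEQ0006R_trim_pair-148133.1
SEQ0006R_trim_pair-95798.1
SEQ0006F_trim_pair-106826.1
SEQ0006R_trim_pair-376470.1
SEQ0006R_trim_pair-239371.1
SEQ0006F_trim_pair-381483.1
SEQ0006F_trim_pair-345197.1
SEQ0006F_trim_pair-156162.1
SEQ0006F_trim_pair-167538.1
SEQ0006R_trim_pair-328526.1
EOF

# Print one "prefix count" line per read set.
count_prefixes "$sample"
```

For these ten lines the tally is five IDs with each prefix. Pointing `count_prefixes` at the full accession file would show whether the same mix holds throughout.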

Sample of six error messages from running SLAG.pl with the above configuration settings

Error: [blastdbcmd] Entry not found: SEQ0006R_trim_pair-145913.2
Error: [blastdbcmd] Skipped SEQ0006R_trim_pair-145913.2
Error: [blastdbcmd] Entry not found: SEQ0006F_trim_pair-129748.2
Error: [blastdbcmd] Skipped SEQ0006F_trim_pair-129748.2
Error: [blastdbcmd] Entry not found: SEQ0006F_trim_pair-66745.2
Error: [blastdbcmd] Skipped SEQ0006F_trim_pair-66745.2
Error: [blastdbcmd] Entry not found: SEQ0006F_trim_pair-327597.2
Error: [blastdbcmd] Skipped SEQ0006F_trim_pair-327597.2
Error: [blastdbcmd] Entry not found: SEQ0006F_trim_pair-226852.2
Error: [blastdbcmd] Skipped SEQ0006F_trim_pair-226852.2
Error: [blastdbcmd] Entry not found: SEQ0006F_trim_pair-177250.2
Error: [blastdbcmd] Skipped SEQ0006F_trim_pair-177250.2
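
To test whether the ".1"/".2" suffix alone explains the "Entry not found" errors, one option is to strip the suffix and compare the result against the IDs the database reports. This is a rough sketch under stated assumptions, not SLAG's own logic: `strip_mate_suffix` and `missing_ids` are hypothetical helpers, the database IDs are the first three from the blastdbcmd listing above, and the two suffixed IDs are illustrative values in the style of the *forwards.txt file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: remove a trailing ".1" or ".2" from each ID on stdin.
strip_mate_suffix() {
    sed -E 's/\.[12]$//'
}

# Hypothetical helper: print IDs from list $1 that are absent from list $2.
# Both files hold one ID per line; matching is exact and whole-line.
missing_ids() {
    grep -Fvx -f "$2" "$1"
}

tmpdir=$(mktemp -d)

# Database IDs as reported by blastdbcmd in the issue (first three only).
printf '%s\n' SEQ0006F_trim_pair-1 SEQ0006F_trim_pair-2 SEQ0006F_trim_pair-3 \
    > "$tmpdir/db_ids.txt"

# Illustrative suffixed IDs (assumed values, not taken from a real run).
printf '%s\n' SEQ0006F_trim_pair-1.1 SEQ0006F_trim_pair-3.1 \
    > "$tmpdir/forwards.txt"

# With the suffix in place, both IDs are reported as missing.
missing_ids "$tmpdir/forwards.txt" "$tmpdir/db_ids.txt" || true

# After stripping the suffix, nothing is reported as missing.
strip_mate_suffix < "$tmpdir/forwards.txt" > "$tmpdir/stripped.txt"
missing_ids "$tmpdir/stripped.txt" "$tmpdir/db_ids.txt" || true
```

On real data, the ID list could be produced with the blastdbcmd command shown earlier in this report and the suffixed list taken from the *forwards.txt or *reverses.txt file. IDs that are still missing after stripping would point to a cause other than the suffix.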

For reference, when I modify the configuration file to only use one of the read databases (i.e. not paired end), the pipeline proceeds as normal.

Could this issue arise from the BLAST suite version used (blastdbcmd 2.16.0+)? This looks like a very useful program!

Thank you

@cfcrane
Owner

cfcrane commented Nov 4, 2024 via email

@cfcrane
Owner

cfcrane commented Nov 5, 2024 via email
