Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inconsistent BAM output from identical inputs #72

Open
blahah opened this issue Apr 4, 2016 · 5 comments
Open

Inconsistent BAM output from identical inputs #72

blahah opened this issue Apr 4, 2016 · 5 comments

Comments

@blahah
Copy link
Contributor

blahah commented Apr 4, 2016

I noticed that on some of the test data we use for transrate, I occasionally see a run where the number of alignments in the SNAP output BAM file is much lower than what it should be.

The input dataset can be found here, and the two sets of results are linked below:

The version of SNAP used was v1.0dev.96

commands used

the commands were identical between runs

./snap-aligner index example_data/one/assembly.fa assembly -s 23 -t8 -bSpace -locationSize 4
./snap-aligner paired assembly example_data/left.fq example_data/right.fq \
  -o left.fq.right.fq.assembly.bam \
  -s 0 1000 \
  -H 300000 -h 2000 \
  -d 30 -t 8 -b -M -D 5 \
  -om 5 -omax 10

It seems as though the alignment process completed very similarly in both runs, because the logged output was identical except for the speed:

run 1 (logs)

Loading index from directory... 0s.  36562 bases, seed size 23
Aligning.
Total Reads    Aligned, MAPQ >= 10    Aligned, MAPQ < 10     Unaligned              Too Short/Too Many Ns  Extra Alignments  %Pairs    Reads/s   Time in Aligner (s)
20,000         13,874 (69.37%)        6,126 (30.63%)         0 (0.00%)              0 (0.00%)              4,357             100.00%   173,913   0
Welcome to SNAP version 1.0dev.96.

run 2 (logs)

Loading index from directory... 0s.  36562 bases, seed size 23
Aligning.
Total Reads    Aligned, MAPQ >= 10    Aligned, MAPQ < 10     Unaligned              Too Short/Too Many Ns  Extra Alignments  %Pairs    Reads/s   Time in Aligner (s)
20,000         13,874 (69.37%)        6,126 (30.63%)         0 (0.00%)              0 (0.00%)              4,357             100.00%   160,000   0
Welcome to SNAP version 1.0dev.96.

However, comparing the BAMfiles written out, we can see that run1 failed to report a large proportion of the alignments:

run1 (bamtools stats)

Total reads:       11800
Mapped reads:      11800    (100%)
Forward strand:    5900 (50%)
Reverse strand:    5900 (50%)
Failed QC:         0    (0%)
Duplicates:        0    (0%)
Paired-end reads:  11800    (100%)
'Proper-pairs':    11800    (100%)
Both pairs mapped: 11800    (100%)
Read 1:            5900
Read 2:            5900
Singletons:        0    (0%)

run2 (bamtools stats)

Total reads:       28714
Mapped reads:      28714    (100%)
Forward strand:    14357    (50%)
Reverse strand:    14357    (50%)
Failed QC:         0    (0%)
Duplicates:        0    (0%)
Paired-end reads:  28714    (100%)
'Proper-pairs':    28714    (100%)
Both pairs mapped: 28714    (100%)
Read 1:            14357
Read 2:            14357
Singletons:        0    (0%)

Both commands had an exit status of 0, so SNAP didn't throw any error.

I would be very grateful for your help in figuring out what's happening - at the moment this bug means that our software that depends on SNAP produces results that are not reproducible, so it's a priority for me.

@macmanes
Copy link

macmanes commented Apr 7, 2016

I have observed a similar issue as well. Any advice greatly appreciated.

@bolosky
Copy link
Contributor

bolosky commented Apr 11, 2016

Sorry not to reply earlier, my spam filter decided the snap mailing list was something I shouldn't see.

This seems very bad, and not something I have seen before. I wonder if it is a Linux only problem, nearly all the dev work is done on widows. It's not even writing as many reads as are in the input file, so maybe somehow the process is exciting before all of the buffers are flushed; it is a very small input file.

Anyway, I'm on vacation now, so I probably can't get to it until next week. Maybe Ravi has some time earlier.

Sent from my Windows Phone


From: Richard Smith-Unnamailto:[email protected]
Sent: ‎4/‎4/‎2016 10:12 AM
To: amplab/snapmailto:[email protected]
Subject: [amplab/snap] Inconsistent BAM output from identical inputs (#72)

I noticed that on some of the test data we use for transrate, I occasionally see a run where the number of alignments in the SNAP output BAM file is much lower than what it should be.

The input dataset can be found herehttps://na01.safelinks.protection.outlook.com/?url=https%3a%2f%2fbintray.com%2fartifact%2fdownload%2fblahah%2fgeneric%2fexample_data.tar.gz&data=01%7c01%7cbolosky%40microsoft.com%7c79bae9ad407449c3ce8308d35cc560e6%7c72f988bf86f141af91ab2d7cd011db47%7c1&sdata=xGjmpZVyyYgmibjhyYGB5ZDBB5WAbv3H67p5TjtOHfg%3d, and the two sets of results are linked below:

  • run1: incorrect outputhttps://na01.safelinks.protection.outlook.com/?url=https%3a%2f%2fwww.dropbox.com%2fs%2fbbguuzqc6kpl1no%2frun1.tar.gz%3fdl%3d0&data=01%7c01%7cbolosky%40microsoft.com%7c79bae9ad407449c3ce8308d35cc560e6%7c72f988bf86f141af91ab2d7cd011db47%7c1&sdata=CVITdavf0OaXVY3mOqH1LSUpASF1gaY37V9fY3inAMg%3d
  • run2: correct outputhttps://na01.safelinks.protection.outlook.com/?url=https%3a%2f%2fwww.dropbox.com%2fs%2f190k2zsifzhv8l3%2frun2.tar.gz%3fdl%3d0&data=01%7c01%7cbolosky%40microsoft.com%7c79bae9ad407449c3ce8308d35cc560e6%7c72f988bf86f141af91ab2d7cd011db47%7c1&sdata=WrsUGZ6T%2fxhHOnfe1y1VXp5jqgECbhs9%2bfyHt6ZczC8%3d

The version of SNAP used was v1.0dev.96

commands used

the commands were identical between runs

./snap-aligner index example_data/one/assembly.fa assembly -s 23 -t8 -bSpace -locationSize 4
./snap-aligner paired assembly example_data/left.fq example_data/right.fq
-o left.fq.right.fq.assembly.bam
-s 0 1000
-H 300000 -h 2000
-d 30 -t 8 -b -M -D 5
-om 5 -omax 10

It seems as though the alignment process completed very similarly in both runs, because the logged output was identical except for the speed:

run 1 (logs)

Loading index from directory... 0s. 36562 bases, seed size 23
Aligning.
Total Reads Aligned, MAPQ >= 10 Aligned, MAPQ < 10 Unaligned Too Short/Too Many Ns Extra Alignments %Pairs Reads/s Time in Aligner (s)
20,000 13,874 (69.37%) 6,126 (30.63%) 0 (0.00%) 0 (0.00%) 4,357 100.00% 173,913 0
Welcome to SNAP version 1.0dev.96.

run 2 (logs)

Loading index from directory... 0s. 36562 bases, seed size 23
Aligning.
Total Reads Aligned, MAPQ >= 10 Aligned, MAPQ < 10 Unaligned Too Short/Too Many Ns Extra Alignments %Pairs Reads/s Time in Aligner (s)
20,000 13,874 (69.37%) 6,126 (30.63%) 0 (0.00%) 0 (0.00%) 4,357 100.00% 160,000 0
Welcome to SNAP version 1.0dev.96.

However, comparing the BAMfiles written out, we can see that run1 failed to report a large proportion of the alignments:

run1 (bamtools stats)

Total reads: 11800
Mapped reads: 11800 (100%)
Forward strand: 5900 (50%)
Reverse strand: 5900 (50%)
Failed QC: 0 (0%)
Duplicates: 0 (0%)
Paired-end reads: 11800 (100%)
'Proper-pairs': 11800 (100%)
Both pairs mapped: 11800 (100%)
Read 1: 5900
Read 2: 5900
Singletons: 0 (0%)

run2 (bamtools stats)

Total reads: 28714
Mapped reads: 28714 (100%)
Forward strand: 14357 (50%)
Reverse strand: 14357 (50%)
Failed QC: 0 (0%)
Duplicates: 0 (0%)
Paired-end reads: 28714 (100%)
'Proper-pairs': 28714 (100%)
Both pairs mapped: 28714 (100%)
Read 1: 14357
Read 2: 14357
Singletons: 0 (0%)

Both commands had an exit status of 0, so SNAP didn't throw any error.

I would be very grateful for your help in figuring out what's happening - at the moment this bug means that our software that depends on SNAP produces results that are not reproducible, so it's a priority for me.


You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHubhttps://github.com//issues/72

@bolosky
Copy link
Contributor

bolosky commented Apr 12, 2016

Even though I’m on vacation, I started looking at this, and the dataset you provide doesn’t have the same fasta as the example you give. It has transcripts.fa rather than one/assembly.fa.

My guess is that this doesn’t matter in the least, since it’s probably unrelated to the actual alignment, but in case you think it might be, please let me know.

--Bill

From: Richard Smith-Unna [mailto:[email protected]]
Sent: Monday, April 4, 2016 10:12 AM
To: amplab/snap [email protected]
Subject: [amplab/snap] Inconsistent BAM output from identical inputs (#72)

I noticed that on some of the test data we use for transrate, I occasionally see a run where the number of alignments in the SNAP output BAM file is much lower than what it should be.

The input dataset can be found herehttps://na01.safelinks.protection.outlook.com/?url=https%3a%2f%2fbintray.com%2fartifact%2fdownload%2fblahah%2fgeneric%2fexample_data.tar.gz&data=01%7c01%7cbolosky%40microsoft.com%7c79bae9ad407449c3ce8308d35cc560e6%7c72f988bf86f141af91ab2d7cd011db47%7c1&sdata=xGjmpZVyyYgmibjhyYGB5ZDBB5WAbv3H67p5TjtOHfg%3d, and the two sets of results are linked below:

  • run1: incorrect outputhttps://na01.safelinks.protection.outlook.com/?url=https%3a%2f%2fwww.dropbox.com%2fs%2fbbguuzqc6kpl1no%2frun1.tar.gz%3fdl%3d0&data=01%7c01%7cbolosky%40microsoft.com%7c79bae9ad407449c3ce8308d35cc560e6%7c72f988bf86f141af91ab2d7cd011db47%7c1&sdata=CVITdavf0OaXVY3mOqH1LSUpASF1gaY37V9fY3inAMg%3d
  • run2: correct outputhttps://na01.safelinks.protection.outlook.com/?url=https%3a%2f%2fwww.dropbox.com%2fs%2f190k2zsifzhv8l3%2frun2.tar.gz%3fdl%3d0&data=01%7c01%7cbolosky%40microsoft.com%7c79bae9ad407449c3ce8308d35cc560e6%7c72f988bf86f141af91ab2d7cd011db47%7c1&sdata=WrsUGZ6T%2fxhHOnfe1y1VXp5jqgECbhs9%2bfyHt6ZczC8%3d

The version of SNAP used was v1.0dev.96

commands used

the commands were identical between runs

./snap-aligner index example_data/one/assembly.fa assembly -s 23 -t8 -bSpace -locationSize 4

./snap-aligner paired assembly example_data/left.fq example_data/right.fq \

-o left.fq.right.fq.assembly.bam \

-s 0 1000 \

-H 300000 -h 2000 \

-d 30 -t 8 -b -M -D 5 \

-om 5 -omax 10

It seems as though the alignment process completed very similarly in both runs, because the logged output was identical except for the speed:

run 1 (logs)

Loading index from directory... 0s. 36562 bases, seed size 23

Aligning.

Total Reads Aligned, MAPQ >= 10 Aligned, MAPQ < 10 Unaligned Too Short/Too Many Ns Extra Alignments %Pairs Reads/s Time in Aligner (s)

20,000 13,874 (69.37%) 6,126 (30.63%) 0 (0.00%) 0 (0.00%) 4,357 100.00% 173,913 0

Welcome to SNAP version 1.0dev.96.

run 2 (logs)

Loading index from directory... 0s. 36562 bases, seed size 23

Aligning.

Total Reads Aligned, MAPQ >= 10 Aligned, MAPQ < 10 Unaligned Too Short/Too Many Ns Extra Alignments %Pairs Reads/s Time in Aligner (s)

20,000 13,874 (69.37%) 6,126 (30.63%) 0 (0.00%) 0 (0.00%) 4,357 100.00% 160,000 0

Welcome to SNAP version 1.0dev.96.

However, comparing the BAMfiles written out, we can see that run1 failed to report a large proportion of the alignments:

run1 (bamtools stats)

Total reads: 11800

Mapped reads: 11800 (100%)

Forward strand: 5900 (50%)

Reverse strand: 5900 (50%)

Failed QC: 0 (0%)

Duplicates: 0 (0%)

Paired-end reads: 11800 (100%)

'Proper-pairs': 11800 (100%)

Both pairs mapped: 11800 (100%)

Read 1: 5900

Read 2: 5900

Singletons: 0 (0%)

run2 (bamtools stats)

Total reads: 28714

Mapped reads: 28714 (100%)

Forward strand: 14357 (50%)

Reverse strand: 14357 (50%)

Failed QC: 0 (0%)

Duplicates: 0 (0%)

Paired-end reads: 28714 (100%)

'Proper-pairs': 28714 (100%)

Both pairs mapped: 28714 (100%)

Read 1: 14357

Read 2: 14357

Singletons: 0 (0%)

Both commands had an exit status of 0, so SNAP didn't throw any error.

I would be very grateful for your help in figuring out what's happening - at the moment this bug means that our software that depends on SNAP produces results that are not reproducible, so it's a priority for me.


You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHubhttps://github.com//issues/72

@bolosky
Copy link
Contributor

bolosky commented Apr 13, 2016

I couldn’t get this to repro either in Windows or Linux.

What OS version are you using, and how often do you see the truncated output? Maybe I just haven’t tried it enough times.

--Bill

From: Richard Smith-Unna [mailto:[email protected]]
Sent: Monday, April 4, 2016 10:12 AM
To: amplab/snap [email protected]
Subject: [amplab/snap] Inconsistent BAM output from identical inputs (#72)

I noticed that on some of the test data we use for transrate, I occasionally see a run where the number of alignments in the SNAP output BAM file is much lower than what it should be.

The input dataset can be found herehttps://na01.safelinks.protection.outlook.com/?url=https%3a%2f%2fbintray.com%2fartifact%2fdownload%2fblahah%2fgeneric%2fexample_data.tar.gz&data=01%7c01%7cbolosky%40microsoft.com%7c79bae9ad407449c3ce8308d35cc560e6%7c72f988bf86f141af91ab2d7cd011db47%7c1&sdata=xGjmpZVyyYgmibjhyYGB5ZDBB5WAbv3H67p5TjtOHfg%3d, and the two sets of results are linked below:

  • run1: incorrect outputhttps://na01.safelinks.protection.outlook.com/?url=https%3a%2f%2fwww.dropbox.com%2fs%2fbbguuzqc6kpl1no%2frun1.tar.gz%3fdl%3d0&data=01%7c01%7cbolosky%40microsoft.com%7c79bae9ad407449c3ce8308d35cc560e6%7c72f988bf86f141af91ab2d7cd011db47%7c1&sdata=CVITdavf0OaXVY3mOqH1LSUpASF1gaY37V9fY3inAMg%3d
  • run2: correct outputhttps://na01.safelinks.protection.outlook.com/?url=https%3a%2f%2fwww.dropbox.com%2fs%2f190k2zsifzhv8l3%2frun2.tar.gz%3fdl%3d0&data=01%7c01%7cbolosky%40microsoft.com%7c79bae9ad407449c3ce8308d35cc560e6%7c72f988bf86f141af91ab2d7cd011db47%7c1&sdata=WrsUGZ6T%2fxhHOnfe1y1VXp5jqgECbhs9%2bfyHt6ZczC8%3d

The version of SNAP used was v1.0dev.96

commands used

the commands were identical between runs

./snap-aligner index example_data/one/assembly.fa assembly -s 23 -t8 -bSpace -locationSize 4

./snap-aligner paired assembly example_data/left.fq example_data/right.fq \

-o left.fq.right.fq.assembly.bam \

-s 0 1000 \

-H 300000 -h 2000 \

-d 30 -t 8 -b -M -D 5 \

-om 5 -omax 10

It seems as though the alignment process completed very similarly in both runs, because the logged output was identical except for the speed:

run 1 (logs)

Loading index from directory... 0s. 36562 bases, seed size 23

Aligning.

Total Reads Aligned, MAPQ >= 10 Aligned, MAPQ < 10 Unaligned Too Short/Too Many Ns Extra Alignments %Pairs Reads/s Time in Aligner (s)

20,000 13,874 (69.37%) 6,126 (30.63%) 0 (0.00%) 0 (0.00%) 4,357 100.00% 173,913 0

Welcome to SNAP version 1.0dev.96.

run 2 (logs)

Loading index from directory... 0s. 36562 bases, seed size 23

Aligning.

Total Reads Aligned, MAPQ >= 10 Aligned, MAPQ < 10 Unaligned Too Short/Too Many Ns Extra Alignments %Pairs Reads/s Time in Aligner (s)

20,000 13,874 (69.37%) 6,126 (30.63%) 0 (0.00%) 0 (0.00%) 4,357 100.00% 160,000 0

Welcome to SNAP version 1.0dev.96.

However, comparing the BAMfiles written out, we can see that run1 failed to report a large proportion of the alignments:

run1 (bamtools stats)

Total reads: 11800

Mapped reads: 11800 (100%)

Forward strand: 5900 (50%)

Reverse strand: 5900 (50%)

Failed QC: 0 (0%)

Duplicates: 0 (0%)

Paired-end reads: 11800 (100%)

'Proper-pairs': 11800 (100%)

Both pairs mapped: 11800 (100%)

Read 1: 5900

Read 2: 5900

Singletons: 0 (0%)

run2 (bamtools stats)

Total reads: 28714

Mapped reads: 28714 (100%)

Forward strand: 14357 (50%)

Reverse strand: 14357 (50%)

Failed QC: 0 (0%)

Duplicates: 0 (0%)

Paired-end reads: 28714 (100%)

'Proper-pairs': 28714 (100%)

Both pairs mapped: 28714 (100%)

Read 1: 14357

Read 2: 14357

Singletons: 0 (0%)

Both commands had an exit status of 0, so SNAP didn't throw any error.

I would be very grateful for your help in figuring out what's happening - at the moment this bug means that our software that depends on SNAP produces results that are not reproducible, so it's a priority for me.


You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHubhttps://github.com//issues/72

@blahah
Copy link
Contributor Author

blahah commented Jun 6, 2016

hi @bolosky sorry that I missed the notifications on this, and thanks very much for taking time out during your holiday to look at it!

I've run some tests and I think the problem is specific to OSX - I can't reproduce it on Linux.

The following script helps see the issue. It requires bamtools v1.0 or greater

#!/bin/bash
for i in `seq 1 20`;
do
  rm -rf assembly *bam
  snap-aligner index example_data/transcripts.fa assembly \
    -s 23 -t8 -bSpace -locationSize 4
  snap-aligner paired assembly example_data/left.fq example_data/right.fq \
    -o left.fq.right.fq.assembly.bam \
    -s 0 1000 -H 300000 -h 2000 -d 30 \
    -t 8 -b -M -D 5 -om 5 -omax 10
  bamtools stats -in left.fq.right.fq.assembly.bam
done

saving it as test_snap.sh and running it as follows in the parent directory of example_data extracted from the link above...

./test_snap.sh 2>/dev/null | tee test_snap.log | grep 'Total reads' | sort | uniq -c

on OSX I get:

   1 Total reads:       11788
  17 Total reads:       28714
   1 Total reads:       2954
   1 Total reads:       5840

whereas on Linux I get:

     20 Total reads:       28714

Because most of our users are on Linux this is not as urgent as previously thought, but it would be really nice to figure it out. I'm guessing the none of the SNAP team are on OSX as there haven't been OSX builds for the recent release. Any ideas how I can debug this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants