smoothxg throws what(): basic_string::_M_construct null not valid #16

Open
cgroza opened this issue Oct 1, 2020 · 11 comments

@cgroza

cgroza commented Oct 1, 2020

Hi,
I am running smoothxg on GFA graphs built with vg construct and converted to GFA with vg view.

cgroza@blg9122:~/.../sv-graph/graphs $ ~/smoothxg/bin/smoothxg -g chr1.vg.gfa
terminate called after throwing an instance of 'std::logic_error'
  what():  basic_string::_M_construct null not valid
Aborted (core dumped)
cgroza@blg9122:~/.../sv-graph/graphs $

This is the error I am getting.
I validated each vg file with vg validate and there were no warnings.
I enabled debugging with -d and the output stays the same.
What could be causing this error and how could I fix it?

Thank you,
Cristian Groza

@ekg
Collaborator

ekg commented Oct 2, 2020

I've never tested on a graph built by vg construct. But this is a good use case, because you might be able to normalize the representation of complex variation in the VCF.

It's possible that you need to have paths covering every base in the graph for smoothing to not break like this.

Please try running vg construct -a when building your graph. Does that work?
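To test the coverage hypothesis before rebuilding anything, a quick check along these lines might help. This is only a sketch: `uncovered_segments` and the toy graph are made up for illustration, and it assumes GFA 1 with tab-separated fields where each path step is a segment name followed by `+` or `-`. Since a GFA 1 path visits whole segments, a segment that no path visits means its bases are uncovered.

```python
def uncovered_segments(gfa_lines):
    """Return the names of S-line segments that no P-line path visits."""
    segments = set()
    visited = set()
    for line in gfa_lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "S" and len(fields) >= 2:
            segments.add(fields[1])
        elif fields[0] == "P" and len(fields) >= 3:
            # Each step looks like "12+" or "12-"; drop the orientation sign.
            for step in fields[2].split(","):
                if step:
                    visited.add(step[:-1])
    return sorted(segments - visited)


# Hypothetical toy graph: segment 2 is an alternate allele with no path over it.
example = [
    "H\tVN:Z:1.0",
    "S\t1\tACGT",
    "S\t2\tG",
    "S\t3\tTTA",
    "L\t1\t+\t2\t+\t0M",
    "L\t1\t+\t3\t+\t0M",
    "L\t2\t+\t3\t+\t0M",
    "P\tx\t1+,3+\t*",
]
print(uncovered_segments(example))  # prints ['2']
```

On a real file, the same function can be fed an open file handle, for example `uncovered_segments(open("chr1.vg.gfa"))`.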

@cgroza
Author

cgroza commented Oct 2, 2020

Hi,
Thanks for the quick reply.
I already passed the -a flag to vg construct and the paths are indeed saved in the vg file before I convert to GFA with vg view.
And yes, I am attempting to collapse alternative sequences that are built from a multi-sample merged VCF file.
Would you like me to link a test VCF file?

cgroza@beluga4:~/scratch/sv-graph $ vg paths -L -v graphs/chr1.vg | head
_alt_690f8d2fa0e64915865faf22fb22b9a320da3fd2_3
_alt_646ca4e078f643e66c6f63af905a593f2340c187_0
_alt_d9a3a44aafbdb32476a084ea5912eb2042a419a4_5
_alt_460556e6579145a11e1258b8a8ea4fad75b2eac7_10
_alt_ec650cad5a9dcf3f96ab1f7726ad150119570c29_1
_alt_8f499a29d6265f1c936e58362b92b8bf5605f744_0
_alt_1a4af475581ce7bf25bfb6d6b0079e7f4eb96d09_2
_alt_cdeaef57e2f1a0ce2887a7b68321614e862a1cd8_15
_alt_ab71928db30ae5ffdabc5e86b7f88844d4e5eed3_0
_alt_cdeaef57e2f1a0ce2887a7b68321614e862a1cd8_3
cgroza@beluga4:~/scratch/sv-graph $

So unfortunately, it does not work.
How would the path information be kept in the GFA file after conversion?

Cristian

@ekg
Collaborator

ekg commented Oct 2, 2020 via email

@cgroza
Author

cgroza commented Oct 2, 2020

So my GFA files do indeed have P lines.
No, I only get 3 lines of output on stderr/stdout and they are the logic_error exception and core dump above.
I passed the -n parameter and got the same result.

Cristian

@ekg
Collaborator

ekg commented Oct 2, 2020

What git commit are you running here? How did you build and install smoothxg?

Could you reproduce the same thing but on a very small graph that you can share?

@cgroza
Author

cgroza commented Oct 2, 2020

Yes, I will try to adapt the test case in vg/test/small/x.fa and post it here.

@cgroza
Author

cgroza commented Oct 2, 2020

Hi,

Here is the minimal test case GFA. It has two Alu insertions right in the middle (with one basepair difference).
test.gfa.gz
It was built from these files with vg construct -a, then converted to GFA with vg view.
x.fa.gz
small.vcf.gz

I am on the master branch of the smoothxg repo. I followed the compile instructions on the README page.

@ekg
Collaborator

ekg commented Oct 9, 2020

I've found the problem.

One of your P lines doesn't have the right number of fields. It's missing the path description.

-> % grep _alt_5d2a3c27da7879b7e1fb081ed7596ea49fe65e13_0 test.gfa
P       _alt_5d2a3c27da7879b7e1fb081ed7596ea49fe65e13_0
grep -v _alt_5d2a3c27da7879b7e1fb081ed7596ea49fe65e13_0 test.gfa >test1.gfa
smoothxg -g test1.gfa >test1.smooth.gfa

This should probably be checked somewhere in the xg process that's failing.

What is this field meant to represent? A deletion allele?
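Until such a check exists upstream, something like the following could flag the problem lines before running smoothxg. It is a rough sketch, not smoothxg or xg code: `malformed_path_lines` and the path names in the example are invented, and it assumes GFA 1, where a P line carries four tab-separated fields (record type, path name, segment list, overlaps). The exact bytes of the offending line in test.gfa are not shown above, so the sketch handles both a truncated line and one whose segment list is present but empty.

```python
def malformed_path_lines(gfa_lines):
    """Return (line_number, path_name, reason) for each suspicious P line."""
    problems = []
    for number, line in enumerate(gfa_lines, start=1):
        fields = line.rstrip("\n").split("\t")
        if fields[0] != "P":
            continue
        name = fields[1] if len(fields) > 1 else ""
        if len(fields) < 4:
            reason = f"expected at least 4 fields, found {len(fields)}"
            problems.append((number, name, reason))
        elif not fields[2]:
            problems.append((number, name, "empty segment list"))
    return problems


# Hypothetical lines: one valid path, one truncated, one with empty fields.
lines = [
    "S\t1\tACGT",
    "P\tx\t1+\t*",
    "P\t_alt_example_0",
    "P\t_alt_example_1\t\t",
]
for number, name, reason in malformed_path_lines(lines):
    print(f"line {number}: path {name!r}: {reason}")
```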

@ekg
Collaborator

ekg commented Oct 9, 2020

I suspect it's something made by vg construct with the -a flag.

@cgroza
Author

cgroza commented Oct 9, 2020

I think vg construct -a is adding these _alt_...._0 at every insertion site. Traversing this path would give you the reference sequence. So vg is confusing the reference path with a deletion allele at insertion sites?
Is this new behaviour in vg construct? I don't remember this behaviour in past versions.

@cgroza
Author

cgroza commented Oct 9, 2020

I removed the offending path and this was the output of smoothxg:

smoothed.gfa.gz
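For larger graphs with many of these paths, removing them one name at a time with grep -v gets tedious. A small filter like this might do it in one pass. It is a sketch under the same assumption as the workaround above, namely that a P line with no steps can simply be dropped; `drop_empty_paths` and the example path names are hypothetical.

```python
def drop_empty_paths(gfa_lines):
    """Split GFA lines into (kept_lines, dropped_path_names).

    A P line is dropped when its segment-list field is missing or empty.
    """
    kept, dropped = [], []
    for line in gfa_lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "P" and (len(fields) < 3 or not fields[2]):
            dropped.append(fields[1] if len(fields) > 1 else "")
        else:
            kept.append(line)
    return kept, dropped


# Hypothetical input: the second path has no steps and is filtered out.
gfa = [
    "S\t1\tACGT",
    "S\t2\tTT",
    "L\t1\t+\t2\t+\t0M",
    "P\tx\t1+,2+\t*",
    "P\t_alt_example_0\t\t",
]
kept, dropped = drop_empty_paths(gfa)
print("\n".join(kept))
print("dropped:", dropped)
```

Writing `kept` to a new file gives the equivalent of test1.gfa above, with every empty path removed rather than a single named one.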

It seems the graph topology was affected everywhere, even outside the two very similar Alu insertions.
Will it collapse any two similar sub-sequences of the graph?
