Hi @gp201, that is a great little test case. It appears that usher's Newick parsing does not handle quoting, so usher sees two distinct sequences: 'hRSV/A/Germany/22-02516/2021' in tree.nwk (treating the quotes as part of the name) with no substitutions relative to the reference, and a different sequence hRSV/A/Germany/22-02516/2021 in aligned.vcf (no quotes).
It would be better for usher's Newick parsing to recognize quoting, but in the meantime there is a straightforward workaround: strip all quote characters from your input Newick. (Also make sure before creating Newick and VCF that none of the input sequence names contain any characters with special meaning in Newick like [():,;].) Here is an example command that would do that:
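One way to apply this workaround is sketched below. It is a minimal sketch rather than the exact command from this reply: the helper name strip_newick_quotes and the output file name tree.noquotes.nwk are assumptions made for illustration, and it assumes that no sequence name relies on quoting to protect characters such as [():,;].

```shell
# Hypothetical helper (not part of usher): copy a Newick file while
# deleting every single-quote and double-quote character.
# Usage: strip_newick_quotes INPUT.nwk OUTPUT.nwk
strip_newick_quotes() {
    tr -d "'\"" < "$1" > "$2"
}

# Example with the file names from this issue; the output name is an assumption.
# strip_newick_quotes tree.nwk tree.noquotes.nwk
# usher -t tree.noquotes.nwk -v aligned.vcf -o tree.pb
```

With the quotes removed, the name in the Newick file matches the name in the VCF character for character, so usher should treat them as the same sequence.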
Description
When node names contain certain special characters, a duplicate node is created.
Steps to Reproduce
1. usher -t tree.nwk -v aligned.vcf -o tree.pb
   Observed in final_tree.nh.
2. matUtils extract -i tree.pb -C lineagePaths.txt -j auspice_tree.json -S samplePaths.txt
   Observed in auspice_tree.json.
Expected Behavior
The phylogenetic tree should not contain duplicate nodes.
Actual Behavior
The node 'hRSV/A/Germany/22-02516/2021' is present twice in the tree.
Additional Information
Files to reproduce the bug: bug_example.zip. Run run.sh to generate the relevant files.
Environment
Conda
Usher: 0.6.3
Please let me know whether this is a genuine error or an oversight on my end. Thank you.