Execution halted: vcf2bedpe.R #43

Closed
arnikz opened this issue Mar 27, 2020 · 16 comments

arnikz commented Mar 27, 2020

https://travis-ci.org/github/GooglingTheCancerGenome/CNN/builds/667785627#L762

lsantuari commented Mar 27, 2020

I could:

It seems to treat the SURVIVOR simSV VCF file as a Delly VCF file that is missing the CT field.

@lsantuari lsantuari self-assigned this Mar 27, 2020
arnikz commented Mar 27, 2020

That's fine. For the time being, a quick fix is to limit the VCF->BEDPE conversion to GRIDSS:

https://github.com/GooglingTheCancerGenome/CNN/blob/787d879145e3ad41a164ee07b2285515310daadb/run_local.sh#L16

lsantuari commented

Converting the truth set of the artificial SVs into BEDPE is still necessary to compare the GRIDSS callset to. See here and here, for instance. So this issue is on my priority list, also for #36 and #41.

@lsantuari lsantuari self-assigned this Mar 27, 2020
lsantuari commented

Adding an extra step to treat TRA as CTX is a quick fix at the moment:

sed 's/TRA/CTX/g' htz-sv.vcf > htz-sv_CTX.vcf

and then using htz-sv_CTX.vcf as test.vcf
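One caveat with the one-liner: `s/TRA/CTX/g` rewrites every occurrence of the three letters, including any in header lines, IDs or ALT alleles. A narrower substitution that only touches the `SVTYPE=TRA` key on data lines could look like the sketch below. The sample records are made up for illustration and are not taken from htz-sv.vcf, and whether changing only `SVTYPE` is enough for vcf2bedpe.R is an assumption that is not tested in this thread.

```shell
#!/bin/sh
# Sketch: rewrite only the SVTYPE=TRA key on data lines, leaving headers,
# IDs and ALT alleles untouched. All records below are hypothetical.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

{
  printf '##fileformat=VCFv4.2\n'
  printf '##ALT=<ID=TRA,Description="Translocation">\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
  printf '12\t100\tTRA0\tN\t<TRA>\t.\tPASS\tSVTYPE=TRA;CHR2=17;END=200\n'
  printf '12\t500\tDEL0\tN\t<DEL>\t.\tPASS\tSVTYPE=DEL;END=900\n'
} > "$workdir/in.vcf"

# Skip header lines (/^#/!) and require SVTYPE=TRA to be a whole INFO entry,
# i.e. delimited by ';', whitespace, or the start/end of the line.
sed -E '/^#/!s/(^|[;[:space:]])SVTYPE=TRA([;[:space:]]|$)/\1SVTYPE=CTX\2/' \
  "$workdir/in.vcf" > "$workdir/out.vcf"

cat "$workdir/out.vcf"
```

Only the INFO column of the first data record changes; the `##ALT=<ID=TRA,...>` header, the `TRA0` ID and the `<TRA>` allele are left as they were.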

arnikz commented Mar 27, 2020

> Converting the truth set of the artificial SVs into BEDPE is still necessary to compare the GRIDSS callset to. See here and here, for instance. So this issue is on my priority list, also for #36 and #41.

Yes, I want to use test.vcf and gridss.vcf as a quick fix for run_local.sh

arnikz commented Mar 27, 2020

> Adding an extra step to treat TRA as CTX is a quick fix at the moment:
>
> sed 's/TRA/CTX/g' htz-sv.vcf > htz-sv_CTX.vcf
>
> and then using htz-sv_CTX.vcf as test.vcf

Let's keep the data as is.

lsantuari commented Mar 27, 2020

For reference, treating SURVIVOR simSV TRA SVs as TIGRA CTX SVs seems to work.

lsantuari commented

> > Adding an extra step to treat TRA as CTX is a quick fix at the moment:
> > sed 's/TRA/CTX/g' htz-sv.vcf > htz-sv_CTX.vcf
> > and then using htz-sv_CTX.vcf as test.vcf
>
> Let's keep the data as is.

I agree.

arnikz pushed a commit that referenced this issue Mar 27, 2020
arnikz pushed a commit that referenced this issue Mar 27, 2020
arnikz commented Mar 27, 2020

It wasn't a quick fix after all 😉

https://github.com/GooglingTheCancerGenome/CNN/blob/a276f179aae28cfbe6d860e97df32afd1afbe05c/run_local.sh#L28

🙄 Where did DELLY come from? I missed your #43 (comment).

Moreover, diff iss32 (broken) and dev branch (passes) shows change: TRA->BP.

lsantuari commented

> It wasn't a quick fix after all 😉
>
> https://github.com/GooglingTheCancerGenome/CNN/blob/a276f179aae28cfbe6d860e97df32afd1afbe05c/run_local.sh#L28
>
> 🙄 Where did DELLY come from? I missed your #43 (comment).

If you process the SURVIVOR VCF file with the artificial SVs with breakpointRanges you get the following error:

There were 16 warnings (use warnings() to see them)
Warning message:
package ‘argparser’ was built under R version 3.6.2
Error in .breakpointRanges(x, ...) : Delly variants missing CT:
Calls: breakpointRanges -> breakpointRanges -> .breakpointRanges
In addition: Warning message:
In .breakpointRanges(x, ...) :
  Found 1000 duplicate row names (duplicates renamed).
Execution halted

This is because SURVIVOR, like DELLY, uses the SVTYPE TRA for translocations, instead of BP (GRIDSS) or CTX (TIGRA).
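To check which SVTYPE labels a given VCF file actually uses before running the conversion, the values can be tallied from the INFO column. The following is a minimal sketch; the function name `svtype_counts` and the sample records are hypothetical and only serve the demonstration.

```shell
#!/bin/sh
# Sketch: count the SVTYPE values found in the INFO column (column 8)
# of the data lines of a VCF file. The records below are made up.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

{
  printf '##fileformat=VCFv4.2\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
  printf '1\t100\ta\tN\t<TRA>\t.\tPASS\tSVTYPE=TRA;CHR2=2;END=50\n'
  printf '1\t200\tb\tN\t<DEL>\t.\tPASS\tEND=300;SVTYPE=DEL\n'
  printf '2\t300\tc\tN\t<TRA>\t.\tPASS\tSVTYPE=TRA;CHR2=3;END=70\n'
} > "$workdir/calls.vcf"

# Hypothetical helper: prints "<SVTYPE><TAB><count>" per type, sorted by type.
svtype_counts() {
  awk -F '\t' '
    /^#/ { next }                      # skip header lines
    {
      n = split($8, kv, ";")           # INFO entries are ";"-separated
      for (i = 1; i <= n; i++)
        if (kv[i] ~ /^SVTYPE=/) count[substr(kv[i], 8)]++
    }
    END { for (t in count) printf "%s\t%d\n", t, count[t] }
  ' "$1" | sort
}

counts=$(svtype_counts "$workdir/calls.vcf")
printf '%s\n' "$counts"
```

A file that reports TRA here would be expected to trigger the "Delly variants missing CT" error shown above, assuming it lacks the CT field.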

> Moreover, diff iss32 (broken) and dev branch (passes) shows change: TRA->BP.

arnikz commented Mar 30, 2020

OK. It's still unclear to me which of the three (TRA, BP or CTX) should be used in this script to pass and/or to resolve merge conflicts in the branches?

arnikz commented Mar 30, 2020

> If you process the SURVIVOR VCF file with the artificial SVs with breakpointRanges you get the following error:

Yes, the very same error shown here.

lsantuari commented

> OK. It's still unclear to me which of the three (TRA, BP or CTX) should be used in this script to pass and/or to resolve merge conflicts in the branches?

CTX

vcf2bedpe.R works with test.vcf when all occurrences of TRA are substituted with CTX:

sed 's/TRA/CTX/g' test.vcf

However, you still have to redirect the output of sed to an intermediate file that is then used as input to vcf2bedpe.R.
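The intermediate-file step could be wrapped so that the original VCF stays untouched and the temporary copy is cleaned up afterwards. This is only a sketch: the helper name `with_ctx_copy` is hypothetical, and because the command-line arguments of vcf2bedpe.R are not shown in this thread, `cat` stands in for the converter in the demo.

```shell
#!/bin/sh
# Sketch: run a command on a TRA->CTX copy of a VCF, leaving the original as is.
# Usage: with_ctx_copy INPUT.vcf COMMAND [ARGS...]
# The path of the intermediate file is appended as the last argument of COMMAND.
with_ctx_copy() {
  input=$1
  shift
  tmp=$(mktemp) || return 1
  # Same substitution as in the comment above, written to the intermediate file.
  sed 's/TRA/CTX/g' "$input" > "$tmp" || { rm -f "$tmp"; return 1; }
  "$@" "$tmp"
  status=$?
  rm -f "$tmp"
  return "$status"
}

# Demo with a made-up record; `cat` is a stand-in for the real converter.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
printf '1\t100\tx\tN\t<TRA>\t.\tPASS\tSVTYPE=TRA;CHR2=2;END=50\n' > "$workdir/test.vcf"

converted=$(with_ctx_copy "$workdir/test.vcf" cat)
printf '%s\n' "$converted"
```

In run_local.sh, the stand-in would be replaced by the actual vcf2bedpe.R invocation. If that script expects the input path anywhere other than as the last argument, the helper would need adjusting.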

arnikz commented Mar 30, 2020

The last commit passes, but the code does not make use of the VCF file. Please fork the iss32 branch instead of dev to continue.

lsantuari commented

Done. Branch iss43 can be deleted.

@arnikz arnikz closed this as completed Mar 31, 2020
@arnikz arnikz added this to the 0.1.0 milestone Apr 7, 2020