Post-process SURVIVOR output with vcf2bedpe.R #82

Open
arnikz opened this issue Dec 1, 2020 · 6 comments

@arnikz
Contributor

arnikz commented Dec 1, 2020

In this workflow, vcf2bedpe.R is only used to convert the SV callers' output. According to GooglingTheCancerGenome/sv-gen#43, we also need to post-process SURVIVOR simSV output, but this step seems to be missing from the workflow.

@lsantuari
Contributor

lsantuari commented Dec 1, 2020

I will update the workflow (related to #68).

@arnikz
Contributor Author

arnikz commented Dec 1, 2020

StructuralVariantAnnotation (SVA) package supports the following VCF notations:

  • Non-symbolic allele
  • Symbolic allele with SVTYPE of DEL, INS, and DUP.
  • Breakpoint notation SVTYPE=BND
  • Single breakend notation

Currently, we use the two notations (BND for TRA/CTX?) in the workflow.
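To make the four notations concrete: they can be told apart from the ALT column alone. Below is a minimal Python sketch (a hypothetical helper, not part of the workflow or of SVA) that follows the VCF 4.2 ALT rules; it does not cover missing (`.`) or `*` alleles.

```python
def alt_notation(alt: str) -> str:
    """Classify a VCF ALT allele by notation (hypothetical helper, VCF 4.2 rules)."""
    if alt.startswith("<") and alt.endswith(">"):
        return "symbolic"          # e.g. <DEL>, <INS>, <DUP>, <INV>
    if "[" in alt or "]" in alt:
        return "breakend"          # paired breakend (SVTYPE=BND), e.g. G]17:198982]
    if len(alt) > 1 and (alt.startswith(".") or alt.endswith(".")):
        return "single_breakend"   # e.g. G. or .G
    if alt and set(alt.upper()) <= set("ACGTN"):
        return "non_symbolic"      # explicit sequence in REF/ALT
    raise ValueError(f"unrecognised ALT allele: {alt!r}")


for alt in ["<DEL>", "G]17:198982]", "G.", "ACGT"]:
    print(alt, alt_notation(alt))
```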

@arnikz
Contributor Author

arnikz commented Dec 1, 2020

SVA supports non-compliant VCFs:

  • Pindel (SVTYPE=RPL)
  • Manta (INV3, INV5 fields)
  • Delly (SVTYPE=TRA, CHR2, CT fields)
  • TIGRA (SVTYPE=CTX)

Is it correct to say that the vcf2bedpe.R script (and bedpe2vcf.py) is also needed to handle LUMPY and SURVIVOR output?
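For the DELLY case, here is a minimal sketch of how a TRA record with CHR2 and CT could be rewritten in standard breakend notation. The helper name and arguments are hypothetical. The mate position is passed in directly because where it is stored differs between DELLY versions, and the CT-to-bracket mapping is my reading of the VCF breakend rules, to be double-checked against the DELLY documentation.

```python
# Assumed mapping from DELLY's CT (connection type) to VCF 4.2 breakend ALT
# templates; t is the REF base at POS, p is the mate position CHR2:pos2.
CT_TO_TEMPLATE = {
    "3to5": "{t}[{chrom}:{pos}[",  # t[p[ : sequence right of p joined after t
    "5to3": "]{chrom}:{pos}]{t}",  # ]p]t : sequence left of p joined before t
    "3to3": "{t}]{chrom}:{pos}]",  # t]p] : rev. comp. of sequence left of p joined after t
    "5to5": "[{chrom}:{pos}[{t}",  # [p[t : rev. comp. of sequence right of p joined before t
}


def delly_tra_to_bnd_alt(ref_base: str, ct: str, chr2: str, pos2: int) -> str:
    """Build a BND ALT string for a DELLY-style TRA record (hypothetical helper)."""
    try:
        template = CT_TO_TEMPLATE[ct]
    except KeyError:
        raise ValueError(f"unknown CT value: {ct!r}") from None
    return template.format(t=ref_base, chrom=chr2, pos=pos2)


print(delly_tra_to_bnd_alt("A", "3to5", "chr2", 1000))
```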

@lsantuari
Contributor

> StructuralVariantAnnotation (SVA) package supports the following VCF notations:
>
>   • Non-symbolic allele
>   • Symbolic allele with SVTYPE of DEL, INS, and DUP.
>   • Breakpoint notation SVTYPE=BND
>   • Single breakend notation
>
> Currently, we use the two notations (BND for TRA/CTX?) in the workflow.

Yes. I miss the inversions (INV) in the SVA documentation.

Based on my tests, SVA also supports INV (see page 3 of the VCF specification).
INV can also be represented in BND notation (example to be added here based on this).
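Until that example is added, here is a minimal Python sketch of the idea. An inversion of the segment start..end can be written as four SVTYPE=BND records using the bracket rules of the VCF 4.2 specification (the same four-breakend pattern as in its inversion example). The helper, its arguments, and the toy `seq` standing in for the reference are hypothetical, and MATEID/ID fields are omitted.

```python
from typing import List, NamedTuple


class Breakend(NamedTuple):
    chrom: str
    pos: int   # 1-based position of the breakend
    ref: str   # reference base at pos
    alt: str   # ALT allele in bracket (BND) notation


def inversion_to_bnd(chrom: str, seq: str, start: int, end: int) -> List[Breakend]:
    """Describe an inversion of seq[start..end] (1-based, inclusive) as four
    breakends. `seq` is a hypothetical stand-in for the sequence of `chrom`."""
    if not 1 < start <= end < len(seq):
        raise ValueError("the inversion needs one flanking base on each side")

    def base(pos: int) -> str:
        return seq[pos - 1]  # 1-based position -> 0-based index

    left, right = start - 1, end + 1  # flanking bases outside the inverted segment
    return [
        # left flank, then the reverse complement of the sequence ending at `end`
        Breakend(chrom, left, base(left), f"{base(left)}]{chrom}:{end}]"),
        # reverse complement of the sequence starting at `right`, then the first inverted base
        Breakend(chrom, start, base(start), f"[{chrom}:{right}[{base(start)}"),
        # last inverted base, then the reverse complement of the sequence ending at `left`
        Breakend(chrom, end, base(end), f"{base(end)}]{chrom}:{left}]"),
        # reverse complement of the sequence starting at `start`, then the right flank
        Breakend(chrom, right, base(right), f"[{chrom}:{start}[{base(right)}"),
    ]


for bnd in inversion_to_bnd("chr1", "ACGTACGTAC", 3, 6):
    print(bnd.chrom, bnd.pos, bnd.ref, bnd.alt, "SVTYPE=BND", sep="\t")
```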

@arnikz
Contributor Author

arnikz commented Dec 1, 2020

> Yes. I miss the inversions (INV) in the SVA documentation.

True. Nevertheless, INVs are handled, for example, in vcf2bedpe.R, label_classes.py and/or notebook [8].

@arnikz
Contributor Author

arnikz commented Dec 1, 2020

The script is used to process Manta, DELLY, LUMPY, GRIDSS and SURVIVOR VCF files. However, the following lines are unclear about how the latter is handled (e.g., TRA vs. CTX):

arg_parser("Convert VCF output of Manta, DELLY, LUMPY or GRIDSS to BEDPE format.")

# if the file was generated with SURVIVOR simSV, treat translocations as DELLY TRA entries

# TRA
idx <- which(info(sv_callset_vcf)$SVTYPE == 'TRA')

Moreover, all the issues reported here should be handled by this script, correct?
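Regarding TRA vs. CTX, here is a minimal Python sketch (hypothetical helpers, not code from vcf2bedpe.R) of a selection that accepts both labels. It is a rough analogue of the `which(info(sv_callset_vcf)$SVTYPE == 'TRA')` line. It assumes SURVIVOR simSV writes translocations as SVTYPE=TRA, as the script comment implies, and takes CTX from the TIGRA entry earlier in this thread.

```python
from typing import Dict, List, Optional, Sequence

# Assumed translocation labels: TRA (DELLY, SURVIVOR simSV) and CTX (TIGRA).
TRANSLOCATION_TYPES = {"TRA", "CTX"}


def parse_info(info: str) -> Dict[str, Optional[str]]:
    """Parse a VCF INFO column into a dict; flag fields map to None."""
    fields: Dict[str, Optional[str]] = {}
    if info == ".":
        return fields
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        fields[key] = value if sep else None
    return fields


def translocation_indices(info_column: Sequence[str]) -> List[int]:
    """0-based indices of records whose SVTYPE is a translocation label
    (R's which() would return 1-based indices)."""
    return [
        i
        for i, info in enumerate(info_column)
        if parse_info(info).get("SVTYPE") in TRANSLOCATION_TYPES
    ]


infos = ["SVTYPE=DEL;END=100", "SVTYPE=TRA;CHR2=chr5;CT=3to5", "IMPRECISE;SVTYPE=CTX", "."]
print(translocation_indices(infos))
```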

No branches or pull requests

2 participants