Using recall for somatic samples #9

Open
dgaston opened this issue Nov 26, 2015 · 7 comments

Comments

@dgaston

dgaston commented Nov 26, 2015

Hi Brad,

I am assuming the recall jar uses optimized parameter settings for the variant caller based on bcbio settings. I've been looking at exploring incremental joint calling on tumour-only samples. I use parameter settings largely similar to bcbio's, but of course with some specific tweaks and threshold values for acceptable allele frequencies. Is it possible to tweak recall so that it also passes command-line parameters to the caller?

@chapmanb
Member

Daniel;
Thanks for taking a look at bcbio.recall for cancer calling. You're right that this has primarily been tested with larger germline calling. The settings we use for recalling for FreeBayes are here:

https://github.com/chapmanb/bcbio.variation.recall/blob/master/src/bcbio/variation/recall/square.clj#L62

I don't know of existing workflows for large numbers of tumor-only samples, so I don't have great suggestions about setting parameters for that case. How many samples are you looking to call together? What caller were you targeting?

The best approach to tweaking this is probably to add an additional caller target (say, freebayes-somatic) that has the specific tweaks for somatic calling instead of germline. Happy to help with this if you have more specifics about the command lines you're looking to run. Thanks again.

@dgaston
Author

dgaston commented Nov 29, 2015

Thanks Brad. Yes, tweaking by adding somatic-specific callers would probably be the best approach. I was looking at experimenting since I don't know how well incremental joint calling has been investigated in this space. I'd probably be looking at doing 48 samples or so in a run, although testing with more would be interesting. Of course they are much smaller than full exomes since these are all from targeted panels. FreeBayes, VarDict, and Platypus are the callers already used, I believe? I have all three in my workflow, so testing those would be good.

@chapmanb
Member

chapmanb commented Dec 5, 2015

I don't know of ready-to-run approaches to this, or of validations that demonstrate how much it helps versus standard tumor/normal analysis. @brentp and @arq5x mentioned they were hoping to work on this with FreeBayes, so they might have some advice. FreeBayes is a good first target for this since it already handles both tumor/normal and pooled germline cases, and is sensitive and precise on germline calls.

For 48 samples, my suggestion would be to do a workflow like:

  • Call all 48 samples together in a pool at the same time.
  • Apply custom filters to extract somatic calls versus germline. Depending on your experimental setup this could be excluding everything also present in the matched normals, or in any normal in the population.

So I wouldn't try to do anything fancy like recalling. Once you have the pooled, filtered calls, evaluate them against standard tumor/normal calling with a caller like VarDict to see if you're getting improved resolution, especially of low frequency variants. I'd be very interested in hearing how the experiment turns out. Hope this helps and thanks for all the discussion.
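As a rough illustration of the filtering step in the workflow above, here is a minimal Python sketch. It assumes the pooled calls have already been parsed into a dictionary of per-sample alternate allele fractions; the function name `somatic_candidates`, the thresholds, and the toy variants are all hypothetical and not part of bcbio or any caller.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical variant key: (chrom, pos, ref, alt)
Variant = Tuple[str, int, str, str]


def somatic_candidates(
    calls: Dict[Variant, Dict[str, float]],
    normals: Set[str],
    min_tumor_af: float = 0.05,   # assumed threshold, tune per panel
    max_normal_af: float = 0.0,   # assumed: any evidence in a normal excludes
) -> Dict[str, List[Variant]]:
    """Per non-normal sample, list variants not seen in any normal sample."""
    out: Dict[str, List[Variant]] = {}
    for variant, sample_afs in sorted(calls.items()):
        # Exclude the variant if any normal carries it above the allowed fraction.
        if any(sample_afs.get(n, 0.0) > max_normal_af for n in normals):
            continue
        for sample, af in sample_afs.items():
            if sample not in normals and af >= min_tumor_af:
                out.setdefault(sample, []).append(variant)
    return out


# Made-up toy input: allele fractions per sample from one pooled call set.
calls = {
    ("chr1", 100, "A", "T"): {"tumor1": 0.12, "normal1": 0.0},
    ("chr2", 200, "C", "G"): {"tumor1": 0.48, "tumor2": 0.03, "normal1": 0.51},
}
result = somatic_candidates(calls, normals={"normal1"})
print(result)
```

Passing every normal in the cohort as `normals` corresponds to the "any normal in the population" option; passing only the matched normal for a given tumor corresponds to the matched case.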

@dgaston
Author

dgaston commented Dec 7, 2015

No problem. This seems to be a bit of an overlooked area, so I'm happy to do some experimentation. All of this is part of the pipeline construction/validation phases in preparation for clinical work. In our case we don't have matched normals: for clinical sequencing this typically isn't done, due to cost constraints coupled with working with smaller targeted panels and only reporting on a subset of discovered variants.

@chapmanb
Member

chapmanb commented Dec 9, 2015

Daniel;
Without matched normals, this is a somewhat different problem, since you're doing multi-sample calling while also trying to identify lower frequency variants in each sample. FreeBayes is a good target for this since it handles both: MuTect and VarDict do low frequency but not populations, while HaplotypeCaller does populations but not low frequency. I'm not sure how best to set the parameters to get good sensitivity and precision in these cases. If you have any known truth sets, it would be worth exploring with those and a combination of the multi-sample and cancer options:

https://github.com/chapmanb/bcbio-nextgen/blob/4c57c0666e77b442013cb658a750b40afc466ea6/bcbio/variation/freebayes.py#L92
https://github.com/chapmanb/bcbio-nextgen/blob/4c57c0666e77b442013cb658a750b40afc466ea6/bcbio/variation/freebayes.py#L134

I'd definitely have interest in hearing your results.
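For experimenting with such a combination, a small Python helper that assembles a FreeBayes command line might look like the sketch below. The flags used are standard FreeBayes options for pooled and low-frequency calling, but which ones to combine and the threshold values are assumptions for illustration; they are not taken from the linked bcbio code, which may differ. The helper name `freebayes_pooled_cmd` and the file names are hypothetical.

```python
import shlex
from typing import List, Optional, Sequence


def freebayes_pooled_cmd(
    ref_fasta: str,
    bams: Sequence[str],
    regions_bed: Optional[str] = None,
    min_alt_fraction: float = 0.05,  # assumed starting point, not a recommendation
    min_alt_count: int = 2,          # assumed starting point, not a recommendation
) -> List[str]:
    """Build an argument list for one multi-sample FreeBayes run."""
    cmd = [
        "freebayes",
        "--fasta-reference", ref_fasta,
        # Pooled options, so allele frequencies are not tied to diploid genotypes.
        "--pooled-discrete",
        "--pooled-continuous",
        "--genotype-qualities",
        # Low-frequency thresholds to explore against a truth set.
        "--min-alternate-fraction", str(min_alt_fraction),
        "--min-alternate-count", str(min_alt_count),
    ]
    if regions_bed:
        cmd += ["--targets", regions_bed]  # restrict to the targeted panel
    cmd += list(bams)  # all samples are called together in one run
    return cmd


cmd = freebayes_pooled_cmd("ref.fa", ["s1.bam", "s2.bam"], regions_bed="panel.bed")
print(shlex.join(cmd))
```

Building the command as a list keeps it easy to sweep `min_alt_fraction` and `min_alt_count` over a grid and compare each run against a truth set.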

@chapmanb
Member

Daniel -- it would also be worth following this FreeBayes thread: freebayes/freebayes#228. Erik and Brent are talking about more generalized approaches for handling multi-sample tumor calling.

@dgaston
Author

dgaston commented Dec 10, 2015

Thanks for the heads up, much appreciated.
