Using recall for somatic samples #9

dgaston · 2015-11-26T18:34:55Z

Hi Brad,

I am assuming the recall jar uses optimized parameter settings for the variant caller based on bcbio settings. I've been looking at exploring incremental joint calling on tumour-only samples. I use parameter settings largely similar to bcbio, but with some specific tweaks and threshold values for acceptable allele frequencies. Is it possible to tweak recall to also pass command-line parameters to the caller?

chapmanb · 2015-11-29T02:14:47Z

Daniel;
Thanks for taking a look at bcbio.recall for cancer calling. You're right that this has primarily been tested with larger germline calling. The settings we use for recalling for FreeBayes are here:

https://github.com/chapmanb/bcbio.variation.recall/blob/master/src/bcbio/variation/recall/square.clj#L62

I don't know of existing workflows for large numbers of tumor-only samples, so I don't have great suggestions about setting parameters for that case. How many samples are you looking to call together? What caller were you targeting?

The best approach to tweaking this is probably to add an additional caller target (say, freebayes-somatic) that has the specific tweaks for somatic calling instead of germline. Happy to help with this if you have more specifics about the command lines you're looking to run. Thanks again.

dgaston · 2015-11-29T13:30:18Z

Thanks Brad. Yes, tweaking by adding somatic-specific callers would probably be the best approach. I was looking at experimenting since I don't know how well incremental joint calling has been investigated in this space. I'd probably be looking at doing 48 samples or so in a run, although testing with more would be interesting. Of course they are much smaller than full exomes since these are all from targeted panels. FreeBayes, VarDict, and Platypus are the callers already used, I believe? I have all three in my workflow, so testing those would be good.

chapmanb · 2015-12-05T21:35:17Z

I don't know of ready to run approaches to this, or validations to demonstrate how much it helps versus standard tumor/normal analysis. @brentp and @arq5x mentioned they were hoping to work on this with FreeBayes so might have some advice. FreeBayes is a good first target for this since it already handles both tumor/normal and pooled germline cases, and is sensitive and precise on germline calls.

For 48 samples, my suggestion would be to do a workflow like:

1. Call all 48 samples together in a pool at the same time.
2. Apply custom filters to extract somatic calls versus germline. Depending on your experimental setup this could be excluding everything also present in the matched normals, or in any normal in the population.

So I wouldn't try to do anything fancy like recalling. Then evaluate this against standard tumor/normal calling with a caller like VarDict to see if you're getting improved resolution, especially of low frequency variants. I'd be very interested in hearing how the experiment turns out. Hope this helps and thanks for all the discussion.
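The filtering step in the workflow above could be sketched roughly as follows. This is only an illustration under stated assumptions: the pooled calls are assumed to be already parsed into a mapping from variant to per-sample alternate allele fraction, and the function name `somatic_candidates`, the sample names, and the thresholds are all hypothetical, not values from bcbio or any validated pipeline.

```python
from typing import Dict, List, Set, Tuple

# (chrom, pos, ref, alt); a hypothetical minimal variant key
Variant = Tuple[str, int, str, str]


def somatic_candidates(
    pooled: Dict[Variant, Dict[str, float]],
    normals: Set[str],
    min_tumor_af: float = 0.02,   # assumed threshold, tune for your panel
    max_normal_af: float = 0.0,   # assumed: any evidence in a normal excludes
) -> Dict[str, List[Variant]]:
    """Split pooled calls into per-tumor somatic candidates.

    A variant is dropped if any normal in the population carries it above
    max_normal_af; otherwise it is reported for every non-normal sample
    whose alternate allele fraction reaches min_tumor_af.
    """
    result: Dict[str, List[Variant]] = {}
    for variant, fractions in pooled.items():
        # Exclude anything seen in any normal in the population.
        if any(fractions.get(n, 0.0) > max_normal_af for n in normals):
            continue
        for sample, af in fractions.items():
            if sample not in normals and af >= min_tumor_af:
                result.setdefault(sample, []).append(variant)
    return result


# Hypothetical example data: two tumors (T1, T2) and one normal (N1).
example_calls = {
    ("chr1", 100, "A", "T"): {"T1": 0.05, "T2": 0.0, "N1": 0.0},
    ("chr2", 200, "G", "C"): {"T1": 0.40, "N1": 0.50},
    ("chr3", 300, "C", "A"): {"T2": 0.01},
}
candidates = somatic_candidates(example_calls, normals={"N1"})
```

With an empty `normals` set (the tumor-only case), only the allele fraction threshold applies, so germline variants would need to be removed by some other means, such as a population database.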

dgaston · 2015-12-07T12:13:04Z

No problem. This seems to be a somewhat overlooked area, so I'm happy to do some experimentation. All of this is part of the pipeline construction/validation phases in preparation for clinical work. In our case we don't have matched normals; for clinical sequencing this typically isn't done, due to cost constraints coupled with working with smaller targeted panels and only reporting on a subset of discovered variants.

chapmanb · 2015-12-09T13:31:22Z

Daniel;
Without matched normals, this is a slightly different problem, since you're doing multi-sample calling while also trying to identify lower frequency variants in each sample. FreeBayes is a good target for this since it handles both: MuTect and VarDict do low frequency, but not populations; HaplotypeCaller does populations but not low frequency. I'm not sure how best to set the parameters to get good sensitivity and precision in these cases. If you have any known truth sets it would be worth exploring with those and a combo of the multi-sample and cancer options:

https://github.com/chapmanb/bcbio-nextgen/blob/4c57c0666e77b442013cb658a750b40afc466ea6/bcbio/variation/freebayes.py#L92
https://github.com/chapmanb/bcbio-nextgen/blob/4c57c0666e77b442013cb658a750b40afc466ea6/bcbio/variation/freebayes.py#L134

I'd definitely have interest in hearing your results.
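As a starting point for that kind of exploration, a combined multi-sample and low-frequency FreeBayes command line might be assembled as in the sketch below. The flags used (`-f`, `--targets`, `--pooled-discrete`, `--pooled-continuous`, `--genotype-qualities`, `--min-alternate-fraction`) are FreeBayes options, but this particular combination, the helper name `freebayes_pooled_cmd`, and the threshold value are assumptions for illustration. They are not taken from the linked bcbio code and have not been validated against a truth set.

```python
from typing import List, Optional, Sequence


def freebayes_pooled_cmd(
    ref_fasta: str,
    bam_files: Sequence[str],
    min_alt_fraction: float = 0.05,     # assumed value; tune against a truth set
    targets_bed: Optional[str] = None,  # e.g. the targeted panel regions
) -> List[str]:
    """Build (but do not run) a multi-sample, low-frequency FreeBayes command."""
    cmd = [
        "freebayes",
        "-f", ref_fasta,
        # Pooled models: do not assume diploid genotypes per sample.
        "--pooled-discrete",
        "--pooled-continuous",
        "--genotype-qualities",
        "--min-alternate-fraction", str(min_alt_fraction),
    ]
    if targets_bed:
        cmd += ["--targets", targets_bed]
    # All BAMs on one command line so the samples are called jointly.
    cmd += list(bam_files)
    return cmd


# Hypothetical file names, for illustration only.
cmd = freebayes_pooled_cmd(
    "ref.fa", ["tumor1.bam", "tumor2.bam"], 0.02, targets_bed="panel.bed"
)
```

The returned list can be passed to `subprocess.run` with standard output redirected to a VCF file; keeping command construction separate from execution makes it easier to compare parameter sets.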

chapmanb · 2015-12-10T01:33:23Z

Daniel -- it would also be worth following this FreeBayes thread: freebayes/freebayes#228. Erik and Brent are talking about more generalized approaches for handling multi-sample tumor calling.

dgaston · 2015-12-10T12:07:51Z

Thanks for the heads up, much appreciated.

juliandangelo mentioned this issue Apr 7, 2016

Multi-Sample Variant Calling in Tumor only samples freebayes/freebayes#269

Closed
