Possible workload story: machine learning for variant quality prediction #713

hammer · 2021-10-02T21:45:49Z

hammer
Oct 2, 2021
Maintainer

@alimanfoo has mentioned that his group recently worked on using machine learning to predict variant quality from variant calls and other variant-associated metadata, usually in the context of whole-exome (WES) or whole-genome (WGS) sequencing data.

In a past life I worked on germline and somatic mutation calling, and this sort of thing was quite common in that domain, popularized by GATK's VQSR (Variant Quality Score Recalibration).

I think this workload is worth considering for us. For example, https://github.com/avallonking/ForestQC/blob/master/ForestQC/data_preprocessing.py has some example code for preparing variant metadata for model fitting that we could use to represent this story.
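
To make the "preparing variant metadata" step concrete, here is a minimal sketch of what such a preprocessing step could look like. It is not taken from ForestQC: the annotation names (QD, FS, MQ, DP), the example values, and the helper names `parse_info` and `build_feature_matrix` are all assumptions made for illustration, and median imputation is just one plausible way to handle missing annotations.

```python
import statistics

# Hypothetical per-variant annotations. The keys mimic common VCF INFO
# fields, but the variants and values are made up for illustration.
RAW_VARIANTS = [
    {"id": "chr1:1000:A:G", "info": "QD=25.1;FS=0.5;MQ=60.0;DP=34"},
    {"id": "chr1:2000:C:T", "info": "QD=3.2;FS=41.7;MQ=37.5;DP=9"},
    {"id": "chr1:3000:G:A", "info": "QD=18.4;MQ=59.2;DP=28"},  # FS is missing
]

FEATURES = ["QD", "FS", "MQ", "DP"]


def parse_info(info):
    """Parse a 'KEY=VALUE;KEY=VALUE' string into a dict of floats."""
    parsed = {}
    for field in info.split(";"):
        key, sep, value = field.partition("=")
        if sep:  # skip flag-style fields that carry no value
            parsed[key] = float(value)
    return parsed


def build_feature_matrix(variants, features):
    """Return (ids, matrix) with one numeric row per variant.

    Missing annotations are filled with the per-feature median of the
    observed values. A feature that is missing for every variant would
    raise statistics.StatisticsError; this sketch does not handle that.
    """
    rows = [parse_info(v["info"]) for v in variants]
    medians = {
        name: statistics.median(row[name] for row in rows if name in row)
        for name in features
    }
    matrix = [[row.get(name, medians[name]) for name in features] for row in rows]
    ids = [v["id"] for v in variants]
    return ids, matrix


ids, X = build_feature_matrix(RAW_VARIANTS, FEATURES)
for variant_id, row in zip(ids, X):
    print(variant_id, row)
```

The output is a plain list of numeric rows in a fixed column order, which is the shape most model-fitting libraries expect. A real workload would read these annotations from a VCF or Zarr store rather than from inline strings.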

hammer · 2021-10-02T21:46:01Z

hammer
Oct 2, 2021
Maintainer Author

(Posted by @alimanfoo)

Just to say +1, it would be good to target this use case. We recently fitted a decision tree model, which seemed to work well, but we will probably try a random forest classifier next time, as they have a good reputation for performing well in a variety of situations.

There are some parts of the analysis that will be particular to the specific ML model being fitted, but it would be nice to create an illustrative story that pulls out the key features of this type of analysis that are somewhat general across model types.
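
As a rough illustration of the model-agnostic part of such a story, the sketch below fits both a decision tree and a random forest through the same split/fit/score steps using scikit-learn. Everything about the data is an assumption: the features are synthetic draws loosely named after common annotations (QD, FS, MQ), the labels are random, and the hyperparameters are arbitrary. It is not the analysis described above.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n_variants = 1000

# Synthetic labels: 1 = "good" variant, 0 = "bad" variant.
y = rng.integers(0, 2, size=n_variants)

# Synthetic features whose distributions differ by label (assumed values).
qd = rng.normal(loc=np.where(y == 1, 20.0, 8.0), scale=4.0)
fs = rng.normal(loc=np.where(y == 1, 2.0, 15.0), scale=5.0)
mq = rng.normal(loc=np.where(y == 1, 59.0, 50.0), scale=5.0)
X = np.column_stack([qd, fs, mq])

# Hold out 30% of variants for evaluation, keeping the label balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# The same fit/score loop works for either model type.
models = {
    "decision_tree": DecisionTreeClassifier(max_depth=4, random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)  # mean accuracy on held-out data

for name, accuracy in scores.items():
    print(f"{name}: held-out accuracy = {accuracy:.3f}")
```

Only the `models` dictionary is model-specific here. The feature matrix, the train/test split, and the scoring loop are the parts that would stay the same across model types.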
