Possible workload story: machine learning for variant quality prediction #713
hammer
started this conversation in
Discourse import
Replies: 1 comment
-
(Posted by @alimanfoo) Just to say +1, it would be good to target this use case. We recently fitted a decision tree model which seemed to work well, but will probably try a random forest classifier next time, as they have a good reputation for performing well in a variety of situations. There are some parts of the analysis which will be particular to the specific ML model being fitted, but it would be nice to create an illustrative story to pull out the key features of this type of analysis that are somewhat general across different model types.
-
@alimanfoo has mentioned that his group has recently worked on using machine learning to predict variant quality from variant call and other variant-associated metadata, usually in the context of WES or WGS data.
In a past life I worked on germline and somatic mutation calling and this sort of thing was quite common in that domain, popularized by GATK's VQSR.
I think this workload is worth considering for us. For example, https://github.com/avallonking/ForestQC/blob/master/ForestQC/data_preprocessing.py has some example code for preparing variant metadata for model fitting that we could use to represent this story.
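To make the data-preparation step a bit more concrete, here is a minimal sketch of turning per-variant INFO annotations into numeric feature rows. It is not taken from ForestQC; the feature names (`QD`, `MQ`, `FS`, `DP`) are just commonly seen GATK-style annotations picked for illustration, and the helper names `parse_info` and `to_feature_row` are hypothetical.

```python
import math

# Hypothetical feature list: common GATK-style INFO annotations,
# chosen for illustration only.
FEATURES = ("QD", "MQ", "FS", "DP")


def parse_info(info):
    """Split a VCF INFO string such as 'QD=12.5;MQ=60.0;DB' into a dict."""
    fields = {}
    for item in info.split(";"):
        if not item or item == ".":
            continue  # '.' marks an empty INFO column
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True  # flag entries carry no value
    return fields


def to_feature_row(info, features=FEATURES):
    """Return one float per feature, with NaN where the annotation is
    absent or not a single number."""
    fields = parse_info(info)
    row = []
    for name in features:
        try:
            row.append(float(fields.get(name)))
        except (TypeError, ValueError):
            row.append(math.nan)
    return row


# Example: two made-up variants, the second one missing most annotations.
rows = [to_feature_row(s) for s in ("QD=12.5;MQ=60.0;FS=1.2;DP=35", "MQ=40.0;DB")]
```

Rows like these, together with a label per variant, could then be handed to whichever classifier is being tried (a decision tree or random forest, for example) once missing values have been dealt with.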