Make test data conform to v1.0 API #110
Hey Orion, may have found some more problems. No position data given. type is set to SO:0001818 which is "protein_altering_variant" not "MULTIEXON DELETION". No position data given. |
Also, I think the labels should probably be the SO term given name. Where did you get your labels? |
In addition, for a single nucleotide change, the start and end positions must be the same, not differ by one. For example P0000079. variant data currently in the proposed update is... Putting this data into VEP produces a consequence of frameshift_variant. However, looking at the response data (if using the API), it sees the ref base as TG, not just G, and places the variant at position 30, not 31, as you can see in the location http://grch37.ensembl.org/Homo_sapiens/Tools/VEP/Results?db=core;tl=hrD3JGXj3hIuGNHv-858959 Because the position data is inclusive, changing the start to match the end (31) gives a different consequence: the resulting VEP consequence is then correct, stop_gain, as is in the data. http://grch37.ensembl.org/Homo_sapiens/Tools/VEP/Results?db=core;tl=1dPbkyrwAugoDgCm-858973 I will hopefully make this change today. |
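To illustrate the inclusive-range reading described in the comment above, here is a minimal sketch. It assumes both positions use the same inclusive numbering, as in the VEP input discussed there; the function names are hypothetical and not part of the API or of any tool. It only shows how an inclusive range is read, not how the API spec defines its own start and end fields.

```python
def inclusive_span_length(start: int, end: int) -> int:
    """Number of bases covered by an inclusive range [start, end]."""
    if end < start:
        raise ValueError("end must not be smaller than start")
    return end - start + 1


def span_matches_ref(start: int, end: int, ref_bases: str) -> bool:
    """True when the inclusive range covers exactly as many bases as the ref allele."""
    return inclusive_span_length(start, end) == len(ref_bases)


# A range of 30-31 covers two bases, so it matches "TG" but not a single "G".
print(span_matches_ref(30, 31, "G"))
print(span_matches_ref(30, 31, "TG"))
# A single-base ref allele needs start == end under this convention.
print(span_matches_ref(31, 31, "G"))
```

Under this reading, a range of 30 to 31 spans two bases, which is consistent with VEP reporting the ref allele as TG.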
Hi Ben, I took the effect labels from the nearest jannovar annotation class. I didn't find an SO term that was more specific for multi-exon deletion. Did I miss one? I agree, the SO label is more appropriate, though as long as it's human-readable, I don't mind too much what is stored there. |
I'm running this through a script to import it into our database. It's taking more work than expected because the required fields include an Ensembl transcript, and to get that from their API, the ref and alt alleles must be correct. I figured it can't hurt to validate the variant annotations. Our matching system uses the variant consequence as a key component of genomic matching. Afraid I don't know of a specific term for multi-exon deletion. A lot of variants in Decipher are large, and the start / end aren't known. We don't annotate those with a specific consequence, just mark the loss or gain. |
After updating the position values, two variants have altered consequences: P0001058 and P0001027. I'll update the file accordingly. |
Hey Ben, Had a minute to take a closer look. Sorry for not having a chance sooner. A few comments:
|
Hey Orion. I completely missed that in our spec. I'll revert the commit and make a note to update the Decipher code to take that into account! Regarding the consequence changes, I wouldn't know how to look at this the way you might. I put the data into Ensembl VEP, and that is the data that comes out. |
… (comment)" This reverts commit 95a7cec. Reason: #110 (comment)
Hey Orion. I think I've figured out how we arrived at different consequences. We don't specify the transcript used, so I took the most severe consequence as defined by Ensembl VEP. P0001058 You took the canonical transcript http://grch37.ensembl.org/Homo_sapiens/Transcript/Summary?g=ENSG00000167468;r=19:1103936-1106778;t=ENST00000354171 I took the transcript with the most severe consequence. From my understanding, frameshift is considered "more interesting" than a splice variant. What do you think is best to do here? |
@buske bump! =] |
Hmm, I agree that if we're specifying the effect, we should really specify the transcript as well. The splicing effect was taken from the paper. In all variant effect tools I'm aware of, frameshift variants do indeed have higher priority than splicing ones. |
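The two selection strategies discussed above (canonical transcript versus most severe consequence) can be sketched as follows. This is only an illustration under stated assumptions: the severity list is a small partial ranking, not the full Ensembl table, and the transcript IDs, field names, and function names are hypothetical placeholders rather than real annotations for P0001058.

```python
# Illustrative partial ranking, most severe first (an assumption, not the full Ensembl order).
SEVERITY_ORDER = [
    "frameshift_variant",
    "missense_variant",
    "splice_region_variant",
    "synonymous_variant",
]


def most_severe(annotations):
    """Pick the annotation whose consequence ranks highest in SEVERITY_ORDER."""
    return min(annotations, key=lambda a: SEVERITY_ORDER.index(a["consequence"]))


def canonical(annotations):
    """Pick the annotation flagged as the canonical transcript."""
    return next(a for a in annotations if a["canonical"])


# Hypothetical per-transcript annotations for one variant.
annotations = [
    {"transcript": "TX_A", "canonical": True, "consequence": "splice_region_variant"},
    {"transcript": "TX_B", "canonical": False, "consequence": "frameshift_variant"},
]

print(most_severe(annotations)["consequence"])
print(canonical(annotations)["consequence"])
```

The same variant yields two different effects depending on the strategy, which is why recording the transcript alongside the effect removes the ambiguity.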
Moved discussion of transcript-specific effects to #120 |
This issue / hotfix is becoming a bit stale. Maybe the ensembl transcript can be added as a "_enstrans" on the genomic feature, just for the sample / test data for now? |
I've rebased this onto master and collapsed all the various typo/fix/revert commits. Please review. This will finally merge the test data we've been using anyway into master. |
I've eyeballed the changes and they look OK to me. As an aside, I know we haven't kept strictly to gitflow, and that's OK, as long as the readme states that master may not be the same as the latest release. If we DO want master to always be the latest release, then we need to adhere to gitflow, and work on a dev branch until a release is ready. |
@Relequestual I've changed the base branch to be |
No... because 1.0 is released and tagged, that can't be changed. What do you mean by the "base" branch? |
Hmm, even after we tag an API release (e.g. 1.0), we still might make bugfixes and documentation clarifications for that version of the API, since people will keep using it. In git flow, I believe this is represented by creating a release branch, which you maintain even after you tag the initial release for that version. In normal semver, those bugfix releases would get tagged as a patch (e.g. 1.0.1). In our case, the API itself isn't changing in those cases... it makes sense to stay as 1.0, but the documentation and supporting code might need to be updated over time for the v1.0 version of the API. We can tag new versions, or just maintain a By base branch I meant the branch the PR was pulling into (it was |
That's not quite how gitflow works, but pretty close. This time round, the release branch isn't quite right either, but it doesn't matter a great deal, and I was going to address it after we finalised the release. Gitflow works in the following way:
I'm sure you've seen the image, but for reference: http://nvie.com/posts/a-successful-git-branching-model/ It CAN be somewhat unclear from the "documentation" of gitflow, however I frequently use SourceTree for performing gitflow actions. For example finishing a release will do the two merges and tagging the commit on master, all in one step. Please do say if any of this leads to further questions. Having a play about with SourceTree can help getting to grips with gitflow. The gitflow commandline tools leave some elements to be desired, last time we looked. |
FWIW I have adopted gitflow for all my s/w development, 5 code bases, works great. |
@Relequestual You're completely right, sorry, I was mistaken. It would help me to clarify the method we will take to update documentation and make non-substantive changes for fixed/released APIs. I agree with git flow in principle, but find strictness can get in the way of making changes that need to be made. We released v1.0 of the API over a year ago, and are still clarifying things in the docs. When we release v1.1 of the API, I'm sure we'll be maintaining and developing the documentation over the following year. We can use hotfixes and patch releases to handle it, if you'd like. It feels like overkill to make a patch release whenever we fix a few typos, but I'm okay with it. I was proposing maintaining release branches as an alternative. |
No worries @buske. It can be difficult to see the benefits without experiencing first hand the problems that can happen. Consider: someone makes a grammatical change... and may as well fix this ambiguity over a field... "it's not a functional change", they believe, so it's OK. Later, someone else joins the project and implements the API at the latest version. Suddenly, the API is no longer interoperable, because the first person made a functional change based on their own understanding, and not that of the group. It may sound unlikely, but I've just seen this exact thing happen in another project I'm working on. With open source projects, occasionally a few people have more time, and suddenly do a lot of work and rush ahead of everyone else. Sticking strictly to gitflow avoids potential problems by forcing us to protect against them. |
Fair points. Okay, so let's figure out how we proceed. Where we are:
Where I think we should be:
To clean up the current state of things, we'd then:
This okay with everyone? |
I mostly agree, however suggest a few small alterations:
Therefore, we currently need to create a I feel that makes things a little easier to manage, unless I've missed some of your intentions? |
👍 That works for me! You're right, since we really shouldn't need branches for major releases beyond the upcoming one. :) |
Summary:
label and hgvs fields)