Make test data conform to v1.0 API #110

buske · 2015-05-12T20:16:59Z

Summary:

Issue Fix variant errors in test data #108: resolved several HGVS errors, and updated variant data accordingly
Issue Add ref, alt, and end values for variants in test data #107: added variant referenceBases, alternateBases, and end position
Issue Synchronize Readme specs and test test json files #105: made patient objects strictly conform to spec (removed non-spec label and hgvs fields)
Removed obsolete FHS dataset

Relequestual · 2015-05-27T16:00:36Z

Hey Orion, may have found some more problems.

https://github.com/MatchmakerExchange/mme-apis/blob/hotfix/v1.0b/testing/benchmark_patients.json#L1092-L1105

No position data given. type is set to SO:0001818 which is "protein_altering_variant" not "MULTIEXON DELETION".

https://github.com/MatchmakerExchange/mme-apis/blob/hotfix/v1.0b/testing/benchmark_patients.json#L1165-L1178

No position data given.

Relequestual · 2015-05-27T16:04:32Z

Also, I think the labels should probably be the SO term given name. Where did you get your labels?

Relequestual · 2015-05-28T11:15:37Z

In addition, for a single nucleotide change, the start and stop position must be the same, not one different.

For example P0000079.

variant data currently in the proposed update is...
"variant": { "alternateBases": "A", "assembly": "GRCh37", "end": 42929131, "referenceBases": "G", "referenceName": "17", "start": 42929130 },

Putting this data in VEP, produces a consequence of frameshift_variant, however looking at the response data (if using the API), it is seeing the ref Base as TG, not just G. Changing the start to also match the end (as the position data is inclusive), results in a different consequence.

It sees the position of the variant to be at 30, not 31, as you can see in the location

http://grch37.ensembl.org/Homo_sapiens/Tools/VEP/Results?db=core;tl=hrD3JGXj3hIuGNHv-858959

If the start and end match (31), the resulting VEP consequence is correct, stop_gain, as is in the data.

http://grch37.ensembl.org/Homo_sapiens/Tools/VEP/Results?db=core;tl=1dPbkyrwAugoDgCm-858973

I will hopefully make this change today.
I'm in the process of validating the vairant type / consequences. These being correct are paramount for our genomic matching algorithm.

buske · 2015-05-28T20:09:07Z

Hi Ben,
Thank you very, very much for your careful validation and finding these issues. Yes, I removed the position data because the exact endpoints were not known, but I should not have included the variant entry at all, that was a mistake. I'll double-check each of them with the underlying data.

I took the effect labels from the nearest jannovar annotation class. I didn't find an SO term that was more specific for multi-exon deletion. Did I miss one? I agree, the SO label is more appropriate, though as long as it's human-readable, I don't mind too much what is stored there.

Relequestual · 2015-06-01T09:40:46Z

I'm running this through a script to import it to our database. It's taking more work than expected because of the required fields including an ensembl transcript, which in order to get from their API, the ref and alt allels must be correct.

I figured it can't hurt to validate the variants annotations. Our matching system uses the vairnat consequence as a key component to genomic matching.

Fraid I don't know about a specific term for multi-exon deletion. A lot of variants in Decipher are large, and the start / end isn't know. We don't annotate those with a specific consequence, just mark the loss or gain.

Relequestual · 2015-06-01T10:28:44Z

After updating the position values, two variants have alter consequences.

P0001058
CTTTTC/C CTTTTC/C
annotated as SPLICING, is now FRAMESHIFT

P0001027
G/T G/T
annotated as INTRONIC, is now 3'UTR

I'll update the file accordingly.

buske · 2015-06-01T22:07:29Z

Hey Ben,

Had a minute to take a closer look. Sorry for not having a chance sooner. A few comments:

For single-base mutations, the start and stop should not be the same. As per our spec, start is 0-indexed inclusive, stop is exclusive (as in BED). So, the number of bases changed is equal to stop - start. Make sure that, if your system is used to supporting VCF-style bases, you increment the start by 1 when you perform uploading.
For P0001058, I believe the annotation was correct originally. Looking at the region in the genome browser shows it falling in an intron, at least for me.
For P0001027, it is both intronic and in the 3'UTR, so I'm fine with changing it, but I don't think it was, strictly speaking, incorrect.

Relequestual · 2015-06-02T10:17:50Z

Hey Orion. I completly missed that in our spec. I'll revert the commit and make a note to update the Decipher code to take that into account!

Regarding the consequence changes, I wouldn't know how to look at this how you might. I put the data into Ensembl VEP, and that is the data that comes out.

… (comment)" This reverts commit 95a7cec. Reason: #110 (comment)

Relequestual · 2015-06-22T14:21:01Z

Hey Orion. I think I've figured out how we have arrived at different consequences.

We don't specify the transcript used, so I took the most severe as defined by Ensembl VEP.
For example

P0001058
CTTTTC/C CTTTTC/C

You took the canonical transcript http://grch37.ensembl.org/Homo_sapiens/Transcript/Summary?g=ENSG00000167468;r=19:1103936-1106778;t=ENST00000354171

I took the transcript with the most severe consequence
http://grch37.ensembl.org/Homo_sapiens/Transcript/Summary?g=ENSG00000167468;r=19:1104067-1106442;t=ENST00000593032

From my understanding, framshift is considered "more interesting" than a splice variant.

What do you think is best to do here?

Relequestual · 2015-07-06T09:00:20Z

@buske bump! =]

buske · 2015-08-04T02:03:43Z

Hmm, I agree that if we're specifying the effect, we should really specify the transcript as well. The splicing effect was taken from the paper. In all variant effect tools I'm aware of, frameshift variants do indeed have higher priority than splicing ones.

buske · 2015-08-18T09:44:24Z

Moved discussion of transcript-specific effects to #120

Relequestual · 2015-09-02T08:48:44Z

This issue / hotfix is becoming a bit stale. Maybe the ensembl transcript can be added as a "_enstrans" on the genomic feature, just for the sample / test data for now?

Issue #109: added zygosity Issue #105: removed non-spec fields from test data

buske · 2016-11-26T08:14:02Z

I've rebased this onto master and collapsed all the various typo/fix/revert commits. Please review. This will finally merge the test data we've been using anyway into master.

Relequestual · 2016-11-29T15:41:32Z

I've eyeballed the changes and they look OK to me.
I can't re-run it through my previous import script without quite a bit of effort, as the script is no longer valid against our schemas.
I'm happy for this to be merged.

On an aside, I know we haven't kept strictly to gitflow, and that's OK, as long as the readme states that master may not be the same as the latest release. If we DO want master to always be the latest release, then we need to adheer to gitflow, and work on a dev branch till a release is ready.

buske · 2016-11-30T06:46:25Z

@Relequestual I've changed the base branch to be release/v1.0 to follow a more git flow-like approach. We can keep making bugfixes and non-substantive changes (like fixing the test data) for the v1.0 version of the API on this branch, as per git flow. Sound good?

Relequestual · 2016-11-30T12:01:03Z

No... because 1.0 is released and tagged, that can't be changed.

What do you mean by the "base" branch?

buske · 2016-12-01T07:07:18Z

Hmm, even after we tag an API release (e.g. 1.0), we still might make bugfixes and documentation clarifications for that version of the API, since people will keep using it. In git flow, I believe this is represented by creating a release branch, which you maintain even after you tag the initial release for that version. In normal semver, those bugfix releases would get tagged as a patch (e.g. 1.0.1). In our case, the API itself isn't changing in those cases... it makes sense to stay as 1.0, but the documentation and supporting code might need to be updated over time for the v1.0 version of the API. We can tag new versions, or just maintain a release/v1.0 branch. This reasonable?

By base branch I meant the branch the PR was pulling into (it was master before).

Relequestual · 2016-12-01T13:28:16Z

That's not quite how gitflow works, but pretty close.

This time round, the release branch isn't quite right either, but it doesn't matter a great deal, and I was going to address it after we finalised the release.

Gitflow works in the following way:

Development work happens on develop.
You can use feature branches for more complex changes, but it's unlikely we will need that in this project.
When you've scoped a release, and all the functional / feature changes are considered complete, you create a release branch from develop.
The release branch must now only be comitted to with bug fixes, where bugs are found during the pre-release process. You can think of the release branch as a release candidate if you will.
Once everyone (or at least someone in our case) feels that the bugs and issues are dealt with, the release branch is merged into master and develop (with merege commits) and the new commit on master is tagged.
If new bugs are found after the release branch was merged into master, and the new tag created, then a hotfix branch is created. Finishing the hotfix triggers the same process as finishing a release branch.

I'm sure you've seen the image, but for reference: http://nvie.com/posts/a-successful-git-branching-model/

It CAN be somewhat unclear from the "documentation" of gitflow, however I frequently use SourceTree for performing gitflow actions. For example finishing a release will do the two merges and tagging the commit on master, all in one step.

Please do say if any of this leads to further questions. Having a play about with SourceTree can help getting to grips with gitflow. The gitflow commandline tools leave some elements to be desired, last time we looked.

fschiettecatte · 2016-12-05T15:02:08Z

FWIW I have adopted gitflow for all my s/w development, 5 code bases, works great.

buske · 2016-12-06T01:56:40Z

@Relequestual You're completely right, sorry, I was mistaken. It would help me to clarify the method we will take to update documentation and non-substantive changes for fixed/released APIs. I agree with git flow in principle, but find strictness can get in the way of making changes that need to be made. We released v1.0 of the API over a year ago, and are still clarifying things in the docs. When we release v1.1 of the API, I'm sure we'll be maintaining and develop the documentation over the following year. We can use hotfixes and patch releases to handle it, if you'd like. It feels like overkill to make a patch release whenever we fix a few typos, but I'm okay with it. I was proposing maintaining release branches as an alternative.

Relequestual · 2016-12-06T11:43:59Z

No worries @buske.

it can be difficult to see the benefits without experiencing the problems that can happen first hand.

Consider: Someone makes a grematical change... and may as well fix this abiguity over a fied... "it's not a functional change" they believe, so it's OK. Later, someone else joins the project, and implements the API at the latest version. Suddenly, the API is not being interoperable because the first person made a functional change based on their understanding, and not that of the group.

It may sound unlikely, but I've just seen this exact thing happen in another project I'm working on. With open source projects, occasionally a few people have more time, and suddenly do a lot of work and rush ahead of everyone else. Sticking strickly to gitflow avoids potential problems by forcing us to protect against them.

buske · 2016-12-07T08:15:38Z

Fair points. Okay, so let's figure out how we proceed.

Where we are:

22 commits to master since v1.0a, mostly KEYS and disclaimer changes
some clarifications sitting in master (including "_"-prefixing which we might want to retract)
these v1.0 test data to merge in somewhere
changes on develop that need to be put off until a 2.0 release
current work towards the backwards-compatible v1.1 API changes on release/v1.1

Where I think we should be:

we use master and develop in the way gitflow intended, where develop is only for backwards-compatible changes for the current major version, and master tracks the last stable release
all changes to KEYS and disclaimers get made to develop, rather than master
we add a develop-v2 branch for parallel development on the next major version
when we release v2, we add a stable-v1 branch for continued support of v1.X
we use semver tags for each release (v1.1.0) and patch (v1.1.1)
if we need to make a clarifying change for the current version (master), we use a hotfix branch a la gitflow

To clean up the current state of things, we'd then:

pull this PRs changes and the changes we've made on master (sans the "_"-prefixing) into one combined hotfix-v1.0b branch (I'll redo a PR)
reset master to v1.0a
pull the backwards-incompatible changes from develop into a develop-v2 branch
remove develop and rename release/v1.1 to develop

This okay with everyone?

Relequestual · 2016-12-07T13:51:28Z

I mostly agree, however suggest a few small alterations:
~~... and I need to come back and write them later today because of deadlines. =[~~

develop is where the work happens for the upcoming major release. Once the upcoming major release becomes the current release, create a develop-v[major-release-number] branch. develop then continues to be the branch the work for the upcoming major release, and backwards compatible changes happen on develop-v[previous-major-release].

Therefore, we currently need to create a develop-v1 branch for continued support of backwards compatible v1 changes. This way we don't need any stable-v[num] branches.

I feel that makes things a little easier to manage, unless I've missed some of you intensions?
Thoughts?

buske · 2016-12-14T00:52:13Z

👍 That works for me! You're right, since we really shouldn't need branches for major releases beyond the upcoming one. :)

buske added the Testing label May 12, 2015

buske added this to the Before v1.1 milestone May 12, 2015

Relequestual added a commit that referenced this pull request May 28, 2015

Fixed issue with test data start and end position as per #110 (comment)

95a7cec

Relequestual added a commit that referenced this pull request Jun 2, 2015

Revert "Fixed issue with test data start and end position as per #110…

b3a32e1

… (comment)" This reverts commit 95a7cec. Reason: #110 (comment)

Relequestual mentioned this pull request Aug 18, 2015

Capture that effects are transcript-specific #120

Open

Issue #108: Updated test data

d4afae4

Issue #109: added zygosity Issue #105: removed non-spec fields from test data

buske force-pushed the hotfix/v1.0b branch from b3a32e1 to d4afae4 Compare November 26, 2016 08:10

buske changed the base branch from master to release/v1.0 November 30, 2016 06:43

buske removed the Testing label Nov 30, 2016

buske closed this Jan 3, 2017

buske deleted the hotfix/v1.0b branch October 20, 2017 04:11

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Make test data conform to v1.0 API #110

Make test data conform to v1.0 API #110

buske commented May 12, 2015

Relequestual commented May 27, 2015

Relequestual commented May 27, 2015

Relequestual commented May 28, 2015

buske commented May 28, 2015

Relequestual commented Jun 1, 2015

Relequestual commented Jun 1, 2015

buske commented Jun 1, 2015

Relequestual commented Jun 2, 2015

Relequestual commented Jun 22, 2015

Relequestual commented Jul 6, 2015

buske commented Aug 4, 2015

buske commented Aug 18, 2015

Relequestual commented Sep 2, 2015

buske commented Nov 26, 2016

Relequestual commented Nov 29, 2016

buske commented Nov 30, 2016

Relequestual commented Nov 30, 2016

buske commented Dec 1, 2016

Relequestual commented Dec 1, 2016 •

edited

Loading

fschiettecatte commented Dec 5, 2016

buske commented Dec 6, 2016

Relequestual commented Dec 6, 2016

buske commented Dec 7, 2016

Relequestual commented Dec 7, 2016 •

edited

Loading

buske commented Dec 14, 2016

Make test data conform to v1.0 API #110

Make test data conform to v1.0 API #110

Conversation

buske commented May 12, 2015

Relequestual commented May 27, 2015

Relequestual commented May 27, 2015

Relequestual commented May 28, 2015

buske commented May 28, 2015

Relequestual commented Jun 1, 2015

Relequestual commented Jun 1, 2015

buske commented Jun 1, 2015

Relequestual commented Jun 2, 2015

Relequestual commented Jun 22, 2015

Relequestual commented Jul 6, 2015

buske commented Aug 4, 2015

buske commented Aug 18, 2015

Relequestual commented Sep 2, 2015

buske commented Nov 26, 2016

Relequestual commented Nov 29, 2016

buske commented Nov 30, 2016

Relequestual commented Nov 30, 2016

buske commented Dec 1, 2016

Relequestual commented Dec 1, 2016 • edited Loading

fschiettecatte commented Dec 5, 2016

buske commented Dec 6, 2016

Relequestual commented Dec 6, 2016

buske commented Dec 7, 2016

Relequestual commented Dec 7, 2016 • edited Loading

buske commented Dec 14, 2016

Relequestual commented Dec 1, 2016 •

edited

Loading

Relequestual commented Dec 7, 2016 •

edited

Loading