
Synchronize Readme specs and test json files #105

Open
jnguyenx opened this issue Apr 30, 2015 · 9 comments

@jnguyenx

I noticed three small inconsistencies:

  1. In this test file there's a field called hgvs, which is not defined in the specs. In my implementation it'll be ignored.
  2. In both test files, the Feature has a label field, whereas the specs don't include this field and have ageOfOnset instead.
  3. The patient wrapper is not present in the test files either (see the sketch below).
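
To make this concrete, here is a rough sketch of the record shape I'm describing. The values are invented and the exact key names are my reading of the spec, not a quote from the test files:

```json
{
  "id": "P0001",
  "features": [
    { "id": "HP:0000252", "label": "Microcephaly" }
  ],
  "genomicFeatures": [
    { "gene": { "id": "FGFR3" }, "hgvs": "NM_000000.1:c.100del" }
  ]
}
```

Point 1 is the extra hgvs key, point 2 is label appearing where the spec has ageOfOnset, and point 3 is the missing top-level "patient" wrapper around the whole object.
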
@buske
Member

buske commented Apr 30, 2015

@jnguyenx Thanks for your comments! For 1. and 2., it's true that hgvs and label are not in the specification, but that was intentional. We've been operating under the policy that, occasionally, extra data may be useful to include in the API for interpretation or debugging, and sites that don't support it should ignore these fields (this is also how we get backwards compatibility between minor versions). We should specify this more formally, and perhaps these fields should be removed from the test dataset anyway?

Regarding the patient wrapper, I'm unsure of whether this is helpful or not. The files serve two uses:

  • to provide example data that people should upload into their matchmaker
  • to provide example data for the query to make internally or externally to verify matchmaking
Since only the latter involves a patient wrapper, I left it off, but I'd be happy to add it if people think this would be more consistent.
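
As a sketch of the difference (values invented, key names assumed from the spec): an entry in the dataset file is the bare record, while a request to the match endpoint wraps the same record under a "patient" key, something like:

```json
{
  "patient": {
    "id": "P0001",
    "features": [
      { "id": "HP:0000252" }
    ]
  }
}
```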

@buske buske added the Testing label Apr 30, 2015
@buske buske added this to the Before v1.1 milestone Apr 30, 2015
@jnguyenx
Author

Hi Orion, thanks for the quick answer.

Well, I wanted to use these json files to test my implementation: I built the implementation from the Readme specs, then added unit tests which consume those files. I ran into a couple of errors due to the 3 points I mentioned above.

So in my opinion it would be very nice just to be able to consume those files as they are provided.

I think it's fine to leave extra fields which are not going to be parsed, but I'm a bit more worried about 2., because ageOfOnset is a required field, so the test files are not compliant. Same for patient.

@buske
Member

buske commented Apr 30, 2015

Hi Jeremy,

Regarding patient: that's a fair point. I'm happy to update the test files if someone else +1s this.

Regarding ageOfOnset, it is an optional field. The only mandatory field for a phenotypic feature is id. This is described in the spec, but perhaps either:

  1. It should be clarified, and/or
  2. The summary specification at the top should only include mandatory fields?
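
For concreteness, under that reading a minimal valid phenotypic feature carries just an id. This is a sketch, not an excerpt from the test data:

```json
{
  "features": [
    { "id": "HP:0000252" }
  ]
}
```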

Thoughts?

@jnguyenx
Author

Hi,

Oh yeah, my bad, I didn't see that it was optional.

I think that the id in Feature should come with a (mandatory) label, as is done for GenomicFeatures. That way it'll be clearer.
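
Something like this is what I have in mind (term and label chosen just for illustration):

```json
{ "id": "HP:0000252", "label": "Microcephaly" }
```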

The summary specification is fine as it is for me, in combination with the detailed specs.

Thanks for your help!

buske added a commit that referenced this issue Apr 30, 2015
@cmungall
Member

+1 to making the spec and test data consistent

@buske
Member

buske commented May 1, 2015

Since each request includes only a single patient, there's no easy way to make the 50-patient dataset spec-compliant per se. I could make the test data a list of requests (i.e., a list of objects, each with a "patient" field and the corresponding details), but this is confusing as well, since the result still isn't a valid request. I think it makes the most sense to leave the dataset the way it is (perhaps removing the informal hgvs and label fields), but provide an additional file with a sample request that completely conforms to the spec, including the "patient" wrapper. Would this be a good compromise?
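
For illustration (values invented), the "list of requests" shape would look something like the following, which, as noted, is still not itself a valid request body, since a request carries exactly one patient:

```json
[
  { "patient": { "id": "P0001", "features": [ { "id": "HP:0000252" } ] } },
  { "patient": { "id": "P0002", "features": [ { "id": "HP:0001250" } ] } }
]
```

Hence the suggestion of one separate sample request file that conforms to the spec end to end.
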

@jnguyenx
Author

jnguyenx commented May 1, 2015

Sounds good! And also rename the files so their content is clear from the name, e.g. a set of patients, a set of queries, etc.

Thanks!

@Relequestual
Member

I don't think the test data file was designed to be used for forming the test queries. Having said that, there's no reason we couldn't provide both an import file and individual files for individual requests. Going down that line of thought, though, once you split them into individual files I can't see why we would maintain both. For a one-time data import script, I see little difference between opening one file vs 50. Open to discussion though.

I'm running across issues converting some of the hgvs codes to ref and alt alleles. I've run into one variant specifically which is causing problems (NM_005559.3(LAMA1):c.2988_2989delA), as this doesn't appear to be valid notation. I'm currently discussing this with Orion and François via email. Will report back here with any updates.

@buske
Member

buske commented May 5, 2015

Update: we tracked it down to a typo in the original manuscript and committed a fix. I'm going through and verifying all the other variants now.

buske added a commit that referenced this issue Nov 26, 2016

Issue #109: added zygosity

Issue #105: removed non-spec fields from test data

buske added a commit that referenced this issue Sep 25, 2017

Issue #109: added zygosity

Issue #105: removed non-spec fields from test data