Warnings on errors of protein modification and allele descriptions #2085
Comments
Another thing, which may have been mentioned in another issue, is that residues in histones are often indexed without counting the first methionine, both when referring to a modification and when referring to an allele. Here are some fixed examples:
I am not sure where the warning makes more sense in this case, but we should flag this on the website when people are looking at modifications or alleles of histones. By the way, these are the genes that I am counting as histones in the pipeline: histones = ['SPBC1105.11c', 'SPBC1105.12', 'SPAC1834.03c', 'SPAC1834.04', 'SPAC19G12.06c', 'SPBC8D2.03c', 'SPBC8D2.04', 'SPCC622.08c', 'SPCC622.09', 'SPBC11B10.10c', 'SPBC1105.17'] Related to pombase/allele_qc#15 |
So, the histone part of this is dealt with by displaying hht3-K56R(K57R aa). Is the first part dealt with by the old coordinates in the synonyms field? Or is this referring to something else? I don't fully understand what warnings are required. |
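The histone renumbering discussed above (K56R displayed alongside K57R) can be sketched as follows. This is only an illustration, assuming the published coordinate skips the initiator methionine so that the sequence coordinate is the published one plus one. The function name `shift_histone_residue` and the `SUBSTITUTION` pattern are hypothetical and not taken from allele_qc.

```python
import re

# Gene IDs treated as histones, copied from the list in this thread.
HISTONES = {
    'SPBC1105.11c', 'SPBC1105.12', 'SPAC1834.03c', 'SPAC1834.04',
    'SPAC19G12.06c', 'SPBC8D2.03c', 'SPBC8D2.04', 'SPCC622.08c',
    'SPCC622.09', 'SPBC11B10.10c', 'SPBC1105.17',
}

# Hypothetical pattern: a single amino acid substitution such as "K56R".
SUBSTITUTION = re.compile(r'^([A-Z])(\d+)([A-Z])$')


def shift_histone_residue(systematic_id, description):
    """Renumber a substitution so that it counts the first methionine.

    Non-histone genes, and descriptions that are not a simple
    substitution, are returned unchanged.
    """
    match = SUBSTITUTION.match(description)
    if systematic_id not in HISTONES or match is None:
        return description
    ref, position, alt = match.groups()
    return f'{ref}{int(position) + 1}{alt}'
```

For a gene ID in the list, `shift_histone_residue('SPBC1105.11c', 'K56R')` gives `'K57R'`; any other gene ID leaves the description as it was.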
Yes
These are the ones that cannot be auto-fixed by the pipeline. Their syntax is correct, in that they follow the pattern to represent the variant correctly, but the residues they mention do not match the position in the sequence. That's why it would be good to mark them as referring to wrong residues. The only way to fix these would be to write to the authors or go back to the publication. |
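To make "syntax correct, residue wrong" concrete, here is a minimal sketch of that kind of check. It assumes a simple substitution description and a 1-based protein coordinate. The function name `residue_matches_sequence` and the pattern are hypothetical, and this is not the pipeline's actual code.

```python
import re

# Hypothetical pattern: a single amino acid substitution such as "K3R".
SUBSTITUTION = re.compile(r'^([A-Z])(\d+)([A-Z])$')


def residue_matches_sequence(description, protein_sequence):
    """Return True if the reference residue is found at the stated position.

    Raises ValueError when the description does not follow the pattern,
    which is a syntax problem rather than a sequence problem.
    """
    match = SUBSTITUTION.match(description)
    if match is None:
        raise ValueError(f'Unsupported description: {description}')
    ref, position, _alt = match.groups()
    index = int(position) - 1  # descriptions are 1-based, strings are 0-based
    if not 0 <= index < len(protein_sequence):
        return False
    return protein_sequence[index] == ref
```

With the toy sequence `'MAKT'`, `K3R` passes the check, while `K2R` follows the pattern but points at the wrong residue, which is the case that would need a warning.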
OK I will take this over to the curation tracker and we will work through them |
Keeping open for the warning. I guess we can put the warning on the allele/genotype page. |
If we are just putting the warning on the allele page, I can implement something and then we can tweak it (since the allele pages aren't live yet). If we want it on the genotype pages we'll need to decide how/where to display the warning. Especially if the genotype is multi-locus. |
There are a few hundred modifications to fix in total from 2 lists. We will hopefully fix most of these over time (soonish). To spearhead this, we will try to extract the associated publications and post the list to pombelist with |
What is still to do? |
I haven't added the warnings yet. I'll need to change the code for generating the website to process the Alternatively we might want to add the information to the alleles in Chado, then change the website code to use that. I don't know which plan is best. |
Yep, that's what's in the
|
Actually there are only 2
The other file *_cannot_fix_other_errors.tsv is mainly disruptions. |
Hi @kimrutherford,
One thing that would be nice is to give some warnings for alleles and modifications where we know that the sequence is wrong, so that people know it when they see it on the website.
To do that, you can use the files in this directory: https://github.com/pombase/allele_qc/tree/master/results
*_cannot_fix_sequence_errors.tsv: all of these have correct descriptions, or descriptions that can be auto-fixed, but the sequence positions they indicate are wrong, so they should all be flagged with a wrong-sequence warning.
*_cannot_fix_other_errors.tsv: these are the ones that do not follow the patterns and therefore cannot be checked. I will hopefully chip away at most of those. Then there are the CTD ones, which for now are not supported by the pipeline. These could be flagged as "Not checked"; they may be correct, but they do not follow our guidelines.
I also have this file, in which I take some notes about alleles that I tried to fix (I went into the publication) but did not manage to. This is mostly so that I do not try to fix them again, but the comments sometimes say what I think they may be.
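One way the website code could consume these files is sketched below. It rests on several assumptions: the results directory has been downloaded locally, each TSV has a header row, and the caller knows which column identifies the allele or modification (the real column names are not given here, so `key_column` is a parameter). The warning labels, and applying "Not checked" to the whole `*_cannot_fix_other_errors.tsv` file, are one reading of the proposal above, and `load_warnings` is a hypothetical name.

```python
import csv
from pathlib import Path

# Assumed mapping from file name suffix to the warning shown on the website.
WARNING_BY_SUFFIX = {
    '_cannot_fix_sequence_errors.tsv': 'wrong sequence',
    '_cannot_fix_other_errors.tsv': 'Not checked',
}


def warning_for_file(file_name):
    """Return the warning label for a results file, or None if it is unrelated."""
    for suffix, label in WARNING_BY_SUFFIX.items():
        if file_name.endswith(suffix):
            return label
    return None


def load_warnings(results_dir, key_column):
    """Map each identifier in the results files to its set of warning labels."""
    warnings = {}
    for path in sorted(Path(results_dir).glob('*.tsv')):
        label = warning_for_file(path.name)
        if label is None:
            continue  # not one of the two file types described above
        with path.open(newline='') as handle:
            for row in csv.DictReader(handle, delimiter='\t'):
                warnings.setdefault(row[key_column], set()).add(label)
    return warnings
```

Keeping a set of labels per identifier means an entry that appears in both file types carries both warnings, instead of one silently overwriting the other.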