Another error of quantms ms2rescore and OpenMS #458
Comments
Looks like TMT modifications are not supported (by DeepLC at least). Both the labelled elements missing from the tag and the discarded peptides carrying that modification hint at that. Then there is the error at the end, which is probably either an implementation error, something wrong with merging, or psm_utils not handling merged idXMLs.
Well, DeepLC supports TMT in principle; @jonasscheid can probably help with the other error. Why merged idXMLs? We run ms2rescore on top of msgf+ there.
The warnings sound like it does not know elements with heavy isotopes, so it is unlikely to predict RTs for peptides with TMT mods. Maybe you are using the wrong version then; I don't know that tool very well. The other error sounded like something from merging but yes, the files shouldn't be merged at that point:
You often/always have technical replicates in your samplesheet, right? The implementation of DeepLC in MS2Rescore allows RT calibration only once per group of MS runs. It makes most sense to do the calibration on the group of technical replicates imo, and that's why you can plug a merged idXML (post-IDMerger) into this ms2rescore adapter. You can also rescore technical replicates/MS runs/whatever separately, of course.
I'm not sure what exactly the idea of the idXML patch reader is, but maybe the key error comes from discarding some PSMs here. I must admit I am still working with ms2rescore 3.0.0; maybe a bug was introduced in 3.0.3.
To chime in;
Heavy isotopes are indeed not accounted for, but (as long as we are not talking deuterium; which makes a slight difference) retention times should still be the same (or at least very very similar) to the non-heavy-isotope tag. Biggest impact is going to come from the tag itself. DeepLC will make predictions for the "deisotoped" tag and also throw a warning.
I believe there was indeed something... Not sure if it was this specific version. Will nudge Arthur and Ralf this Monday.
Ok nice that makes sense, and I was hoping for that, however, the tool reports |
This could be the PSMs that do not have a search engine score, as in #447. I'm more interested in this issue:
BTW, I'm running only with Comet now.
No, those are a different warning! |
Interestingly, they get removed, even though @timosachsenberg and I thought they could be recovered by ms2rescore.
Will have a look at this, will keep you posted. |
Hi all, I haven't looked at this in depth yet, but to me it seems the peptides are just not parsed correctly. We use psm_utils for conversion and handling of PSMs, but it requires that the peptides are in ProForma notation. So modifications (and also labels) have to be between square brackets, and N-terminal modifications have to be noted like this: [TMT6plex]-TALFR. So I think peptides are just thrown out because the ProForma notation is not correct. I'll look further into this next week!
Ok, I have managed to run the experiment with SAGE and COMET with no problem. @daichengxin I think this is the CustomIDXML parser. |
"No problem" as in "no warnings at all"? I would be surprised if the handling/annotation of modifications in idxml changes between different search engines. |
Log file of one of the comet files
Looks like idXML has some issues even in Comet. |
Could you share the idXML files so that we can figure out what happened?
Since PSMs are removed because of invalid ProForma sequences, the problem will likely be somewhere in the parsing of the strings. Normally, round brackets are mapped to square brackets here https://github.com/compomics/psm_utils/blob/0ba376dbc59aafc1e00d10b6b4b734afba13b4cf/psm_utils/io/idxml.py#L154 and then N-terminal and C-terminal modifications are handled. But since the sequence in the error still has round brackets somewhere, something went wrong in parsing it.
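The conversion described above can be sketched roughly as follows. This is not the psm_utils implementation, only a minimal illustration that assumes OpenMS-style strings such as `.(TMT6plex)TALFR`; the function name `openms_to_proforma` is made up for this example. Modification names that themselves contain brackets (for instance isotope labels with nested parentheses) are not handled by this naive version, which may be the kind of case where round brackets survive.

```python
import re


def openms_to_proforma(sequence: str) -> str:
    """Convert an OpenMS-style modified sequence to a ProForma-like string.

    Hypothetical helper for illustration; nested brackets inside
    modification names are NOT supported.
    """
    # Map round brackets to square brackets: M(Oxidation) -> M[Oxidation]
    seq = sequence.replace("(", "[").replace(")", "]")
    # N-terminal modification: ".[TMT6plex]TALFR" -> "[TMT6plex]-TALFR"
    seq = re.sub(r"^\.\[([^\]]+)\]", r"[\1]-", seq)
    # C-terminal modification: "TALFR.[Amidated]" -> "TALFR-[Amidated]"
    seq = re.sub(r"\.\[([^\]]+)\]$", r"-[\1]", seq)
    # Drop any leftover terminal dots (e.g. ".TALFR." -> "TALFR")
    return seq.strip(".")


def has_round_brackets(sequence: str) -> bool:
    """Flag sequences that still contain round brackets after conversion."""
    return "(" in sequence or ")" in sequence
```

Running `has_round_brackets` on the converted strings before handing them to a rescoring step is one cheap way to spot sequences that would later be rejected as invalid ProForma.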
@jonasscheid, is this the workflow that updates idXML files with rescoring features? |
Can more info be provided on how MS²Rescore is used here? It doesn't seem to be through the normal CLI? Which versions of ms2rescore and psm_utils? The PSMs are parsed correctly into ms2rescore, otherwise it would crash earlier and would not run DeepLC. The problem arises while (or after) writing PSMs. I'm also not immediately sure where the |
Hi @RalfG, this is how we are using it: we have a small library to handle parameters, conversions to ms2rescore, etc.: https://github.com/bigbio/quantms-rescoring. Here is the main function class: https://github.com/bigbio/quantms-rescoring/blob/main/quantmsrescore/ms2rescore.py. This approach allows us to integrate the tool better with our parameters. Here are the versions we are using:
The error comes from our library; it is raised when it finds a PSM that can't be processed.
Ah, so these are the PSMs that MS²Rescore could not generate features for that are then listed when being processed by the wrapper script; which is not the issue at hand here? Seems like the actual exception that crashes the script, this one:
Traceback (most recent call last):
File "/usr/local/lib/python3.10/site-packages/psm_utils/io/idxml.py", line 394, in _update_existing_ids
psms = psm_dict[None][run][peptide_id.getMetaValue("spectrum_reference")]
KeyError: 'controllerType=0 controllerNumber=1 scan=80368'
is a run / spectrum_id mismatch between the idXML that is being updated and the PSMs in the @jonasscheid, if I understand the function Do let us know if there's something that should get fixed in MS²Rescore or psm_utils.
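To check for this kind of run / spectrum_id mismatch before updating a file, something along these lines could help. It is only a sketch: it assumes a nested dictionary shaped like the `psm_dict[None][run][spectrum_reference]` lookup in the traceback, and the function name `find_missing_spectra` and the example run name are made up for illustration.

```python
def find_missing_spectra(psm_dict, run, spectrum_refs, collection=None):
    """Return the spectrum references that have no entry in psm_dict.

    psm_dict is assumed to be nested as {collection: {run: {spectrum_ref: [psms]}}},
    mirroring the lookup shown in the traceback. Hypothetical helper.
    """
    runs = psm_dict.get(collection, {})
    known = runs.get(run, {})  # empty if the run name itself does not match
    return [ref for ref in spectrum_refs if ref not in known]


# Toy data: the idXML references two scans, but only one has rescored PSMs.
psm_dict = {
    None: {
        "run_a": {"controllerType=0 controllerNumber=1 scan=1": ["psm1"]},
    }
}
refs_in_idxml = [
    "controllerType=0 controllerNumber=1 scan=1",
    "controllerType=0 controllerNumber=1 scan=80368",
]
missing = find_missing_spectra(psm_dict, "run_a", refs_in_idxml)
print(missing)
```

If the list is non-empty, the PSM list and the idXML being updated are out of sync (for example because PSMs were dropped upstream), and a direct dictionary lookup would raise a `KeyError`.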
Kind of; there seems to be a small patched version of the workflow going on here. Why did you go for this None check, @ypriverol? https://github.com/bigbio/quantms-rescoring/blob/70e1904b6ba07258327274045a8700f6cc3a18da/quantmsrescore/ms2rescore.py#L70 These ones look strange
This happens when one of the feature generators does not support the peptide sequence (as already discussed). Percolator needs the same features for all psms.
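A minimal sketch of that constraint, assuming each PSM is represented as a plain dictionary with a `features` mapping (a stand-in for illustration, not the psm_utils data model): PSMs for which any feature is missing or `None` are separated out, so that the remaining set has a uniform feature table.

```python
def split_by_feature_completeness(psms, required=None):
    """Split PSMs into those with all required features and those without.

    psms: list of dicts with a "features" mapping (hypothetical structure).
    required: feature names every PSM must have; defaults to the union of
    all feature names seen across the PSMs.
    """
    if required is None:
        required = set().union(*(p["features"].keys() for p in psms))
    complete, incomplete = [], []
    for psm in psms:
        feats = psm["features"]
        ok = all(feats.get(name) is not None for name in required)
        (complete if ok else incomplete).append(psm)
    return complete, incomplete


psms = [
    {"id": "scan=1", "features": {"rt_diff": 0.4, "spec_pearson": 0.9}},
    {"id": "scan=2", "features": {"spec_pearson": 0.7}},  # no RT prediction
    {"id": "scan=3", "features": {"rt_diff": None, "spec_pearson": 0.8}},
]
complete, incomplete = split_by_feature_completeness(psms)
print([p["id"] for p in complete], [p["id"] for p in incomplete])
```

The feature names `rt_diff` and `spec_pearson` are placeholders. Whether incomplete PSMs should be dropped or given imputed values is a separate design decision that this sketch does not take.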
Indeed! (and very strange). Because there is a filter post-ms2rescore ( |
Because these PSMs don't have these search engine scores, psm_utils raises an error here when parsing the idXML PSMs: https://github.com/compomics/psm_utils/blob/0ba376dbc59aafc1e00d10b6b4b734afba13b4cf/psm_utils/io/idxml.py#L49-L86
Error message: #447. So I added a patch to skip these PSMs for now.
@daichengxin The search engine error is only for the msgf+ but it should be for Comet search. Let me put some possible ideas here:
Comet and sequence patterns
1- When running Comet (something @jonasscheid has tested a lot), we found this error:
Some of the sequence patterns are not working, as @ArthurDeclercq mentioned. It would be good to have a Comet output file for @ArthurDeclercq to test.
MSGF+ and new adapter
It looks like the new adapter we introduced to fix error #447 is not handling a lot of sequences and spectrum IDs well. @daichengxin, can you have a look at it? We may need to intercept other features from the PSMs.
Additional related error
@jpfeuffer @timosachsenberg @daichengxin With the new OpenMS and ms2rescore enabled, we have found a "bigger" problem where all peptides get removed by protein q-value, issue #459. I don't know if this is an error in the inference algorithm or a combination of issues.
Bug fixed in #462 |
Description of the bug
I'm running ms2rescore with the following command:
The step fails and here is the log file:
command.log.gz
Command used and terminal output
No response
Relevant files
No response
System information
No response