
Another error of quantms ms2rescore and OpenMS #458

Closed
ypriverol opened this issue Dec 6, 2024 · 26 comments · Fixed by #462
Labels
bug Something isn't working


@ypriverol
Member

Description of the bug

I'm running ms2rescore with the following command:

rescoring ms2rescore \
    --psm_file f04541_Prot_01_F04_msgf.idXML \
    --spectrum_path . \
    --ms2_tolerance 0.4 \
    --output_path f04541_Prot_01_F04_msgf_ms2rescore.idXML \
    --ms2pip_model_dir null \
    --processes 12 \
    --id_decoy_pattern ^DECOY_ \
    --ms2pip_model TMT --rescoring_engine 'percolator' --calibration_set_size 0.15 --test_fdr 0.05 --feature_generators deeplc,ms2pip \
    2>&1 | tee f04541_Prot_01_F04_ms2rescore.log

cat <<-END_VERSIONS > versions.yml
"NFCORE_QUANTMS:QUANTMS:TMT:ID:PSMRESCORING:MS2RESCORE":
    MS2Rescore: $(echo $(ms2rescore --version 2>&1) | grep -oP 'MS²Rescore \(v\K[^\)]+' )
END_VERSIONS
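The `grep -oP` call above keeps only the text between `MS²Rescore (v` and the next closing parenthesis. Below is a minimal Python equivalent. It assumes that `ms2rescore --version` prints a string containing `MS²Rescore (v<version>)`, as the pattern implies; the exact output format is an assumption. `extract_ms2rescore_version` is a hypothetical helper name.

```python
import re
from typing import Optional

# Mirrors grep -oP 'MS²Rescore \(v\K[^\)]+': match the literal prefix
# "MS²Rescore (v", then capture everything up to the next ")".
_VERSION_RE = re.compile(r"MS²Rescore \(v([^)]+)")


def extract_ms2rescore_version(text: str) -> Optional[str]:
    """Return the version substring, or None when the pattern is absent."""
    match = _VERSION_RE.search(text)
    return match.group(1) if match else None


if __name__ == "__main__":
    # Hypothetical --version output; the real format is an assumption.
    print(extract_ms2rescore_version("MS²Rescore (v3.0.3)"))  # 3.0.3
```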

The step fails and here is the log file:
command.log.gz

Command used and terminal output

No response

Relevant files

No response

System information

No response

@ypriverol ypriverol added the bug Something isn't working label Dec 6, 2024
@jpfeuffer
Collaborator

Looks like TMT modifications are not supported (by DeepLC, at least). Both the warnings about unknown labelled elements in the tag and the discarded peptides carrying that modification hint at this.

Plus there is the error at the end, which is probably either an implementation error, something wrong with merging, or psm_utils not handling merged idXMLs.

@ypriverol
Member Author

Well, DeepLC supports TMT in principle; for the other error, @jonasscheid can probably help. Why merged idXMLs? We run ms2rescore on top of MSGF+ there.

@jpfeuffer
Collaborator

jpfeuffer commented Dec 6, 2024

The warnings sound like it does not know elements with heavy isotopes, so it is unlikely to predict RTs for peptides with TMT mods. Maybe you are using the wrong version then; I don't know that tool very well.

The other error sounded like something from merging but yes, shouldn't be merged at that point:
psm_utils.io.idxml.IdXMLException: Multiple collections are not supported when parsing single pyopenms protein and peptide objects.

@jonasscheid
Contributor

jonasscheid commented Dec 7, 2024

> Why merged idXMLs?

You often/always have technical replicates in your samplesheet, right? The implementation of DeepLC in MS2Rescore allows RT calibration only once per group of MS runs. Now, it makes most sense to do the calibration on the group of technical replicates, imo, and that's why you can plug a merged idXML (post-IDMerger) into this ms2rescore adapter. You can also rescore technical replicates/MS runs/whatever separately, of course.
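To illustrate "one calibration per group of runs", the sketch below fits a separate mapping from predicted to observed RT for each group of runs, for example the technical replicates of one sample. It uses a plain least-squares line as a stand-in and is not DeepLC's actual calibration method. The PSM tuple layout, the run-to-group mapping and the function names are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical PSM record: (run_name, predicted_rt, observed_rt).
Psm = Tuple[str, float, float]


def fit_linear_calibration(pairs: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Least-squares fit of observed_rt = slope * predicted_rt + intercept."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    if sxx == 0:
        raise ValueError("need at least two distinct predicted RTs per group")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def calibrate_per_group(
    psms: List[Psm], run_to_group: Dict[str, str]
) -> Dict[str, Tuple[float, float]]:
    """Fit one calibration per group of runs, e.g. per set of technical replicates."""
    pairs_by_group: Dict[str, List[Tuple[float, float]]] = defaultdict(list)
    for run, predicted, observed in psms:
        pairs_by_group[run_to_group[run]].append((predicted, observed))
    return {group: fit_linear_calibration(pairs) for group, pairs in pairs_by_group.items()}


if __name__ == "__main__":
    psms = [
        ("rep1", 10.0, 21.0), ("rep2", 20.0, 41.0), ("rep1", 30.0, 61.0),
        ("other", 10.0, 10.0), ("other", 20.0, 20.0),
    ]
    groups = {"rep1": "sampleA", "rep2": "sampleA", "other": "sampleB"}
    print(calibrate_per_group(psms, groups))
```

Pooling replicates gives each fit more PSMs than a single run would. The trade-off is that run-specific RT drift within a group is averaged out.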

@jonasscheid
Contributor

I'm not sure what exactly the idea of the idXML patch reader is, but maybe the KeyError comes from discarding some PSMs here. I must admit I am still working with ms2rescore 3.0.0; maybe a bug was introduced in 3.0.3.

@RobbinBouwmeester

To chime in:

> The warnings sound like it does not know elements with heavy isotopes, so it is unlikely to predict RTs for peptides with TMT mods. Maybe you are using the wrong version then; I don't know that tool very well.

Heavy isotopes are indeed not accounted for, but (as long as we are not talking deuterium; which makes a slight difference) retention times should still be the same (or at least very very similar) to the non-heavy-isotope tag. Biggest impact is going to come from the tag itself. DeepLC will make predictions for the "deisotoped" tag and also throw a warning.
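Here is what predicting for the "deisotoped" tag amounts to: merging isotope-labelled atoms into their element gives the unlabelled elemental formula. This is a minimal sketch. It assumes the Unimod composition of TMT6plex, H(20) C(8) 13C(4) N 15N O(2), and the `C[13]`/`N[15]` atom notation that appears in the DeepLC warnings in this thread. `deisotope` is a hypothetical helper, not DeepLC's code.

```python
import re
from collections import Counter
from typing import Dict

# Matches atom keys such as "C", "C[13]" or "N[15]" and captures the element symbol.
_ATOM_RE = re.compile(r"^([A-Z][a-z]?)(?:\[\d+\])?$")


def deisotope(composition: Dict[str, int]) -> Dict[str, int]:
    """Merge isotope-labelled atoms (e.g. 'C[13]') into their element ('C')."""
    merged: Counter = Counter()
    for atom, count in composition.items():
        match = _ATOM_RE.match(atom)
        if match is None:
            raise ValueError(f"Unrecognised atom key: {atom!r}")
        merged[match.group(1)] += count
    return dict(merged)


# TMT6plex as listed in Unimod: H(20) C(8) 13C(4) N 15N O(2)
TMT6PLEX = {"H": 20, "C": 8, "C[13]": 4, "N": 1, "N[15]": 1, "O": 2}

if __name__ == "__main__":
    print(deisotope(TMT6PLEX))  # {'H': 20, 'C': 12, 'N': 2, 'O': 2}
```

The result is the formula of the unlabelled tag (C12H20N2O2). This is why the RT prediction should stay very close even though the isotope information is dropped.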

> I'm not sure what exactly the idea of the idXML patch reader is, but maybe the KeyError comes from discarding some PSMs here. I must admit I am still working with ms2rescore 3.0.0; maybe a bug was introduced in 3.0.3.

I believe there was indeed something... Not sure if it was this specific version. I will nudge Arthur and Ralf this Monday.

@jpfeuffer
Collaborator

> Heavy isotopes are indeed not accounted for, but (as long as we are not talking deuterium; which makes a slight difference) retention times should still be the same (or at least very very similar) to the non-heavy-isotope tag. Biggest impact is going to come from the tag itself. DeepLC will make predictions for the "deisotoped" tag and also throw a warning.

OK, nice, that makes sense, and I was hoping for that. However, the tool reports

2024-12-06 19:23:55,720 INFO Removed 6663 PSMs. Peptides not supported: {'.(TMT6plex)VTLVYR', '.(TMT6plex)TALFLR', '.(TMT6plex)TLLVVR', ......}

so this sounds more like they are actually skipped. Maybe you require a specific format/annotation of the tag?

@ypriverol
Member Author

ypriverol commented Dec 7, 2024

These could be the PSMs that do not have a search engine score, as in #447. I'm more interested in this issue:

psm_utils.io.idxml.IdXMLException: Multiple collections are not supported when parsing single pyopenms protein and peptide objects.

BTW, I'm running only with Comet now.

@jpfeuffer
Collaborator

No, that is a different warning:
2024-12-06 19:23:50,970 WARNING Removed 7057 PSMs that were missing one or more rescoring feature(s), {'iony_max_abs_diff_norm', ...

@ypriverol
Member Author

> No, that is a different warning: 2024-12-06 19:23:50,970 WARNING Removed 7057 PSMs that were missing one or more rescoring feature(s), {'iony_max_abs_diff_norm', ...

Interestingly, they get removed, even though @timosachsenberg and I thought they could be recovered by ms2rescore.

@RobbinBouwmeester

> Heavy isotopes are indeed not accounted for, but (as long as we are not talking deuterium; which makes a slight difference) retention times should still be the same (or at least very very similar) to the non-heavy-isotope tag. Biggest impact is going to come from the tag itself. DeepLC will make predictions for the "deisotoped" tag and also throw a warning.

> OK, nice, that makes sense, and I was hoping for that. However, the tool reports 2024-12-06 19:23:55,720 INFO Removed 6663 PSMs. Peptides not supported: {'.(TMT6plex)VTLVYR', '.(TMT6plex)TALFLR', '.(TMT6plex)TLLVVR', ......} so this sounds more like they are actually skipped. Maybe you require a specific format/annotation of the tag?

I will have a look at this and keep you posted.

@ArthurDeclercq

Hi all,

I haven't looked very in depth yet, but to me it seems the peptides are just not parsed correctly. We use psm_utils for conversion and handling of PSMs, but it requires that the peptides are in ProForma notation. So modifications (and also labels) have to be between square brackets, and N-terminal modifications are noted like this: [TMT6plex]-TALFR. So I think peptides are just thrown out because the ProForma notation is not correct.

I'll look further into this next week!

@ypriverol
Member Author

OK, I have managed to run the experiment with Sage and Comet with no problem. @daichengxin, I think this is the CustomIDXML parser.

@jpfeuffer
Collaborator

> OK, I have managed to run the experiment with Sage and Comet with no problem. @daichengxin, I think this is the CustomIDXML parser.

"No problem" as in "no warnings at all"? I would be surprised if the handling/annotation of modifications in idxml changes between different search engines.

@ypriverol
Member Author

ypriverol commented Dec 7, 2024

Log file of one of the Comet files:

2024-12-06 15:26:31,756 WARNING Could not add the following atom: N[15], attempting to replace the [] part
2024-12-06 15:26:31,756 WARNING Could not add the following atom: C[13], attempting to replace the [] part
2024-12-06 15:26:31,756 WARNING Could not add the following atom: N[15], attempting to replace the [] part
2024-12-06 15:26:31,757 WARNING Could not add the following atom: C[13], attempting to replace the [] part
2024-12-06 15:26:31,757 WARNING Could not add the following atom: N[15], attempting to replace the [] part
2024-12-06 15:26:31,757 WARNING Could not add the following atom: C[13], attempting to replace the [] part
2024-12-06 15:26:31,757 WARNING Could not add the following atom: N[15], attempting to replace the [] part
2024-12-06 15:26:39,127 WARNING Removed 2 PSMs that were missing one or more rescoring feature(s), {'ionb_max_abs_diff', 'cos_iony_norm', 'ionb_abs_diff_Q2', 'spec_mse', 'ionb_std_abs_diff', 'dotprod_iony_norm', 'spec_mse_norm', 'iony_abs_diff_Q2_norm', 'dotprod_norm', 'iony_spearman', 'abs_diff_Q3', 'min_abs_diff_iontype', 'ionb_pearson_norm', 'abs_diff_Q1', 'dotprod', 'iony_std_abs_diff', 'std_abs_diff_norm', 'iony_max_abs_diff_norm', 'min_abs_diff', 'iony_abs_diff_Q1', 'ionb_pearson', 'dotprod_iony', 'mean_abs_diff', 'ionb_min_abs_diff_norm', 'iony_mean_abs_diff_norm', 'iony_mse_norm', 'ionb_abs_diff_Q3_norm', 'ionb_mean_abs_diff_norm', 'iony_pearson', 'iony_pearson_norm', 'iony_abs_diff_Q3', 'iony_abs_diff_Q3_norm', 'iony_min_abs_diff', 'cos_ionb', 'ionb_abs_diff_Q2_norm', 'spec_spearman', 'cos_iony', 'abs_diff_Q2', 'min_abs_diff_norm', 'ionb_abs_diff_Q3', 'ionb_max_abs_diff_norm', 'cos_ionb_norm', 'max_abs_diff', 'iony_min_abs_diff_norm', 'mean_abs_diff_norm', 'std_abs_diff', 'max_abs_diff_norm', 'ionb_mean_abs_diff', 'dotprod_ionb_norm', 'iony_abs_diff_Q1_norm', 'iony_mse', 'ionb_abs_diff_Q1', 'ionb_mse_norm', 'ionb_mse', 'abs_diff_Q1_norm', 'spec_pearson_norm', 'max_abs_diff_iontype', 'cos', 'ionb_abs_diff_Q1_norm', 'spec_pearson', 'abs_diff_Q2_norm', 'ionb_min_abs_diff', 'abs_diff_Q3_norm', 'dotprod_ionb', 'ionb_std_abs_diff_norm', 'ionb_spearman', 'iony_mean_abs_diff', 'cos_norm', 'iony_abs_diff_Q2', 'iony_std_abs_diff_norm', 'iony_max_abs_diff'}.
2024-12-06 15:26:39,237 INFO Writing added features to PIN file: f05401_Prot_07_F01_comet_ms2rescore.idXML.psms.pin
2024-12-06 15:26:42,151 INFO Removed 2 PSMs. Peptides not supported: {'.(TMT6plex)GFVVNLTGAUVC(Carbamidomethyl)SQ(Deamidated)K(TMT6plex)', '.(TMT6plex)MTUSC(Carbamidomethyl)LGFPNFPFSVLK(TMT6plex)'}

Looks like the idXML has some issues even with Comet.

@daichengxin
Collaborator

Could you share the idXML files so that we can figure out what happened?

@ArthurDeclercq

Since PSMs are removed because of invalid ProForma sequences, the problem is likely somewhere in the parsing of the strings. Normally, round brackets are mapped to square brackets here (https://github.com/compomics/psm_utils/blob/0ba376dbc59aafc1e00d10b6b4b734afba13b4cf/psm_utils/io/idxml.py#L154), and then N-terminal and C-terminal modifications are handled. But since the sequence in the error still has round brackets, something went wrong while parsing it.
If you could share the idXML files, I can have a look at what goes wrong, because when I parse the strings separately, the regex patterns should work.
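Here is a minimal sketch of the mapping described above: handle the terminal modifications first, then turn the remaining round brackets into square brackets. It assumes simple OpenMS-style strings: N-terminal mods as `.(Mod)` at the start, C-terminal mods as `.(Mod)` at the end, and residue mods as `X(Mod)`. It is not the psm_utils implementation linked above, and `openms_to_proforma` is a hypothetical name. It also does not handle modification names that themselves contain parentheses, such as some Unimod label names.

```python
import re

# OpenMS-style terminal modifications, e.g. ".(TMT6plex)PEPTIDE" or "PEPTIDE.(Amidated)".
_N_TERM_RE = re.compile(r"^\.\(([^()]+)\)")
_C_TERM_RE = re.compile(r"\.\(([^()]+)\)$")
# Residue modifications in round brackets, e.g. "C(Carbamidomethyl)".
_RESIDUE_MOD_RE = re.compile(r"\(([^()]+)\)")


def openms_to_proforma(sequence: str) -> str:
    """Convert a simple OpenMS-style modified sequence to ProForma notation."""
    n_term = ""
    c_term = ""
    match = _N_TERM_RE.match(sequence)
    if match:
        n_term = f"[{match.group(1)}]-"
        sequence = sequence[match.end():]
    match = _C_TERM_RE.search(sequence)
    if match:
        c_term = f"-[{match.group(1)}]"
        sequence = sequence[: match.start()]
    # Any bracketed mod left over is attached to the preceding residue.
    sequence = _RESIDUE_MOD_RE.sub(r"[\1]", sequence)
    return n_term + sequence + c_term


if __name__ == "__main__":
    print(openms_to_proforma(".(TMT6plex)VTLVYR"))  # [TMT6plex]-VTLVYR
```

Applied to the sequences in the log above, `.(TMT6plex)MTUSC(Carbamidomethyl)LGFPNFPFSVLK(TMT6plex)` would become `[TMT6plex]-MTUSC[Carbamidomethyl]LGFPNFPFSVLK[TMT6plex]`.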

@RalfG

RalfG commented Dec 9, 2024

@jonasscheid, is this the workflow that updates idXML files with rescoring features?

@RalfG

RalfG commented Dec 9, 2024

Can you provide more information on how MS²Rescore is used here? It doesn't seem to be through the normal CLI. Which versions of ms2rescore and psm_utils are you using?

The PSMs are parsed correctly into ms2rescore, otherwise it would crash earlier and would not run DeepLC. The problem arises while (or after) writing PSMs.

I'm also not immediately sure where the "Peptides not supported" error is generated. Unless I'm mistaken, it does not come from ms2rescore or psm_utils.

@ypriverol
Member Author

Hi @RalfG, this is how we are using it:

We have a small library to handle parameters, conversions to ms2rescore, etc.: https://github.com/bigbio/quantms-rescoring. Here is the main function class: https://github.com/bigbio/quantms-rescoring/blob/main/quantmsrescore/ms2rescore.py. This approach allows us to better integrate the tool with our parameters.

Here are the versions we are using:

click
pyopenms
ms2rescore==3.0.3
pandas
numpy
deepLC==2.2.38
psm-utils==0.8.3
scipy==1.13.1
pygam
protobuf==3.19.6

The "Peptides not supported" error comes from our library when it finds a PSM that can't be processed.

@RalfG

RalfG commented Dec 9, 2024

Ah, so these are the PSMs that MS²Rescore could not generate features for, which are then listed when processed by the wrapper script, and they are not the issue at hand here?

It seems like the actual exception that crashes the script, this one:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/psm_utils/io/idxml.py", line 394, in _update_existing_ids
    psms = psm_dict[None][run][peptide_id.getMetaValue("spectrum_reference")]
KeyError: 'controllerType=0 controllerNumber=1 scan=80368'

is a run / spectrum_id mismatch between the idXML that is being updated and the PSMs in the PSMList. We can ignore the second exception message "Multiple collections are not supported when parsing single pyopenms protein and peptide objects.". It seems to be a bit too eager to blame the collection key for the KeyError. That should be updated.

@jonasscheid, if I understand the function _update_existing_ids correctly, it iterates over the peptide IDs in the idXML and for each gets the rescored PSM by run and spectrum ID. It seems like it can't find one of them, so either the spectrum ID is mismatched (changed somewhere?) or there is no MS²Rescore PSM anymore for that idXML peptide ID (removed in the step above?).
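To make the failure point explicit, a lookup can check each level of the nested dictionary separately. That way a single KeyError is not blamed on the collection key. This is a minimal sketch, assuming the collection → run → spectrum ID layout seen in the traceback above (`psm_dict[None][run][spectrum_reference]`). `lookup_psms` and `PsmLookupError` are hypothetical names, not part of psm_utils.

```python
from typing import Any, Dict, List, Optional

# Nested layout mirrored from the traceback: collection -> run -> spectrum_id -> PSMs.
PsmDict = Dict[Optional[str], Dict[str, Dict[str, List[Any]]]]


class PsmLookupError(KeyError):
    """KeyError whose message names the level at which the lookup failed."""


def lookup_psms(
    psm_dict: PsmDict, run: str, spectrum_id: str, collection: Optional[str] = None
) -> List[Any]:
    """Look up PSMs, reporting whether the collection, run or spectrum ID is missing."""
    if collection not in psm_dict:
        raise PsmLookupError(f"collection {collection!r} not found")
    runs = psm_dict[collection]
    if run not in runs:
        raise PsmLookupError(f"run {run!r} not found in collection {collection!r}")
    spectra = runs[run]
    if spectrum_id not in spectra:
        raise PsmLookupError(f"spectrum {spectrum_id!r} not found in run {run!r}")
    return spectra[spectrum_id]
```

Subclassing KeyError keeps existing `except KeyError` handlers working while giving a more precise message.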

Do let us know if there's something that should get fixed in MS²Rescore or psm_utils.

@jonasscheid
Contributor

> @jonasscheid, is this the workflow that updates idXML files with rescoring features?

Kind of; there seems to be a small patched version of the workflow going on here.

@ypriverol, why did you go for this None check? https://github.com/bigbio/quantms-rescoring/blob/70e1904b6ba07258327274045a8700f6cc3a18da/quantmsrescore/ms2rescore.py#L70

These ones look strange:
2024-12-06 19:20:11,241 WARNING Removed 913 PSMs without search engine features!

> Interestingly, they get removed, even though @timosachsenberg and I thought they could be recovered by ms2rescore.

This happens when one of the feature generators does not support the peptide sequence (as already discussed). Percolator needs the same features for all PSMs.
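Percolator expects the same feature columns for every PSM. PSMs for which a feature generator produced nothing therefore have to be dropped (or imputed) before the PIN file is written. Below is a minimal sketch of the dropping variant, assuming a simple (ID, feature dict) representation. `keep_complete_psms` is a hypothetical helper and not how ms2rescore implements this internally.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical PSM representation: an ID plus a mapping of feature name -> value.
Psm = Tuple[str, Dict[str, float]]


def keep_complete_psms(psms: List[Psm], required: Set[str]) -> Tuple[List[Psm], List[str]]:
    """Split PSMs into those carrying every required feature and the IDs of the rest."""
    kept: List[Psm] = []
    removed: List[str] = []
    for psm_id, features in psms:
        # issubset() on a dict checks against its keys.
        if required.issubset(features):
            kept.append((psm_id, features))
        else:
            removed.append(psm_id)
    return kept, removed
```

Returning the removed IDs, and not only a count, makes it easier to check later which idXML entries no longer have a rescored counterpart.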

> @jonasscheid, if I understand the function _update_existing_ids correctly, it iterates over the peptide IDs in the idXML and for each gets the rescored PSM by run and spectrum ID. It seems like it can't find one of them, so either the spectrum ID is mismatched (changed somewhere?) or there is no MS²Rescore PSM anymore for that idXML peptide ID (removed in the step above?).

Indeed! (And very strange.) There is a filter after ms2rescore (filter_out_artifact_psms) that removes peptide_id objects from the input idXML. It might be that something there is not 100% covered. It would be helpful if @ypriverol could share the idXML file for debugging :P
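When debugging this kind of mismatch, a quick consistency check before updating the idXML can list the peptide IDs whose (run, spectrum reference) key has no rescored PSM, and the reverse. Below is a minimal sketch with hypothetical names. It assumes both the idXML peptide IDs and the rescored PSMs can be reduced to such keys.

```python
from typing import Iterable, List, Tuple

Key = Tuple[str, str]  # (run, spectrum_reference)


def compare_keys(
    idxml_keys: Iterable[Key], rescored_keys: Iterable[Key]
) -> Tuple[List[Key], List[Key]]:
    """Return (keys only in the idXML, keys only among the rescored PSMs), sorted."""
    idxml_set = set(idxml_keys)
    rescored_set = set(rescored_keys)
    only_in_idxml = sorted(idxml_set - rescored_set)
    only_in_rescored = sorted(rescored_set - idxml_set)
    return only_in_idxml, only_in_rescored
```

Any key in the first list would trigger the KeyError from the traceback above if the update step looked it up unconditionally.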

@daichengxin
Collaborator

daichengxin commented Dec 10, 2024

Error message: #447. So I added a patch to skip these PSMs for now.

@ypriverol
Member Author

ypriverol commented Dec 10, 2024

@daichengxin, the search engine error is only for MSGF+, but it should be for the Comet search. Let me list some possible ideas here:

Comet and sequence patterns

1- When running Comet (something @jonasscheid has tested a lot), we found this error:

INFO Removed 2 PSMs. Peptides not supported: {'.(TMT6plex)GFVVNLTGAUVC(Carbamidomethyl)SQ(Deamidated)K(TMT6plex)', '.(TMT6plex)MTUSC(Carbamidomethyl)LGFPNFPFSVLK(TMT6plex)'}. 

Some of the sequence patterns are not working, as @ArthurDeclercq mentioned. It would be good to have a Comet output file for @ArthurDeclercq to test.

MSGF+ and new adapter

It looks like the new adapter we introduced to fix error #447 is not handling a lot of sequences and spectrum IDs well. @daichengxin, can you have a look at it? We may need to intercept other features from the PSMs.

psm_utils.io.idxml.IdXMLException: Multiple collections are not supported when parsing single pyopenms protein and peptide objects.

Additional related error

@jpfeuffer @timosachsenberg @daichengxin: with the new OpenMS and ms2rescore enabled, we have found a "bigger" problem where all peptides get removed by the protein q-value filter (issue #459). I don't know if this is an error in the inference algorithm or a combination of issues.

@ypriverol ypriverol linked a pull request Dec 20, 2024 that will close this issue
@ypriverol
Member Author

Bug fixed in #462
