Update README.md
vdemichev authored Jul 19, 2024
1 parent 194763d commit f5983dd
Showing 1 changed file with 3 additions and 0 deletions.
3 changes: 3 additions & 0 deletions README.md
@@ -653,6 +653,9 @@ In terms of computational reproducibility, please note the following:
**Q: Do I need imputation?**
**A:** For many downstream applications, no. Most statistical tests handle missing values well. When imputation is necessary (e.g. for certain kinds of machine learning, or when a protein is completely absent in some of the biological conditions), we prefer to perform it on the protein level. Note that many papers that discuss imputation methods for proteomic data benchmark them on DDA data. The fundamental difference between DDA and DIA is that in DIA a missing value is a lot more likely to mean that the respective precursor/protein is indeed quite low-abundant. Because of this, minimal value imputation, for example, performs better for DIA than it does for DDA. An important consideration with imputation is that Gaussian statistical methods (like the t-test) should be used with caution on imputed data if the imputed data are strongly non-Gaussian.
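
As an illustration of minimal value imputation on the protein level, here is a minimal Python sketch. It is not part of DIA-NN: the function name `impute_min_per_run`, the dictionary layout (protein name mapped to log2 quantities per run, with `None` for missing values) and the choice of the per-run minimum as the replacement value are all assumptions made for the example.

```python
from typing import Dict, List, Optional


def impute_min_per_run(
    proteins: Dict[str, List[Optional[float]]],
) -> Dict[str, List[float]]:
    """Replace each missing value (None) with the smallest value observed in the same run.

    Assumes every protein has one entry per run, in the same run order.
    """
    if not proteins:
        return {}
    n_runs = len(next(iter(proteins.values())))
    run_min: List[float] = []
    for j in range(n_runs):
        observed = [row[j] for row in proteins.values() if row[j] is not None]
        if not observed:
            raise ValueError(f"run {j} has no observed values to impute from")
        run_min.append(min(observed))
    return {
        name: [run_min[j] if value is None else value for j, value in enumerate(row)]
        for name, row in proteins.items()
    }


# Hypothetical log2 protein quantities for three runs.
log2_quant = {
    "P1": [20.0, 21.0, None],
    "P2": [15.0, None, 16.0],
    "P3": [18.0, 17.5, 18.5],
}
imputed = impute_min_per_run(log2_quant)
```

Other variants (a global minimum, or a fraction of the minimum) follow the same pattern; which one suits a given dataset is a judgment call.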

**Q: Can I obtain the quantities of the individual fragments?**
**A:** You can use the --export-quant option to instruct DIA-NN to save fragment peak heights (non-normalised) to the .parquet report, along with quality scores for the respective extracted elution profiles. This lets you devise custom quantification strategies. Note that in that case you can still leverage the normalisation factor calculated by DIA-NN for each precursor identification.
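
One possible custom strategy is sketched below in Python, purely as an illustration. It assumes the fragment peak heights, their quality scores and the normalisation factor have already been read from the report into plain Python values. The function name, the 0 to 1 quality scale, the `min_quality` and `top_n` parameters, and applying the normalisation factor by multiplication are all assumptions, not DIA-NN behaviour.

```python
from typing import Sequence


def custom_precursor_quantity(
    heights: Sequence[float],
    qualities: Sequence[float],
    norm_factor: float,
    min_quality: float = 0.8,
    top_n: int = 3,
) -> float:
    """Sum the top_n highest fragment peak heights that pass a quality cut-off,
    then scale the sum by the precursor's normalisation factor (assumed multiplicative)."""
    if len(heights) != len(qualities):
        raise ValueError("heights and qualities must have the same length")
    passing = sorted(
        (h for h, q in zip(heights, qualities) if q >= min_quality),
        reverse=True,
    )
    return sum(passing[:top_n]) * norm_factor


# Hypothetical values for one precursor in one run.
quantity = custom_precursor_quantity(
    heights=[1000.0, 500.0, 200.0, 50.0],
    qualities=[0.95, 0.9, 0.4, 0.85],
    norm_factor=2.0,
)
```

Here the third fragment is excluded by the quality cut-off, so the remaining three heights are summed and scaled.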

**Q: Can I implement incremental data processing of large-scale experiments based on DIA-NN?**
**A:** Yes! It is sometimes important to be able to analyse data incrementally: e.g. you first process one cohort of patients, then half a year later samples from another cohort arrive, and you want to merge the data while keeping all protein quantities for the old samples unchanged. This is fully supported by DIA-NN (with the 'legacy' quantification mode), and there are several options for implementing this.
- Option 1. Process each batch separately, then do batch correction. This is the easiest way and might actually work quite well in some cases.
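
The README does not prescribe a batch correction method for Option 1. As one simple possibility, the sketch below median-centres the log-scale quantities of a single protein within each batch. The function name and the data layout are assumptions made for the example, and more elaborate methods may be preferable in practice.

```python
from statistics import median
from typing import Dict, Hashable, List, Sequence


def median_center(values: Sequence[float], batches: Sequence[Hashable]) -> List[float]:
    """Subtract each batch's median from the values belonging to that batch."""
    if len(values) != len(batches):
        raise ValueError("values and batches must have the same length")
    by_batch: Dict[Hashable, List[float]] = {}
    for value, batch in zip(values, batches):
        by_batch.setdefault(batch, []).append(value)
    centre = {batch: median(vals) for batch, vals in by_batch.items()}
    return [value - centre[batch] for value, batch in zip(values, batches)]


# Hypothetical log2 quantities of one protein across two batches.
corrected = median_center(
    [10.0, 12.0, 14.0, 20.0, 22.0, 24.0],
    ["A", "A", "A", "B", "B", "B"],
)
```

After centring, both batches share a median of zero, so the remaining differences reflect within-batch variation only.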