- Table of Contents
- Name
- Purpose
- Part0 - Software Information and Data Storage
- Part1 - Quantification of TMT proteomics data (MaxQuant)
- Part2 - Preprocessing of quantification results (Perseus)
- Reference
Proteomics Quantification MQ: documentation of the MaxQuant-based proteomics quantification pipeline for TMT-labeled MS data
- Document the detailed method for quantifying proteomics data from raw TMT-labeled MS data using two software packages: MaxQuant and Perseus.
- GBM cell line proteomics data is used as the example; the main goal is to document the analysis so that it can be replicated in the future.
The two main software packages used to quantify TMT-labeled proteomics mass spectrometry data are MaxQuant and Perseus. Both were downloaded from their official websites and installed on the WangLab Proteomics Windows Server.
The GBM cell line proteomics data quantification was completed on October 25, 2022. The software versions are 2.1.4.0 for MaxQuant and 2.0.5.0 for Perseus.
The data was stored on the Windows Server under the path: D:\jihong\projects\Proteomics_GBM\Cell_Line_TMT_102522
The folder contains the reference sequence FASTA file and one subfolder called TMT_mzXML; all the raw MS data and quantification results are in that subfolder.
The three mzXML files with the prefix Mengli are the raw MS data from the BioCRF; all the remaining files are intermediate and final results produced by MaxQuant.
The GBM cell line TMT-labeled MS data was quantified using MaxQuant, a well-known proteomics analysis software package with a graphical user interface (GUI). The overall analysis logic is therefore:
- Load necessary files including raw MS data and reference fasta file into the software;
- Set the software analysis parameters for the specific MS type, TMT-10plex here;
- Select CPU cores for the analysis work and click the START button.
All the parameters, including the paths to the loaded files, are stored in one .xml file for later use and replication. The settings file for this example has been stored here, named mqpar.xml.
Step1: Load the raw data into the software (that is, set the file path of the raw data) by clicking the Load or Load folder button in the Raw data panel, as shown below.
Step2: Click the Set experiment button to set the experiment index. Since the raw MS data were obtained from one single MS run, the index for all three files can simply be set to the same value, 1.
Step3: Click the Set fractions button to set the MS fraction parameter. Since the three raw MS data files come from three different LC fractions, the fraction index should be set from 1 to 3.
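Because the settings live in an XML file, they can be inspected without opening the GUI. The following is a minimal sketch, not part of the original workflow. It assumes the element names `filePaths`, `experiments`, `fractions`, `fastaFilePath`, and `numThreads`, which may differ between MaxQuant versions, and it runs on a hypothetical, heavily trimmed sample rather than the real mqpar.xml.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed stand-in for an mqpar.xml file.
# The element names are assumptions about the MaxQuant parameter layout.
SAMPLE_MQPAR = """<MaxQuantParams>
  <fastaFiles>
    <FastaFileInfo><fastaFilePath>ref.fasta</fastaFilePath></FastaFileInfo>
  </fastaFiles>
  <numThreads>8</numThreads>
  <filePaths>
    <string>run_f1.mzXML</string>
    <string>run_f2.mzXML</string>
    <string>run_f3.mzXML</string>
  </filePaths>
  <experiments>
    <string>1</string><string>1</string><string>1</string>
  </experiments>
  <fractions>
    <short>1</short><short>2</short><short>3</short>
  </fractions>
</MaxQuantParams>"""


def summarize_mqpar(xml_text):
    """Collect a few settings from the text of a parameter file."""
    root = ET.fromstring(xml_text)

    def children_text(parent_tag, child_tag):
        # Text of every <child_tag> directly under any <parent_tag> element.
        return [
            child.text
            for parent in root.iter(parent_tag)
            for child in parent.findall(child_tag)
        ]

    return {
        "raw_files": children_text("filePaths", "string"),
        "experiments": children_text("experiments", "string"),
        "fractions": children_text("fractions", "short"),
        "fasta_files": [el.text for el in root.iter("fastaFilePath")],
        "threads": root.findtext(".//numThreads"),
    }


summary = summarize_mqpar(SAMPLE_MQPAR)
for key, value in summary.items():
    print(key, value)
```

To check the real settings file, read its text with `open(path, encoding="utf-8").read()` and pass it to `summarize_mqpar`. If a key comes back empty, the element names assumed here probably do not match that file.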
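The experiment and fraction assignment from the steps above can be written down as a small table for record keeping. This is only a sketch: the file names are hypothetical placeholders (the real files carry the Mengli prefix), and the column names `Name`, `Experiment`, and `Fraction` are chosen for illustration rather than taken from a MaxQuant file format.

```python
import csv
import io


def build_design(raw_files, experiment="1"):
    """Assign one shared experiment index and fraction indices 1..N."""
    return [
        {"Name": name, "Experiment": experiment, "Fraction": str(index)}
        for index, name in enumerate(raw_files, start=1)
    ]


def design_to_tsv(rows):
    """Render the design rows as tab-separated text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=["Name", "Experiment", "Fraction"],
        delimiter="\t",
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical file names standing in for the three raw mzXML files.
raw_files = ["run_f1.mzXML", "run_f2.mzXML", "run_f3.mzXML"]
design = build_design(raw_files)
print(design_to_tsv(design))
```

The order of `raw_files` determines the fraction numbers, so the list should follow the LC fraction order.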
The search parameters need to be set according to the MS labeling method, which is TMT-10plex here.
Step1: Click the Group-specific parameters panel button to open the parameter settings panel.
Step2: Click the Type button, open the Type drop-down list at the top, and select Reporter ion MS2. This sets the search mode to MS2-based quantification.
Step3: Click the 10plex TMT button.
All the other parameters can be left at their defaults for basic usage. For work with specific aims, other parameters can be changed as needed.
A reference protein sequence library must be provided for the quantification. The FASTA file used for this example is also stored here, named UP000005640_9606_MGMT_100722.fasta. This protein reference library contains all the canonical proteins downloaded from UniProt plus the specific MGMT isoform of interest to the user.
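Before loading a customized library, it can help to confirm that the added entry is actually in the file. The sketch below counts FASTA entries and looks for headers containing a keyword. The sample headers and sequences are made up for illustration and do not come from the real FASTA file.

```python
def fasta_headers(lines):
    """Return the header text (without '>') of every FASTA entry."""
    return [line[1:].strip() for line in lines if line.startswith(">")]


def headers_matching(headers, keyword):
    """Return headers containing the keyword, ignoring case."""
    keyword = keyword.lower()
    return [header for header in headers if keyword in header.lower()]


# Made-up entries; accessions, names, and sequences are placeholders.
SAMPLE_FASTA = """>sp|X00001|PROT1_HUMAN Example protein one GN=GENE1
MKTAYIAK
>sp|X00002|MGMT_HUMAN Example canonical entry GN=MGMT
MDKDCEMK
>custom|MGMT_isoform Example user-added isoform GN=MGMT
MLGQPAPL
""".splitlines()

headers = fasta_headers(SAMPLE_FASTA)
mgmt_headers = headers_matching(headers, "MGMT")
print(len(headers), "entries in total")
for header in mgmt_headers:
    print(header)
```

For the real library, pass an open file handle instead of `SAMPLE_FASTA`, for example `fasta_headers(open("UP000005640_9606_MGMT_100722.fasta", encoding="utf-8"))`.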
Step1: Click the Global parameters panel button to open the parameter settings panel.
Step2: Click the Sequence button, and then click the Add button to add the protein reference sequence file.
After all the files are loaded and all the parameters are set, select the number of cores in the lower-left corner of the software and click the Start button.
After several hours of searching, the raw results will be stored in the subfolder combined, and the raw result files are in its txt subfolder.
The final path of the raw MS quantification result files is therefore D:\jihong\projects\Proteomics_GBM\Cell_Line_TMT_102522\TMT_mzXML\combined\txt. The result file used for further preprocessing is called proteinGroups.txt, and it has also been stored here.
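A quick sanity check on the result file is to count the protein group rows and list the reporter columns. The sketch below assumes proteinGroups.txt is tab-separated with a single header row. The sample table is invented, and real files contain many more columns.

```python
import csv
import io


def summarize_protein_groups(handle, prefix="Reporter intensity corrected"):
    """Return (number of data rows, column names starting with prefix)."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    n_rows = sum(1 for _ in reader)
    reporter_columns = [name for name in header if name.startswith(prefix)]
    return n_rows, reporter_columns


# Invented two-row table with a small subset of plausible column names.
SAMPLE_TABLE = (
    "Protein IDs\tReporter intensity corrected 1\t"
    "Reporter intensity corrected 2\tReporter intensity 1\tReverse\n"
    "P1\t100\t200\t90\t\n"
    "P2\t50\t60\t45\t+\n"
)

n_rows, reporter_columns = summarize_protein_groups(io.StringIO(SAMPLE_TABLE))
print(n_rows, reporter_columns)
```

For the real file, open it with `open(path, newline="", encoding="utf-8")` and pass the handle in place of the `io.StringIO` object.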
Once we have the proteinGroups.txt result file, we can preprocess it with different tools, such as R, Excel, or Perseus, the companion software to MaxQuant.
The main aim of preprocessing is to filter out three kinds of unwanted protein entries from the results file:
- Only identified by site
- Reverse
- Potential contaminant
All protein rows carrying these markers (a + in any of these three columns) should be removed from the raw results. The columns with the prefix Reporter intensity corrected hold the final quantification results.
Such filtering can easily be done in Excel, R, or Python, according to your own preference.
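As one possible Python version of this filter, the sketch below drops every row with a + in any of the three marker columns and keeps the protein ID plus the Reporter intensity corrected columns. It uses only the standard library and an invented sample table. The `Protein IDs` column name is an assumption, and this is not the stored Perseus workflow.

```python
import csv
import io

FLAG_COLUMNS = ("Only identified by site", "Reverse", "Potential contaminant")
REPORTER_PREFIX = "Reporter intensity corrected"


def filter_protein_groups(handle, id_column="Protein IDs"):
    """Drop flagged rows and keep the ID and reporter intensity columns."""
    reader = csv.DictReader(handle, delimiter="\t")
    kept = []
    for row in reader:
        # A row is flagged when any marker column holds a "+".
        if any((row.get(col) or "").strip() == "+" for col in FLAG_COLUMNS):
            continue
        slim = {id_column: row[id_column]}
        for name, value in row.items():
            if name.startswith(REPORTER_PREFIX):
                slim[name] = float(value)
        kept.append(slim)
    return kept


# Invented sample: one clean row and one row for each kind of flag.
SAMPLE_TABLE = (
    "Protein IDs\tReporter intensity corrected 1\tReporter intensity corrected 2"
    "\tOnly identified by site\tReverse\tPotential contaminant\n"
    "P1\t100\t200\t\t\t\n"
    "P2\t10\t20\t+\t\t\n"
    "REV__P3\t5\t6\t\t+\t\n"
    "CON__P4\t7\t8\t\t\t+\n"
)

filtered = filter_protein_groups(io.StringIO(SAMPLE_TABLE))
print(filtered)
```

Here only `P1` survives the filter. The same function works on the real file when it is opened with `open(path, newline="", encoding="utf-8")`.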
Perseus can also be used for this kind of work, and it gives a direct view of the data since it is a GUI application. The parameter file for preprocessing the results has also been stored here, named CellLine_TMT_1025.sps. In addition, the filtering pipeline can be seen below.
- MaxQuant Documentation; Learning Video
- Perseus Documentation; Learning Video