Skip to content

Latest commit

 

History

History
504 lines (402 loc) · 15.6 KB

160108.md

File metadata and controls

504 lines (402 loc) · 15.6 KB
  • Chap.7 Analyzing Mass Spectrometry Data
----[p.195]-------------------------------------------------------------------

Reading the MS data of the mzXML format

>install.packages("readMzXmlData")
>install.packages("MALDIquant")
>install.packages("MALDIquantForeign")

>library(readMzXmlData)
>library(MALDIquant)
>library(MALDIquantForeign)

7MIX_STD_110802_1.mzxmlを準備
(https://github.com/ekaminuma/BioinfoTraining15/tree/master/data参照)

> myData<- importMzXml(path="7MIX_STD_110802_1.mzxml")


> class(myData)
[1] "list"
> 
> length(myData)
[1] 7143
> 
> head(summary(myData))
     Length Class          Mode
[1,] " 705" "MassSpectrum" "S4"
[2,] "  74" "MassSpectrum" "S4"
[3,] "  91" "MassSpectrum" "S4"
[4,] " 686" "MassSpectrum" "S4"
[5,] "  44" "MassSpectrum" "S4"
[6,] "  92" "MassSpectrum" "S4"
> 
> 
> dim(summary(myData))
[1] 7143    3
> 
> 
> head(str(myData[[1]]))
Formal class 'MassSpectrum' [package "MALDIquant"] with 3 slots
  ..@ mass     : num [1:705] 401 402 403 404 406 ...
  ..@ intensity: num [1:705] 722826 95972 1126969 607818 504474 ...
  ..@ metaData :List of 20
  .. ..$ file             : chr "C:\\Users\\ekaminuma\\OneDrive\\ドキュメント\\7MIX_STD_110802_1.mzxml"
  .. ..$ scanCount        : num 7161
  .. ..$ startTime        : num 0.00683
  .. ..$ endTime          : num 200
  .. ..$ parentFile       :List of 1
  .. .. ..$ :List of 3
  .. .. .. ..$ fileName: chr "file://Rdf3/data2/search/ppatrick/sashimi_repository/LCQ/7MIX_STD_110802_1.RAW"
  .. .. .. ..$ fileType: chr "RAWData"
  .. .. .. ..$ fileSha1: chr "957f3baf650d4de3d87c04a9fc64baa13f6b363e"
  .. ..$ msInstrument     :List of 6
  .. .. ..$ msManufacturer: chr "ThermoFinnigan"
  .. .. ..$ msModel       : chr "LCQ Deca XP"
  .. .. ..$ msIonisation  : chr "ESI"
  .. .. ..$ msMassAnalyzer: chr "Ion Trap"
  .. .. ..$ msDetector    : chr "EMT"
  .. .. ..$ software      :List of 3
  .. .. .. ..$ type   : chr "acquisition"
  .. .. .. ..$ name   : chr "Xcalibur"
  .. .. .. ..$ version: chr "1.3 SP 1"
  .. ..$ dataProcessing   :List of 2
  .. .. ..$ centroided: num 1
  .. .. ..$ software  :List of 1
  .. .. .. ..$ :List of 3
  .. .. .. .. ..$ type   : chr "conversion"
  .. .. .. .. ..$ name   : chr "Thermo2mzXML"
  .. .. .. .. ..$ version: chr "1"
  .. ..$ num              : num 1
  .. ..$ peaksCount       : num 705
  .. ..$ msLevel          : num 1
  .. ..$ polarity         : chr "+"
  .. ..$ lowMz            : num 400
  .. ..$ highMz           : num 1500
  .. ..$ basePeakMz       : num 1222
  .. ..$ basePeakIntensity: num 6009200
  .. ..$ totIonCurrent    : num 2.7e+08
  .. ..$ retentionTime    : num 0.41
  .. ..$ precision        : num 32
  .. ..$ byteOrder        : chr "network"
  .. ..$ pairOrder        : chr "m/z-int"
<略>

"mqReadMzXml(Package: readMzXmlData)"は、Rの最新バージョンでは使えないために
9-11は省力。

plot(myData[[1]], col="black")



---------------[p.198, 9-11]---------------------------------------------------------------

> source("http://Bioconductor.org/biocLite.R")
Bioconductor version 3.1 (BiocInstaller 1.18.5), ?biocLite for help
A newer version of Bioconductor is available for this version of R, ?BiocUpgrade for
  help
> 
> 
> biocLite("mzR")
BioC_mirror: http://bioconductor.org
Using Bioconductor version 3.1 (BiocInstaller 1.18.5), R version 3.2.2.
Installing package(s) ‘mzR’
also installing the dependency ‘ProtGenerics’

trying URL 'http://bioconductor.org/packages/3.1/bioc/bin/windows/contrib/3.2/ProtGenerics_1.0.0.zip'
Content type 'application/zip' length 60321 bytes (58 KB)
downloaded 58 KB

trying URL 'http://bioconductor.org/packages/3.1/bioc/bin/windows/contrib/3.2/mzR_2.2.2.zip'
Content type 'application/zip' length 8382758 bytes (8.0 MB)
downloaded 8.0 MB

package ‘ProtGenerics’ successfully unpacked and MD5 sums checked
package ‘mzR’ successfully unpacked and MD5 sums checked

The downloaded binary packages are in
	C:\Users\ekaminuma\AppData\Local\Temp\RtmpumiKuy\downloaded_packages
Old packages: 'ape', 'ggplot2', 'jsonlite', 'manipulate', 'MKmisc', 'protr', 'class',
  'foreign', 'MASS', 'Matrix', 'mgcv', 'nlme', 'nnet', 'spatial'
Update all/some/none? [a/s/n]: 
n
> 
> library(mzR)
 要求されたパッケージ Rcpp をロード中です 
Warning message:
In fun(libname, pkgname) :
  mzR has been built against a different Rcpp version (0.12.1)
than is installed on your system (0.12.2). This might lead to errors
when loading mzR. If you encounter such issues, please send a report,
including the output of sessionInfo() to the Bioc support forum at 
https://support.bioconductor.org/. For details see also
https://github.com/sneumann/mzR/wiki/mzR-Rcpp-compiler-linker-issue.
> 
> myData2 <- openMSfile("tiny.pwiz.1.1.mzml")
> 
> myData2 
Mass Spectrometry file handle.
Filename:  tiny.pwiz.1.1.mzml 
Number of scans:  0 
> 
> 
> hd <- header(myData2)
> 
> dim(hd)
[1]  0 21
> 
> names(hd)
 [1] "seqNum"                   "acquisitionNum"           "msLevel"                 
 [4] "polarity"                 "peaksCount"               "totIonCurrent"           
 [7] "retentionTime"            "basePeakMZ"               "basePeakIntensity"       
[10] "collisionEnergy"          "ionisationEnergy"         "lowMZ"                   
[13] "highMZ"                   "precursorScanNum"         "precursorMZ"             
[16] "precursorCharge"          "precursorIntensity"       "mergedScan"              
[19] "mergedResultScanNum"      "mergedResultStartScanNum" "mergedResultEndScanNum"  
> 

-----[p.201]----------------------------------------------------------------------------

Reading the MS data of the Bruker format


> install.packages("readBrukerFlexData")
Error in install.packages : Updating loaded packages
> install.packages("readBrukerFlexData")
Installing package into ‘C:/Users/ekaminuma/Documents/R/win-library/3.2’
(as ‘lib’ is unspecified)
trying URL 'http://cran.rstudio.com/bin/windows/contrib/3.2/readBrukerFlexData_1.8.2.zip'
Content type 'application/zip' length 358063 bytes (349 KB)
downloaded 349 KB

package ‘readBrukerFlexData’ successfully unpacked and MD5 sums checked

The downloaded binary packages are in
	C:\Users\ekaminuma\AppData\Local\Temp\RtmpumiKuy\downloaded_packages
> 
> library(readBrukerFlexData)
Warning message:
 パッケージ ‘readBrukerFlexData’ はバージョン 3.2.3 の R の下で造られました  
> 
> 
> exampleDirectory <- system.file("Examples",package = "readBrukerFlexData")
> 
> exampleDirectory
[1] "C:/Users/ekaminuma/Documents/R/win-library/3.2/readBrukerFlexData/Examples"
> 
> myData_Bruker <- readBrukerFlexDir(file.path(exampleDirectory))
Warning message:
In readBrukerFlexFile(fidFile = f, removeMetaData = removeMetaData,  :
  The spectrum file ‘C:\Users\ekaminuma\Documents\R\win-library\3.2\readBrukerFlexData\Examples\hpc\fid\0_A20\1\1SRef\fid’ uses HPC. HPC isn't fully supported by readBrukerFlexFile. Please see “?.hpc” for details.
> 
> 
> summary(myData_Bruker)
                                               Length Class  Mode
s2010_05_19_Gibb_C8_A1.A1.T_0209513_0020253_98 2      -none- list
s2010_05_19_Gibb_C8_A1.A2.T_0209513_0020253_98 2      -none- list
sfid.A20.T_0209520_0020303_0                   2      -none- list
> 
> 

> str(myData_Bruker)
List of 3
 $ s2010_05_19_Gibb_C8_A1.A1.T_0209513_0020253_98:List of 2
  ..$ spectrum:List of 3
  .. ..$ tof      : num [1:22431] 21021 21023 21025 21027 21029 ...
  .. ..$ mass     : num [1:22431] 1000 1000 1000 1001 1001 ...
  .. ..$ intensity: num [1:22431] 11278 11350 10879 10684 10740 ...
  ..$ metaData:List of 47
  .. ..$ byteOrder              : chr "little"
  .. ..$ number                 : num 22431
  .. ..$ timeDelay              : num 21021
  .. ..$ timeDelta              : num 2
  .. ..$ calibrationConstants   : Named num [1:3] 2.32e+06 2.74e+02 -1.30e-03
  .. .. ..- attr(*, "names")= chr [1:3] "c1" "c2" "c3"
  .. ..$ hpcLimits              : Named num [1:2] 0 0
  .. .. ..- attr(*, "names")= chr [1:2] "minMass" "maxMass"
  .. ..$ hpcOrder               : num 0
  .. ..$ hpcUse                 : logi FALSE

注:遺伝研共通機器: 質量分析装置(NIG構造棟1Fにある)は、Bruker形式ファイルが出てくる。




-----[p.203]-----------------------------------------

Converting the MS data in the mzXML format to MALDquant

> library(MALDIquantForeign)
> 
> exampleDirectory <- system.file("exampledata",package="MALDIquantForeign")
> 
> exampleDirectory
[1] "C:/Users/ekaminuma/Documents/R/win-library/3.2/MALDIquantForeign/exampledata"
> 
> myData_MALDI <- import(exampleDirectory, type="auto")
3 files of type=‘mzxml’ found.
Reading spectrum from ‘C:\Users\ekaminuma\Documents\R\win-library\3.2\MALDIquantForeign\exampledata\tiny1-centroided.mzXML3.0.mzXML’ ...
Found mzXML document (version: 3).
Processing scan 1/1 (msLevel: 1, number of peaks: 5) ...
Look for '<sha1>' positions ...
Calculate sha1-sum (1/1) for ‘C:\Users\ekaminuma\Documents\R\win-library\3.2\MALDIquantForeign\exampledata\tiny1-centroided.mzXML3.0.mzXML’: 1cf2b3695447d2dd698b465e27f452a80192dd62
Warning in .createMassObject(mass = x$spectrum$mass, intensity = x$spectrum$intensity,  :
<略> 
> 
> class(myData_MALDI[[1]])
[1] "MassSpectrum"
attr(,"package")
[1] "MALDIquant"

> str(myData_MALDI[[1]])
Formal class 'MassSpectrum' [package "MALDIquant"] with 3 slots
  ..@ mass     : num [1:5] 1 2 3 4 5
  ..@ intensity: num [1:5] 6 7 8 9 10
  ..@ metaData :List of 18
  .. ..$ file           : chr "C:\\Users\\ekaminuma\\Documents\\R\\win-library\\3.2\\MALDIquantForeign\\exampledata\\tiny1-centroided.mzXML3.0.mzXML"
  .. ..$ scanCount      : num 1
  .. ..$ parentFile     :List of 1
  .. .. ..$ :List of 3
  .. .. .. ..$ fileName: chr "file://tiny.RAW"
  .. .. .. ..$ fileType: chr "RAW"
  .. .. .. ..$ fileSha1: chr "0000000000000000000000000000000000000000"
  .. ..$ msInstrument   :List of 6
  .. .. ..$ msManufacturer: chr "FooBar"
  .. .. ..$ msModel       : chr "FooBar Model1"
  .. .. ..$ msIonisation  : chr "MALDI"
  .. .. ..$ msMassAnalyzer: chr "TOF"
  .. .. ..$ msDetector    : chr "msD"
  .. .. ..$ software      :List of 3
  .. .. .. ..$ type   : chr "acquisition"
  .. .. .. ..$ name   : chr "AcquisitionSoftware"
  .. .. .. ..$ version: chr "1.0.0"
  .. ..$ num            : num 1
  .. ..$ peaksCount     : num 5
  .. ..$ msLevel        : num 1
  .. ..$ polarity       : chr "+"
  .. ..$ scanType       : chr "Full"
  .. ..$ centroided     : num 1
  .. ..$ lowMz          : num 1
  .. ..$ highMz         : num 5
  .. ..$ totIonCurrent  : num 40
  .. ..$ precision      : num 64
  .. ..$ byteOrder      : chr "network"
  .. ..$ contentType    : chr "m/z-int"
  .. ..$ compressionType: chr "none"
  .. ..$ compressedLen  : num 0
<略>

-----[p.205]-----------------------------------------

Extracting data elements from the MS data object

>library(MALDIquant)

===データの読込み==================

> f2009 <- data("fiedler2009subset",package="MALDIquant")
> 
> summary(f2009)
   Length     Class      Mode 
        1 character character 
> 


> tmp1 <- fiedler2009subset[[1]] 16要素のうち1番目要素だけ解析に利用

==クラスの確認=============

> class(tmp1)
[1] "MassSpectrum"
attr(,"package")
[1] "MALDIquant"

==データ構造の確認============
> str(tmp1)
Formal class 'MassSpectrum' [package "MALDIquant"] with 3 slots
  ..@ mass     : num [1:42388] 1000 1000 1000 1000 1000 ...
  ..@ intensity: int [1:42388] 3149 3134 3127 3131 3170 3162 3162 3152 3174 3137 ...
  ..@ metaData :List of 42
  .. ..$ byteOrder           : chr "little"
  .. ..$ number              : num 42388
  .. ..$ timeDelay           : num 19886
  .. ..$ timeDelta           : num 1
  .. ..$ calibrationConstants: Named num [1:3] 2.60e+06 2.68e+02 -4.43e-03
  .. .. ..- attr(*, "names")= chr [1:3] "c1" "c2" "c3"
  .. ..$ hpcLimits           : Named num [1:2] 0 0
<略>

=== m/z比の抽出================

> m_z <- mass(tmp1)
> 
> head(m_z)
[1] 1000.015 1000.117 1000.219 1000.321 1000.423 1000.525
> 



> m_z2 <- tmp1@mass

> summary(m_z2)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   1000    2373    4330    4720    6872   10000 
 
> head(m_z2)
[1] 1000.015 1000.117 1000.219 1000.321 1000.423 1000.525


====Intensityの抽出===============

> m_int <- intensity(tmp1)
> 
> head(m_int)
[1] 3149 3134 3127 3131 3170 3162
> 
> 
> m_int2 <- tmp1@intensity
> 
> head(m_int2)
[1] 3149 3134 3127 3131 3170 3162
> summary(m_int)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
      5     282     923    2131    3033  101800 
> 


> metaData(tmp1)
$byteOrder
[1] "little"

$number
[1] 42388

$timeDelay
[1] 19886

$timeDelta
[1] 1

$calibrationConstants
          c1           c2           c3 
 2.59729e+06  2.68443e+02 -4.43352e-03 

$hpcLimits
minMass maxMass 
<略>


> plot(m_z,m_int,type="l")






-------[p.207 ]----------------------------------

Preprocessing MS data

> library(MALDIquant)
> f2009 <- data("fielder2009subset")
> tmp1 <- fiedler2009subset[[1]]

ここまで前項目「element extraction」と同じ。

=====強度のtransformation=====================

> spec1 <- transformIntensity(tmp1, method="sqrt")
> 
> spec1
S4 class type            : MassSpectrum                
Number of m/z values     : 42388                       
Range of m/z values      : 1000.015 - 9999.734         
Range of intensity values: 2.236e+00 - 3.191e+02       
Memory usage             : 670.945 KiB                 
Name                     : Pankreas_HB_L_061019_G10.M19
File                     : /data/set A - discovery leipzig/control/Pankreas_HB_L_061019_G10/0_m19/1/1SLin/fid


> spec2 <- smoothIntensity(spec1,method="MovingAverage")
>             移動平均で平滑化
> spec2
S4 class type            : MassSpectrum                
Number of m/z values     : 42388                       
Range of m/z values      : 1000.015 - 9999.734         
Range of intensity values: 2.783e+00 - 3.19e+02        
Memory usage             : 670.945 KiB                 
Name                     : Pankreas_HB_L_061019_G10.M19
File                     : /data/set A - discovery leipzig/control/Pankreas_HB_L_061019_G10/0_m19/1/1SLin/fid

> spec3 <- removeBaseline(spec2)  ベースライン補正
> 
> spec3
S4 class type            : MassSpectrum                
Number of m/z values     : 42388                       
Range of m/z values      : 1000.015 - 9999.734         
Range of intensity values: 0e+00 - 2.463e+02           
Memory usage             : 670.945 KiB                 
Name                     : Pankreas_HB_L_061019_G10.M19
File                     : /data/set A - discovery leipzig/control/Pankreas_HB_L_061019_G10/0_m19/1/1SLin/fid
 



> spec4 <- calibrateIntensity(spec3, method="TIC")  
            データキャリブレーション
> spec4
S4 class type            : MassSpectrum                
Number of m/z values     : 42388                       
Range of m/z values      : 1000.015 - 9999.734         
Range of intensity values: 0e+00 - 5.625e-03           
Memory usage             : 670.945 KiB                 
Name                     : Pankreas_HB_L_061019_G10.M19
File                     : /data/set A - discovery leipzig/control/Pankreas_HB_L_061019_G10/0_m19/1/1SLin/fid
 

> out_n <- estimateNoise(spec4)
> 
> head(out_n)
         mass   intensity
[1,] 1000.015 3.77133e-05
[2,] 1000.117 3.77133e-05
[3,] 1000.219 3.77133e-05
[4,] 1000.321 3.77133e-05
[5,] 1000.423 3.77133e-05
[6,] 1000.525 3.77133e-05

<中略>

[4999,] 1574.657 3.77133e-05
[5000,] 1574.785 3.77133e-05
[ reached getOption("max.print") -- omitted 37388 rows ]



> png("spec1to4.png",640,960)
> par(mfrow=c(4,1),ps=18)
> 
> plot(spec1@mass,spec1@intensity,type="l")
> plot(spec2@mass,spec2@intensity,type="l")
> plot(spec3@mass,spec3@intensity,type="l")
> plot(spec4@mass,spec4@intensity,type="l")
> dev.off()
RStudioGD 
        2