Skip to content

Latest commit

 

History

History
543 lines (431 loc) · 22.4 KB

151027.md

File metadata and controls

543 lines (431 loc) · 22.4 KB
--------[p.48]-------------------------------------------------------------
GO enrichment of genes

<オントロジ割当のEnrichment解析とは>

The GO enrichment of genes

Enrichment analysis(またはHypergeometric test)で、発現遺伝子のセットがどのGOにあてはまるかを統計的にテストします。

<Enrichment解析の説明>
まずゲノムワイドな遺伝子セットについてGOのTerm別で分布をカウントしておきます。知りたいクエリの遺伝子セット(例:Microarray解析でのUp-regulated genesセット)について、GO Termの分布をカウントします。GOTerm毎に、ゲノムワイドな遺伝子セット(background dataset)と比較して有意に多いかP値で表します。分布の検定には、超幾何分布検定(hypergeometric test)もしくはFisherの正確確率検定(Fisher's exact test)が使われます。



--------------[スライド中のfisher’s exact test実行]----------------------------------
参考:
http://jura.wi.mit.edu/bio/education/hot_topics/enrichment/Gene_list_enrichment_Mar10.pdf

> counts = (matrix(data = c(3, 297, 40, 19960), nrow = 2))
> counts
     [,1]  [,2]
[1,]    3    40
[2,]  297 19960

> out<-fisher.test(counts)
> out

	Fisher's Exact Test for Count Data

data:  counts
p-value = 0.02552
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
  0.9918169 15.9604612
sample estimates:
odds ratio 
  5.039206 

-----[p.48]--------------------------------------


> source("http://Bioconductor.org/biocLite.R")
Bioconductor version 3.1 (BiocInstaller 1.18.4), ?biocLite for help
> biocLite(c("topGO","ALL"))
trying URL 'http://bioconductor.org/packages/3.1/bioc/bin/windows/contrib/3.2/BiocInstaller_1.18.5.zip'
Content type 'application/zip' length 56465 bytes (55 KB)
downloaded 55 KB

> library(topGO)
 要求されたパッケージ BiocGenerics をロード中です 
 要求されたパッケージ parallel をロード中です 

<略>

> library(ALL)
> data(ALL)
> data(geneList)
> summary(ALL)
       Length         Class          Mode 
            1 ExpressionSet            S4 
> summary(geneList)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
0.00000 0.09917 0.62240 0.55320 1.00000 1.00000 
> dim(geneList)
NULL
> head(geneList)
1095_s_at   1130_at   1196_at 1329_s_at 1340_s_at 1342_g_at 
1.0000000 1.0000000 0.6223795 0.5412240 1.0000000 1.0000000 
> dim(ALL)
Features  Samples 
   12625      128 
> head(ALL)
ExpressionSet (storageMode: lockedEnvironment)
assayData: 1 features, 128 samples 
  element names: exprs 
protocolData: none
phenoData
  sampleNames: 01005 01010 ... LAL4 (128 total)
  varLabels: cod diagnosis ... date last seen (21 total)
  varMetadata: labelDescription
featureData: none
experimentData: use 'experimentData(object)'
  pubMedIds: 14684422 16243790 
Annotation: hgu95av2 


>  affyname <- paste(annotation(ALL), "db", sep=".")

> biocLite(affyname) #本と異なるので注意
BioC_mirror: http://bioconductor.org
Using Bioconductor version 3.1 (BiocInstaller 1.18.5), R version 3.2.2.
Installing package(s) ‘hgu95av2.db’
installing the source package ‘hgu95av2.db’
<略>


> topDiffGenes(geneList)
 1095_s_at    1130_at    1196_at  1329_s_at  1340_s_at  1342_g_at    1347_at    1361_at 
     FALSE      FALSE      FALSE      FALSE      FALSE      FALSE       TRUE      FALSE 

<略>

  506_s_at     563_at     989_at 
      TRUE      FALSE       TRUE 
> sum(topDiffGenes(geneList))
[1] 50


> myGO <- new("topGOdata",ontology="BP",allGenes=geneList,geneSel=topDiffGenes,nodeSize=10,annot=annFUN.db,affyLib=affyname)

Building most specific GOs .....
Loading required package: hgu95av2.db
Loading required package: org.Hs.eg.db


	( 1376 GO terms found. )

Build GO DAG topology ..........	( 3902 GO terms and 9217 relations. )

Annotating nodes ...............	( 310 genes annotated to the GO terms. )



 
> out1 <- runTest(myGO,algorithm="classic",statistic="fisher")

			 -- Classic Algorithm -- 

		 the algorithm is scoring 938 nontrivial nodes
		 parameters: 
			 test statistic:  fisher 
> out1

Description:  
Ontology: BP 
'classic' algorithm with the 'fisher' test
1015 GO terms scored: 72 terms with p < 0.01
Annotation data:
    Annotated genes: 310 
    Significant genes: 46 
    Min. no. of genes annotated to a GO: 10 
    Nontrivial nodes: 938 

> summary(out1)
     Length       Class        Mode 
          1 topGOresult          S4 


> score(out1)
  GO:0000003   GO:0000070   GO:0000075   GO:0000077   GO:0000079   GO:0000082   GO:0000086 
0.6869641900 0.9971382112 0.9439362205 0.6004677549 0.8053541242 0.9640055629 0.8222440022 
<略>


> summary(out1)
     Length       Class        Mode 
          1 topGOresult          S4 

> out2 <- runTest(myGO,algorithm="classic",statistic="ks")

			 -- Classic Algorithm -- 

		 the algorithm is scoring 1015 nontrivial nodes
		 parameters: 
			 test statistic:  ks 
			 score order:  increasing 
> 
> out2

Description:  
Ontology: BP 
'classic' algorithm with the 'ks' test
1015 GO terms scored: 87 terms with p < 0.01
Annotation data:
    Annotated genes: 310 
    Significant genes: 46 
    Min. no. of genes annotated to a GO: 10 
    Nontrivial nodes: 1015 



> tab1 <- GenTable(myGO, classifFisher=out1,topNodes=20)
> 
> head(tab1)
       GO.ID                                        Term Annotated Significant Expected
1 GO:0002252                     immune effector process        33          13     4.90
2 GO:0001909             leukocyte mediated cytotoxicity        11           7     1.63
3 GO:0032946 positive regulation of mononuclear cell ...        34          13     5.05
4 GO:0050671 positive regulation of lymphocyte prolif...        34          13     5.05
5 GO:0070665 positive regulation of leukocyte prolife...        34          13     5.05
6 GO:0042102 positive regulation of T cell proliferat...        31          12     4.60
  classifFisher
1       0.00021
2       0.00021
3       0.00030
4       0.00030
5       0.00030
6       0.00048




> tab2 <- GenTable(myGO, classicKS=out2,topNodes=20)
> head(tab2)
       GO.ID                       Term Annotated Significant Expected classicKS
1 GO:0007049                 cell cycle       197          27    29.23   5.4e-11
2 GO:0022402         cell cycle process       184          23    27.30   6.9e-10
3 GO:1903047 mitotic cell cycle process       177          19    26.26   1.1e-09
4 GO:0000278         mitotic cell cycle       182          22    27.01   1.2e-09
5 GO:0000280           nuclear division       175          17    25.97   1.9e-09
6 GO:0007067   mitotic nuclear division       175          17    25.97   1.9e-09


> tab_all<-GenTable(myGO, cl_Fisher=out1,cl_KS=out2,ranksOf = "cl_Fisher",topNodes=10)
> 
> tab_all
        GO.ID                                        Term Annotated Significant Expected
1  GO:0002252                     immune effector process        33          13     4.90
2  GO:0001909             leukocyte mediated cytotoxicity        11           7     1.63
3  GO:0032946 positive regulation of mononuclear cell ...        34          13     5.05
4  GO:0050671 positive regulation of lymphocyte prolif...        34          13     5.05
5  GO:0070665 positive regulation of leukocyte prolife...        34          13     5.05
6  GO:0042102 positive regulation of T cell proliferat...        31          12     4.60
7  GO:0030098                  lymphocyte differentiation        23          10     3.41
8  GO:0030217                      T cell differentiation        23          10     3.41
9  GO:0050870    positive regulation of T cell activation        32          12     4.75
10 GO:0032943              mononuclear cell proliferation        37          13     5.49
   Rank in cl_Fisher cl_Fisher  cl_KS
1                  1   0.00021 0.0029
2                  2   0.00021 0.0024
3                  3   0.00030 0.0084
4                  4   0.00030 0.0084
5                  5   0.00030 0.0084
6                  6   0.00048 0.0114
7                  7   0.00052 0.0028
8                  8   0.00052 0.0028
9                  9   0.00068 0.0149
10                10   0.00081 0.0177



> gostat<- termStat(myGO, names(score(out1)))
> head(gostat)
           Annotated Significant Expected
GO:0000003        59           8     8.75
GO:0000070        58           3     8.61
GO:0000075        55           5     8.16
GO:0000077        13           2     1.93
GO:0000079        19           2     2.82
GO:0000082        20           1     2.97
> 

> plot(score(out1),score(out2))


> head(gostat$Annotated)
[1] 59 58 55 13 19 20

> val<-gostat$Annotated 分かり易く変数代入

> plot(score(out1),score(out2),cex=4*(val/max(val)))

> plot(score(out1),score(out2),cex=4*(val/max(val)),
col=heat.colors(gostat$Significant)) 有意な程scoreを赤色に 






> biocLite("Rgraphviz")   ネットワーク図用のグラフライブラリ
BioC_mirror: http://bioconductor.org
Using Bioconductor version 3.1 (BiocInstaller 1.18.5), R version
  3.2.2.
Installing package(s) ‘Rgraphviz’
trying URL 'http://bioconductor.org/packages/3.1/bioc/bin/windows/contrib/3.2/Rgraphviz_2.12.0.zip'
Content type 'application/zip' length 3663538 bytes (3.5 MB)
downloaded 3.5 MB

package ‘Rgraphviz’ successfully unpacked and MD5 sums checked

The downloaded binary packages are in
	C:\Users\ekaminuma\AppData\Local\Temp\RtmpYvq03d\downloaded_packages
Old packages: 'doParallel', 'foreach', 'iterators', 'manipulate',
  'snow', 'stringi', 'class', 'foreign', 'MASS', 'mgcv', 'nlme',
  'nnet', 'spatial'
Update all/some/none? [a/s/n]: 
n

> library(Rgraphviz)


> showSigOfNodes(myGO,score(out1),firstSigNodes=3) トップから3階層目まで表示
Loading required package: Rgraphviz
Loading required package: grid

Attaching package: ‘grid’

 以下のオブジェクトは ‘package:topGO’ からマスクされています: 

     depth 

$dag
A graphNEL graph with directed edges
Number of Nodes = 22 
Number of Edges = 36 

$complete.dag
[1] "A graph with 22 nodes."


GOnetwork

----------[p.52]----------------------------------------------------------------------

> source("http://Bioconductor.org/biocLite.R")
Bioconductor version 3.1 (BiocInstaller 1.18.4), ?biocLite for help

> biocLite("clusterProfiler") BioC_mirror: http://bioconductor.org Using Bioconductor version 3.1 (BiocInstaller 1.18.4), R version 3.2.2. Installing package(s) ‘clusterProfiler’ also installing the dependencies ‘xtable’, ‘iterators略

> library(clusterProfiler)
> data(gcSample)
> genes<-gcSample[[3]]
> genes
  [1] "894"       "7057"      "22906"     "3339"      "10449"    
略
[386] "26137"     "4626"      "54629"     "91860"     "57707"    
[391] "388969"    "11057"    
> summary(genes)
   Length     Class      Mode 
      392 character character 

----------Enrichment解析によるPathwayランキング-----------------


> kegg_en<-enrichKEGG(gene=genes,organism="human",pvalueCutoff=0.05,qvalueCutoff=0.05,readable=T)   本と違うので注意


> kegg_en
392 human Genes to  KEGG  test for over-representation. 
 with p value < 0.05 
> 
> kegg_f <- summary(kegg_en)
> kegg_f
               ID                            Description GeneRatio  BgRatio       pvalue
hsa04512 hsa04512               ECM-receptor interaction     9/163  87/6969 0.0001806948
hsa04060 hsa04060 Cytokine-cytokine receptor interaction    16/163 265/6969 0.0004420219
           p.adjust     qvalue
hsa04512 0.03957216 0.03651937
hsa04060 0.04840139 0.04466747
                                                                                                          geneID
hsa04512                                                  THBS1/HSPG2/ITGB7/CHAD/ITGA7/LAMA4/ITGB8/COL11A2/ITGB5
hsa04060 CXCL1/TNFRSF11B/LIFR/XCL1/GDF5/TNFRSF17/TNFRSF11A/IL5RA/HGF/EPOR/CSF2RA/TNFRSF25/BMP7/IL21R/IL23A/TGFB2
         Count
hsa04512     9
hsa04060    16


> 
> barplot(kegg_f[,7],horiz=T,col=rainbow(10),names.arg=kegg_f[,1])
      qvalueの値をグラフ化


KEGGfig

----------複数データでの比較-------------------------------------------------




> summary(gcSample)
   Length Class  Mode     
X1 216    -none- character
X2 805    -none- character
X3 392    -none- character
X4 838    -none- character
X5 929    -none- character
X6 585    -none- character
X7 582    -none- character
X8 237    -none- character


> summary(gcSample[[3]])
   Length     Class      Mode 
      392 character character 

 
> ck <- compareCluster(geneCluster=gcSample, fun="enrichKEGG",use_internal_data=TRUE)


> ck
Result of Comparing 8 gene clusters 

> summary(ck)
   Cluster       ID                           Description GeneRatio  BgRatio       pvalue
1       X2 hsa04110                            Cell cycle    18/297 128/5894 6.383147e-05
2       X2 hsa05340              Primary immunodeficiency     8/297  35/5894 2.698906e-04
3       X3 hsa04512              ECM-receptor interaction     9/152  85/5894 3.077198e-04
4       X4 hsa03030                       DNA replication    10/326  36/5894 1.632995e-05
5       X4 hsa04110                            Cell cycle    20/326 128/5894 1.997779e-05
6       X4 hsa00240                 Pyrimidine metabolism    16/326  99/5894 8.987559e-05
7       X4 hsa05215                       Prostate cancer    15/326  89/5894 9.112870e-05
8       X4 hsa05200                    Pathways in cancer    33/326 327/5894 4.901026e-04
9       X4 hsa04910             Insulin signaling pathway    18/326 138/5894 5.390480e-04
10      X4 hsa04510                        Focal adhesion    23/326 200/5894 6.083254e-04
11      X4 hsa04710             Circadian rhythm - mammal     6/326  23/5894 1.237112e-03
12      X4 hsa00230                     Purine metabolism    19/326 162/5894 1.421089e-03
13      X4 hsa05211                  Renal cell carcinoma    11/326  70/5894 1.434186e-03
14      X4 hsa05212                     Pancreatic cancer    11/326  70/5894 1.434186e-03
15      X4 hsa03050                            Proteasome     8/326  45/5894 2.868510e-03
16      X6 hsa04110                            Cell cycle    14/201 128/5894 1.030558e-04
17      X6 hsa05332             Graft-versus-host disease     7/201  43/5894 5.480036e-04
18      X7 hsa03050                            Proteasome    11/261  45/5894 2.760927e-06
19      X7 hsa00190             Oxidative phosphorylation    17/261 132/5894 6.268399e-05
20      X7 hsa05010                   Alzheimer's disease    19/261 168/5894 1.400053e-04
21      X7 hsa00970           Aminoacyl-tRNA biosynthesis    10/261  63/5894 3.902113e-04
22      X7 hsa05016                  Huntington's disease    19/261 183/5894 4.267110e-04
23      X7 hsa05012                   Parkinson's disease    15/261 130/5894 5.710516e-04
24      X8 hsa04260            Cardiac muscle contraction    12/136  77/5894 1.430147e-07
25      X8 hsa05012                   Parkinson's disease    15/136 130/5894 2.277792e-07
26      X8 hsa00190             Oxidative phosphorylation    15/136 132/5894 2.790307e-07
27      X8 hsa05016                  Huntington's disease    17/136 183/5894 8.036560e-07
28      X8 hsa05010                   Alzheimer's disease    16/136 168/5894 1.245760e-06
29      X8 hsa03050                            Proteasome     7/136  45/5894 6.511827e-05
30      X8 hsa05130 Pathogenic Escherichia coli infection     7/136  58/5894 3.361636e-04
31      X8 hsa04974      Protein digestion and absorption     8/136  81/5894 5.095342e-04
32      X8 hsa00010          Glycolysis / Gluconeogenesis     7/136  65/5894 6.806984e-04
33      X8 hsa04512              ECM-receptor interaction     7/136  85/5894 3.294487e-03
       p.adjust       qvalue
1  1.206415e-02 1.142247e-02
2  2.550466e-02 2.414810e-02
3  4.738886e-02 4.275686e-02
4  1.877912e-03 1.587708e-03
5  1.877912e-03 1.587708e-03
6  4.283049e-03 3.621167e-03
7  4.283049e-03 3.621167e-03
8  1.633788e-02 1.381310e-02
9  1.633788e-02 1.381310e-02
10 1.633788e-02 1.381310e-02
11 2.451154e-02 2.072365e-02
12 2.451154e-02 2.072365e-02
13 2.451154e-02 2.072365e-02
14 2.451154e-02 2.072365e-02
15 4.493998e-02 3.799517e-02
16 1.751949e-02 1.746525e-02
17 4.658031e-02 4.643610e-02
18 5.080106e-04 4.824357e-04
19 5.766927e-03 5.476601e-03
20 8.586993e-03 8.154696e-03
21 1.570297e-02 1.491243e-02
22 1.570297e-02 1.491243e-02
23 1.751225e-02 1.663063e-02
24 1.088220e-05 1.008427e-05
25 1.088220e-05 1.008427e-05
26 1.088220e-05 1.008427e-05
27 2.350694e-05 2.178331e-05
28 2.915079e-05 2.701333e-05
29 1.269806e-03 1.176699e-03
30 5.618735e-03 5.206745e-03
31 7.451938e-03 6.905529e-03
32 8.849079e-03 8.200226e-03
33 3.854549e-02 3.571917e-02
                                                                                                                                                                  geneID
1                                                                                991/1869/890/1871/701/990/10926/9088/8317/9700/9134/1029/2810/699/11200/23594/8555/4173
2                                                                                                                                    100/6891/3932/973/916/925/958/64421
3                                                                                                                           7057/3339/3695/1101/3679/3910/3696/1302/3693
4                                                                                                                    5425/4172/4175/4171/10535/5984/2237/4176/54107/4173
5                                                                       6500/9184/4172/994/4175/4171/1387/10274/8697/902/4616/5591/4176/8881/7043/983/1022/1028/891/4173
6                                                                                      5425/7296/4860/6241/7298/5440/7372/5430/9583/4832/54107/953/5435/1635/51728/29922
7                                                                                            2950/1387/5159/5604/5156/596/3551/367/2260/5595/5295/10000/6655/90993/80310
8  2950/6513/3912/5899/1387/5159/613/5604/10401/5156/596/8324/4318/7188/8321/2737/2353/3551/7043/367/2260/5595/7422/5295/10000/6655/1287/5898/5228/3913/8322/112398/6772
9                                                                             5573/5571/5590/6720/5604/5106/5577/5507/8660/3551/9470/5595/5295/10000/6655/5576/3643/1978
10                                                   7057/3912/5159/5604/857/5156/596/7060/1311/2534/5595/7422/5295/1289/10000/6655/1287/3693/5228/3913/1277/10451/80310
11                                                                                                                                       6500/8553/8864/1453/23291/79365
12                                                                      10606/5425/4860/6241/5440/5430/353/9583/4832/5143/54107/953/5435/158/2618/8833/51292/51728/29922
13                                                                                                             6513/1387/5604/7043/5595/7422/5295/10000/6655/5228/112398
14                                                                                                               5899/5604/3551/7043/5595/7422/5295/10000/5898/5228/6772
15                                                                                                                              5701/5704/5713/5718/5699/5698/23198/5688
16                                                                                                    4174/5347/990/7272/4999/1869/995/1111/4088/1031/9134/4173/1028/898
17                                                                                                                                       3112/942/940/3824/3002/3458/356
18                                                                                                               5690/5714/5708/5686/5683/5684/7979/5685/5717/5693/10213
19                                                                                 533/1537/4698/9296/4700/4729/4726/4723/4704/4695/4724/1329/1350/4705/4708/54539/27068
20                                                                        1537/4698/4700/8883/3028/5532/2776/4729/348/4726/4723/4704/4695/4724/1329/1350/4705/4708/54539
21                                                                                                                        3735/7453/2058/16/6897/4141/3376/2617/8565/833
22                                                                        163/1537/4698/3066/4700/2776/5440/4729/4726/1175/4723/4704/4695/4724/1329/1350/4705/4708/54539
23                                                                                           7332/1537/4698/4700/4729/4726/4723/4704/4695/4724/1329/1350/4705/4708/54539
24                                                                                                           1327/1351/1350/481/27089/1349/7388/9377/7386/488/7168/29796
25                                                                                              1327/292/1351/1350/506/27089/1349/7388/9377/509/518/517/54205/7386/29796
26                                                                                              1327/1351/1350/506/27089/1349/7388/9551/9377/509/518/517/7386/5464/29796
27                                                                                    1327/292/1211/1351/1350/506/27089/1349/7388/9377/509/518/517/54205/7386/1175/29796
28                                                                                         1327/1351/1350/506/27089/1349/7388/9377/509/518/517/54205/7386/488/2597/29796
29                                                                                                                                    5695/5701/5688/5691/5721/5692/5687
30                                                                                                                              7534/10971/10376/3875/10383/203068/84790
31                                                                                                                                481/1293/1281/1277/1278/1292/1291/1290
32                                                                                                                                    5230/7167/5223/3945/2023/2821/2597
33                                                                                                                                    1293/1281/1277/1278/1292/1291/1290
   Count
1     18
2      8
3      9
4     10
5     20
6     16
7     15
8     33
9     18
10    23
11     6
12    19
13    11
14    11
15     8
16    14
17     7
18    11
19    17
20    19
21    10
22    19
23    15
24    12
25    15
26    15
27    17
28    16
29     7
30     7
31     8
32     7
33     7
> 
> plot(ck)


CompareCluster