---[p.260]------------------------------------------------------
Analyzing ChIP-Seq data
ChIP-Seq=クロマチン免疫沈降シーケンシング
クロマチン免疫沈降シーケンス
転写調節因子や、その他のタンパク質DNA相互作用部位をpeak領域として特定できる。
<参考>
http://jp.illumina.com/content/dam/illumina-marketing/apac/japan/documents/pdf/datasheet_chip_sequence-j.pdf
> biocLite("chipseq")
BioC_mirror: http://bioconductor.org
Using Bioconductor version 3.1 (BiocInstaller 1.18.5), R version 3.2.5.
Installing package(s) ‘chipseq’
trying URL 'http://bioconductor.org/packages/3.1/bioc/bin/windows/contrib/3.2/chipseq_1.18.0.zip
<参考>
https://www.bioconductor.org/packages/devel/bioc/manuals/chipseq/man/chipseq.pdf
> library(chipseq)
UCSCによるマウスmm9 versionの遺伝子アノテーションデータ
<参考>
https://bioconductor.org/packages/release/data/annotation/manuals/TxDb.Mmusculus.UCSC.mm9.knownGene/man/TxDb.Mmusculus.UCSC.mm9.knownGene.pdf
> biocLite("TxDb.Mmusculus.UCSC.mm9.knownGene")
BioC_mirror: http://bioconductor.org
Using Bioconductor version 3.1 (BiocInstaller 1.18.5), R version 3.2.5.
Installing package(s) ‘TxDb.Mmusculus.UCSC.mm9.knownGene’
installing the source package ‘TxDb.Mmusculus.UCSC.mm9.knownGene’
trying URL 'http://bioconductor.org/packages/3.1/data/annotation/src/contrib/TxDb.Mmusculus.UCSC.mm9.knownGene_3.1.2.tar.gz'
Content type 'application/x-gzip' length 1352
> library("TxDb.Mmusculus.UCSC.mm9.knownGene")
要求されたパッケージ GenomicFeatures をロード中です
要求されたパッケージ GenomeInfoDb をロード中です
要求されたパッケージ GenomicRanges をロード中です
CTCF タンパク質(CCCTC-binding factor).
<参考>
*http://www.ncbi.nlm.nih.gov/gene/10664
*Human CTCFは、ChIP-seq解析ツールの事例データとしてよく使われる。
> load("cstest.rda") ctcfデータ
>
> cstest
GRangesList object of length 2:
$ctcf
GRanges object with 450096 ranges and 0 metadata columns:
seqnames ranges strand
<Rle> <IRanges> <Rle>
[1] chr10 [3012936, 3012959] +
[2] chr10 [3012941, 3012964] +
[3] chr10 [3012944, 3012967] +
[4] chr10 [3012955, 3012978] +
[5] chr10 [3012963, 3012986] +
... ... ... ...
[450092] chr12 [121239376, 121239399] -
[450093] chr12 [121245849, 121245872] -
[450094] chr12 [121245895, 121245918] -
[450095] chr12 [121246344, 121246367] -
[450096] chr12 [121253499, 121253522] -
...
<1 more element>
-------
seqinfo: 35 sequences from an unspecified genome
>
>
> summary(cstest)
Length Class Mode
2 GRangesList S4
> estimate.mean.fraglen(cstest$ctcf)
chr10 chr11 chr12
179.6794 172.4884 181.6732
> ctcf_ext<-resize(cstest$ctcf,width=200) 重複領域を200bp単位に変更
> ctcf_ext
GRanges object with 450096 ranges and 0 metadata columns:
seqnames ranges strand
<Rle> <IRanges> <Rle>
[1] chr10 [3012936, 3013135] +
[2] chr10 [3012941, 3013140] +
[3] chr10 [3012944, 3013143] +
[4] chr10 [3012955, 3013154] +
[5] chr10 [3012963, 3013162] +
... ... ... ...
[450092] chr12 [121239200, 121239399] -
[450093] chr12 [121245673, 121245872] -
[450094] chr12 [121245719, 121245918] -
[450095] chr12 [121246168, 121246367] -
[450096] chr12 [121253323, 121253522] -
-------
seqinfo: 35 sequences from an unspecified genome
> gfp_ext<-resize(cstest$gfp,width=200)
> gfp_ext
GRanges object with 295385 ranges and 0 metadata columns:
seqnames ranges strand
<Rle> <IRanges> <Rle>
[1] chr10 [3002512, 3002711] +
[2] chr10 [3009093, 3009292] +
[3] chr10 [3020716, 3020915] +
[4] chr10 [3023026, 3023225] +
[5] chr10 [3024629, 3024828] +
... ... ... ...
[295381] chr12 [121212950, 121213149] -
[295382] chr12 [121216729, 121216928] -
[295383] chr12 [121216791, 121216990] -
[295384] chr12 [121251629, 121251828] -
[295385] chr12 [121253250, 121253449] -
-------
seqinfo: 35 sequences from an unspecified genome
> cov_ctcf<-coverage(ctcf_ext) genome coverageの計算
> cov_ctcf
RleList of length 35
$chr1
integer-Rle of length 197195432 with 1 run
Lengths: 197195432
Values : 0
$chr2
integer-Rle of length 181748087 with 1 run
Lengths: 181748087
Values : 0
$chr3
integer-Rle of length 159599783 with 1 run
Lengths: 159599783
Values : 0
$chr4
integer-Rle of length 155630120 with 1 run
Lengths: 155630120
Values : 0
$chr5
integer-Rle of length 152537259 with 1 run
Lengths: 152537259
Values : 0
...
<30 more elements>
>
> cov_gfp<-coverage(gfp_ext)
> cov_gfp
RleList of length 35
$chr1
integer-Rle of length 197195432 with 1 run
Lengths: 197195432
Values : 0
$chr2
integer-Rle of length 181748087 with 1 run
Lengths: 181748087
Values : 0
$chr3
integer-Rle of length 159599783 with 1 run
Lengths: 159599783
Values : 0
$chr4
integer-Rle of length 155630120 with 1 run
Lengths: 155630120
Values : 0
$chr5
integer-Rle of length 152537259 with 1 run
Lengths: 152537259
Values : 0
...
<30 more elements>
>
depth分布のプロット
> par(mfrow=c(2,1))
> islandDepthPlot(cov_ctcf)
>
> islandDepthPlot(cov_gfp)
>
有意なpeak探索
> peakCutoff(cov_ctcf,fdr=0.01) FDR<0.01でカット
[1] 5.091909
>
> peakCutoff(cov_gfp,fdr=0.01)
[1] 6.979273
> peaks_ctcf<-slice(cov_ctcf,lower=7) depth7以下,FDR<0.01のデータ抽出
> peaks_ctcf
RleViewsList of length 35
names(35): chr1 chr2 chr3 chr4 ... chrX_random chrY_random chrUn_random
> peaks_gfp<-slice(cov_gfp,lower=7) depth7以下,FDR<0.01のデータ抽出
> peaks_gfp
RleViewsList of length 35
names(35): chr1 chr2 chr3 chr4 ... chrX_random chrY_random chrUn_random
>
> pksummary<-diffPeakSummary(peaks_gfp,peaks_ctcf)
> pksummary
RangedData with 6414 rows and 5 value columns across 35 spaces
space ranges | comb.max sums1 sums2
<factor> <IRanges> | <integer> <integer> <integer>
1 chr10 [3012944, 3013140] | 11 0 1911
2 chr10 [3135027, 3135029] | 7 0 21
3 chr10 [3234798, 3234896] | 10 0 910
4 chr10 [3234924, 3234933] | 7 0 70
5 chr10 [3270010, 3270301] | 20 164 4072
6 chr10 [3277660, 3277861] | 13 0 1897
7 chr10 [3460841, 3460978] | 11 138 1194
8 chr10 [3617844, 3617997] | 10 0 1326
9 chr10 [3651705, 3651995] | 24 0 4800
... ... ... ... ... ... ...
6406 chr12 [120153549, 120153734] | 10 0 1566
6407 chr12 [120165145, 120165390] | 12 0 2420
6408 chr12 [120165393, 120165422] | 8 0 218
6409 chr12 [120165424, 120165432] | 7 0 63
6410 chr12 [120169031, 120169253] | 14 64 2222
6411 chr12 [120665288, 120665512] | 14 0 2484
6412 chr12 [120708861, 120708882] | 7 0 154
6413 chr12 [120708953, 120708961] | 7 0 63
6414 chr12 [121220247, 121220461] | 12 0 2028
maxs1 maxs2
<integer> <integer>
1 0 11
2 0 7
3 0 10
4 0 7
5 1 19
6 0 13
7 1 10
8 0 10
9 0 24
... ... ...
6406 0 10
6407 0 12
6408 0 8
6409 0 7
6410 1 14
6411 0 14
6412 0 7
6413 0 7
GFP(X軸),CTCF(Y軸)のピークデータ視覚化
> library(lattice)
> xyplot(asinh(pksummary$sums2)~asinh(pksummary$sums1) | pksummary$space)
>
> greg<-transcripts(TxDb.Mmusculus.UCSC.mm9.knownGene) マウス遺伝子領域
> greg
GRanges object with 55419 ranges and 2 metadata columns:
seqnames ranges strand | tx_id
<Rle> <IRanges> <Rle> | <integer>
[1] chr1 [4797974, 4832908] + | 1
[2] chr1 [4797974, 4836816] + | 2
[3] chr1 [4847775, 4887990] + | 3
[4] chr1 [4847775, 4887990] + | 4
[5] chr1 [4848409, 4887990] + | 5
... ... ... ... ... ...
[55415] chrUn_random [2204169, 2216886] - | 55415
[55416] chrUn_random [2674945, 2678407] - | 55416
[55417] chrUn_random [2889607, 2891056] - | 55417
[55418] chrUn_random [3830796, 3837247] - | 55418
[55419] chrUn_random [4677114, 4677187] - | 55419
tx_name
<character>
[1] uc007afg.1
[2] uc007afh.1
[3] uc007afi.2
[4] uc011wht.1
[5] uc011whu.1
... ...
[55415] uc009sjw.2
[55416] uc009skb.2
[55417] uc009ske.1
[55418] uc009skg.1
[55419] uc009skh.1
-------
seqinfo: 35 sequences (1 circular) from mm9 genome
> prom<-flank(greg,1000,both=TRUE) 上流1000bpをpromotorとして抽出
> prom
GRanges object with 55419 ranges and 2 metadata columns:
seqnames ranges strand | tx_id
<Rle> <IRanges> <Rle> | <integer>
[1] chr1 [4796974, 4798973] + | 1
[2] chr1 [4796974, 4798973] + | 2
[3] chr1 [4846775, 4848774] + | 3
[4] chr1 [4846775, 4848774] + | 4
[5] chr1 [4847409, 4849408] + | 5
... ... ... ... ... ...
[55415] chrUn_random [2215887, 2217886] - | 55415
[55416] chrUn_random [2677408, 2679407] - | 55416
[55417] chrUn_random [2890057, 2892056] - | 55417
[55418] chrUn_random [3836248, 3838247] - | 55418
[55419] chrUn_random [4676188, 4678187] - | 55419
tx_name
<character>
[1] uc007afg.1
[2] uc007afh.1
[3] uc007afi.2
[4] uc011wht.1
[5] uc011whu.1
... ...
[55415] uc009sjw.2
[55416] uc009skb.2
[55417] uc009ske.1
[55418] uc009skg.1
[55419] uc009skh.1
-------
seqinfo: 35 sequences (1 circular) from mm9 genome
>
マウス遺伝子注釈由来promotorと、ChIP-seq実験データ由来promotorで重なる領域抽出
> pksummary$inPromotor<-pksummary %over% prom
> which(pksummary$inPromotor)
[1] 2 14 41 51 100 101 110 124 126 130 133 141 155
[14] 156 157 158 175 176 177 180 184 185 211 212 213 217
[27] 218 257 273 276 299 306 307 315 318 319 325 327 328
[40] 358 361 362 370 371 375 381 386 432 456 460 461 493
[53] 499 502 513 527 555 556 566 595 596 604 606 607 621
[66] 622 623 628 652 657 663 665 676 696 697 701 710 715
[79] 725 824 825 877 900 914 915 918 919 925 938 942 963
[92] 976 977 993 996 998 1008 1027 1033 1034 1036 1037 1040 1051
[105] 1052 1055 1056 1057 1059 1067 1068 1071 1072 1076 1079 1080 1087
[118] 1091 1108 1109 1110 1112 1114 1115 1124 1127 1131 1144 1159 1166
[131] 1175 1182 1195 1211 1226 1263 1288 1289 1290 1292 1308 1339 1346
[144] 1370 1386 1412 1413 1414 1425 1436 1463 1491 1495 1496 1497 1498
[157] 1549 1556 1557 1573 1580 1590 1600 1608 1609 1620 1647 1651 1685
[170] 1686 1705 1715 1716 1721 1740 1741 1776 1783 1795 1796 1797 1802
[183] 1803 1804 1831 1834 1836 1838 1845 1847 1848 1850 1855 1857 1865
[196] 1866 1869 1870 1875 1878 1881 1882 1883 1937 1938 1939 1940 1941
[209] 1942 1943 1944 1966 1967 1968 1969 2034 2035 2036 2037 2038 2039
[222] 2040 2051 2053 2058 2063 2066 2081 2093 2094 2095 2098 2102 2127
[235] 2134 2137 2150 2153 2154 2159 2164 2172 2182 2183 2202 2218 2235
[248] 2236 2237 2238 2240 2248 2268 2300 2302 2330 2334 2336 2338 2339
[261] 2365 2367 2372 2375 2378 2379 2380 2412 2413 2419 2452 2503 2560
[274] 2586 2591 2592 2637 2656 2659 2666 2667 2675 2697 2703 2711 2712
:
[664] 6131 6148 6161 6182 6224 6259 6267 6268 6309 6313 6318 6322 6327
[677] 6328 6349 6361 6362