Skip to content

Latest commit

 

History

History
382 lines (314 loc) · 12.9 KB

160801.md

File metadata and controls

382 lines (314 loc) · 12.9 KB
---[p.260]------------------------------------------------------
Analyzing ChIP-Seq data

ChIP-Seq=クロマチン免疫沈降シーケンシング
クロマチン免疫沈降シーケンス
転写調節因子や、その他のタンパク質DNA相互作用部位をpeak領域として特定できる。

<参考>
http://jp.illumina.com/content/dam/illumina-marketing/apac/japan/documents/pdf/datasheet_chip_sequence-j.pdf


> biocLite("chipseq") 
BioC_mirror: http://bioconductor.org
Using Bioconductor version 3.1 (BiocInstaller 1.18.5), R version 3.2.5.
Installing package(s) ‘chipseq’
trying URL 'http://bioconductor.org/packages/3.1/bioc/bin/windows/contrib/3.2/chipseq_1.18.0.zip

<参考>
https://www.bioconductor.org/packages/devel/bioc/manuals/chipseq/man/chipseq.pdf

> library(chipseq)

UCSCによるマウスmm9 versionの遺伝子アノテーションデータ
<参考>
https://bioconductor.org/packages/release/data/annotation/manuals/TxDb.Mmusculus.UCSC.mm9.knownGene/man/TxDb.Mmusculus.UCSC.mm9.knownGene.pdf

> biocLite("TxDb.Mmusculus.UCSC.mm9.knownGene")
BioC_mirror: http://bioconductor.org
Using Bioconductor version 3.1 (BiocInstaller 1.18.5), R version 3.2.5.
Installing package(s) ‘TxDb.Mmusculus.UCSC.mm9.knownGene’
installing the source package ‘TxDb.Mmusculus.UCSC.mm9.knownGene’

trying URL 'http://bioconductor.org/packages/3.1/data/annotation/src/contrib/TxDb.Mmusculus.UCSC.mm9.knownGene_3.1.2.tar.gz'
Content type 'application/x-gzip' length 1352

> library("TxDb.Mmusculus.UCSC.mm9.knownGene")
 要求されたパッケージ GenomicFeatures をロード中です 
 要求されたパッケージ GenomeInfoDb をロード中です 
 要求されたパッケージ GenomicRanges をロード中です 

CTCF タンパク質(CCCTC-binding factor).
<参考>
*http://www.ncbi.nlm.nih.gov/gene/10664
*Human CTCFは、ChIP-seq解析ツールの事例データとしてよく使われる。

> load("cstest.rda")  ctcfデータ
> 

> cstest
GRangesList object of length 2:
$ctcf 
GRanges object with 450096 ranges and 0 metadata columns:
           seqnames                 ranges strand
              <Rle>              <IRanges>  <Rle>
       [1]    chr10     [3012936, 3012959]      +
       [2]    chr10     [3012941, 3012964]      +
       [3]    chr10     [3012944, 3012967]      +
       [4]    chr10     [3012955, 3012978]      +
       [5]    chr10     [3012963, 3012986]      +
       ...      ...                    ...    ...
  [450092]    chr12 [121239376, 121239399]      -
  [450093]    chr12 [121245849, 121245872]      -
  [450094]    chr12 [121245895, 121245918]      -
  [450095]    chr12 [121246344, 121246367]      -
  [450096]    chr12 [121253499, 121253522]      -

...
<1 more element>
-------
seqinfo: 35 sequences from an unspecified genome

> 


> 
> summary(cstest)
     Length       Class        Mode 
          2 GRangesList          S4 


> estimate.mean.fraglen(cstest$ctcf)
   chr10    chr11    chr12 
179.6794 172.4884 181.6732 


> ctcf_ext<-resize(cstest$ctcf,width=200) 重複領域を200bp単位に変更
> ctcf_ext
GRanges object with 450096 ranges and 0 metadata columns:
           seqnames                 ranges strand
              <Rle>              <IRanges>  <Rle>
       [1]    chr10     [3012936, 3013135]      +
       [2]    chr10     [3012941, 3013140]      +
       [3]    chr10     [3012944, 3013143]      +
       [4]    chr10     [3012955, 3013154]      +
       [5]    chr10     [3012963, 3013162]      +
       ...      ...                    ...    ...
  [450092]    chr12 [121239200, 121239399]      -
  [450093]    chr12 [121245673, 121245872]      -
  [450094]    chr12 [121245719, 121245918]      -
  [450095]    chr12 [121246168, 121246367]      -
  [450096]    chr12 [121253323, 121253522]      -
  -------
  seqinfo: 35 sequences from an unspecified genome



> gfp_ext<-resize(cstest$gfp,width=200)
> gfp_ext
GRanges object with 295385 ranges and 0 metadata columns:
           seqnames                 ranges strand
              <Rle>              <IRanges>  <Rle>
       [1]    chr10     [3002512, 3002711]      +
       [2]    chr10     [3009093, 3009292]      +
       [3]    chr10     [3020716, 3020915]      +
       [4]    chr10     [3023026, 3023225]      +
       [5]    chr10     [3024629, 3024828]      +
       ...      ...                    ...    ...
  [295381]    chr12 [121212950, 121213149]      -
  [295382]    chr12 [121216729, 121216928]      -
  [295383]    chr12 [121216791, 121216990]      -
  [295384]    chr12 [121251629, 121251828]      -
  [295385]    chr12 [121253250, 121253449]      -
  -------
  seqinfo: 35 sequences from an unspecified genome


> cov_ctcf<-coverage(ctcf_ext) genome coverageの計算
> cov_ctcf
RleList of length 35
$chr1
integer-Rle of length 197195432 with 1 run
  Lengths: 197195432
  Values :         0

$chr2
integer-Rle of length 181748087 with 1 run
  Lengths: 181748087
  Values :         0

$chr3
integer-Rle of length 159599783 with 1 run
  Lengths: 159599783
  Values :         0

$chr4
integer-Rle of length 155630120 with 1 run
  Lengths: 155630120
  Values :         0

$chr5
integer-Rle of length 152537259 with 1 run
  Lengths: 152537259
  Values :         0

...
<30 more elements>
> 


> cov_gfp<-coverage(gfp_ext)
> cov_gfp
RleList of length 35
$chr1
integer-Rle of length 197195432 with 1 run
  Lengths: 197195432
  Values :         0

$chr2
integer-Rle of length 181748087 with 1 run
  Lengths: 181748087
  Values :         0

$chr3
integer-Rle of length 159599783 with 1 run
  Lengths: 159599783
  Values :         0

$chr4
integer-Rle of length 155630120 with 1 run
  Lengths: 155630120
  Values :         0

$chr5
integer-Rle of length 152537259 with 1 run
  Lengths: 152537259
  Values :         0

...
<30 more elements>
> 

depth分布のプロット

> par(mfrow=c(2,1))
> islandDepthPlot(cov_ctcf)


> 
> islandDepthPlot(cov_gfp)
> 



有意なpeak探索

> peakCutoff(cov_ctcf,fdr=0.01) FDR<0.01でカット
[1] 5.091909
> 
> peakCutoff(cov_gfp,fdr=0.01) 
[1] 6.979273


> peaks_ctcf<-slice(cov_ctcf,lower=7) depth7以下,FDR<0.01のデータ抽出
> peaks_ctcf
RleViewsList of length 35
names(35): chr1 chr2 chr3 chr4 ... chrX_random chrY_random chrUn_random


> peaks_gfp<-slice(cov_gfp,lower=7) depth7以下,FDR<0.01のデータ抽出

> peaks_gfp
RleViewsList of length 35
names(35): chr1 chr2 chr3 chr4 ... chrX_random chrY_random chrUn_random
> 


> pksummary<-diffPeakSummary(peaks_gfp,peaks_ctcf)
> pksummary
RangedData with 6414 rows and 5 value columns across 35 spaces
        space                 ranges   |  comb.max     sums1     sums2
     <factor>              <IRanges>   | <integer> <integer> <integer>
1       chr10     [3012944, 3013140]   |        11         0      1911
2       chr10     [3135027, 3135029]   |         7         0        21
3       chr10     [3234798, 3234896]   |        10         0       910
4       chr10     [3234924, 3234933]   |         7         0        70
5       chr10     [3270010, 3270301]   |        20       164      4072
6       chr10     [3277660, 3277861]   |        13         0      1897
7       chr10     [3460841, 3460978]   |        11       138      1194
8       chr10     [3617844, 3617997]   |        10         0      1326
9       chr10     [3651705, 3651995]   |        24         0      4800
...       ...                    ... ...       ...       ...       ...
6406    chr12 [120153549, 120153734]   |        10         0      1566
6407    chr12 [120165145, 120165390]   |        12         0      2420
6408    chr12 [120165393, 120165422]   |         8         0       218
6409    chr12 [120165424, 120165432]   |         7         0        63
6410    chr12 [120169031, 120169253]   |        14        64      2222
6411    chr12 [120665288, 120665512]   |        14         0      2484
6412    chr12 [120708861, 120708882]   |         7         0       154
6413    chr12 [120708953, 120708961]   |         7         0        63
6414    chr12 [121220247, 121220461]   |        12         0      2028
         maxs1     maxs2
     <integer> <integer>
1            0        11
2            0         7
3            0        10
4            0         7
5            1        19
6            0        13
7            1        10
8            0        10
9            0        24
...        ...       ...
6406         0        10
6407         0        12
6408         0         8
6409         0         7
6410         1        14
6411         0        14
6412         0         7
6413         0         7


GFP(X軸),CTCF(Y軸)のピークデータ視覚化

> library(lattice)

> xyplot(asinh(pksummary$sums2)~asinh(pksummary$sums1) | pksummary$space)
> 





> greg<-transcripts(TxDb.Mmusculus.UCSC.mm9.knownGene) マウス遺伝子領域
> greg 
GRanges object with 55419 ranges and 2 metadata columns:
              seqnames             ranges strand   |     tx_id
                 <Rle>          <IRanges>  <Rle>   | <integer>
      [1]         chr1 [4797974, 4832908]      +   |         1
      [2]         chr1 [4797974, 4836816]      +   |         2
      [3]         chr1 [4847775, 4887990]      +   |         3
      [4]         chr1 [4847775, 4887990]      +   |         4
      [5]         chr1 [4848409, 4887990]      +   |         5
      ...          ...                ...    ... ...       ...
  [55415] chrUn_random [2204169, 2216886]      -   |     55415
  [55416] chrUn_random [2674945, 2678407]      -   |     55416
  [55417] chrUn_random [2889607, 2891056]      -   |     55417
  [55418] chrUn_random [3830796, 3837247]      -   |     55418
  [55419] chrUn_random [4677114, 4677187]      -   |     55419
              tx_name
          <character>
      [1]  uc007afg.1
      [2]  uc007afh.1
      [3]  uc007afi.2
      [4]  uc011wht.1
      [5]  uc011whu.1
      ...         ...
  [55415]  uc009sjw.2
  [55416]  uc009skb.2
  [55417]  uc009ske.1
  [55418]  uc009skg.1
  [55419]  uc009skh.1
  -------
  seqinfo: 35 sequences (1 circular) from mm9 genome


> prom<-flank(greg,1000,both=TRUE) 上流1000bpをpromotorとして抽出
> prom
GRanges object with 55419 ranges and 2 metadata columns:
              seqnames             ranges strand   |     tx_id
                 <Rle>          <IRanges>  <Rle>   | <integer>
      [1]         chr1 [4796974, 4798973]      +   |         1
      [2]         chr1 [4796974, 4798973]      +   |         2
      [3]         chr1 [4846775, 4848774]      +   |         3
      [4]         chr1 [4846775, 4848774]      +   |         4
      [5]         chr1 [4847409, 4849408]      +   |         5
      ...          ...                ...    ... ...       ...
  [55415] chrUn_random [2215887, 2217886]      -   |     55415
  [55416] chrUn_random [2677408, 2679407]      -   |     55416
  [55417] chrUn_random [2890057, 2892056]      -   |     55417
  [55418] chrUn_random [3836248, 3838247]      -   |     55418
  [55419] chrUn_random [4676188, 4678187]      -   |     55419
              tx_name
          <character>
      [1]  uc007afg.1
      [2]  uc007afh.1
      [3]  uc007afi.2
      [4]  uc011wht.1
      [5]  uc011whu.1
      ...         ...
  [55415]  uc009sjw.2
  [55416]  uc009skb.2
  [55417]  uc009ske.1
  [55418]  uc009skg.1
  [55419]  uc009skh.1
  -------
  seqinfo: 35 sequences (1 circular) from mm9 genome

> 

マウス遺伝子注釈由来promotorと、ChIP-seq実験データ由来promotorで重なる領域抽出

> pksummary$inPromotor<-pksummary %over% prom 
> which(pksummary$inPromotor)
  [1]    2   14   41   51  100  101  110  124  126  130  133  141  155
 [14]  156  157  158  175  176  177  180  184  185  211  212  213  217
 [27]  218  257  273  276  299  306  307  315  318  319  325  327  328
 [40]  358  361  362  370  371  375  381  386  432  456  460  461  493
 [53]  499  502  513  527  555  556  566  595  596  604  606  607  621
 [66]  622  623  628  652  657  663  665  676  696  697  701  710  715
 [79]  725  824  825  877  900  914  915  918  919  925  938  942  963
 [92]  976  977  993  996  998 1008 1027 1033 1034 1036 1037 1040 1051
[105] 1052 1055 1056 1057 1059 1067 1068 1071 1072 1076 1079 1080 1087
[118] 1091 1108 1109 1110 1112 1114 1115 1124 1127 1131 1144 1159 1166
[131] 1175 1182 1195 1211 1226 1263 1288 1289 1290 1292 1308 1339 1346
[144] 1370 1386 1412 1413 1414 1425 1436 1463 1491 1495 1496 1497 1498
[157] 1549 1556 1557 1573 1580 1590 1600 1608 1609 1620 1647 1651 1685
[170] 1686 1705 1715 1716 1721 1740 1741 1776 1783 1795 1796 1797 1802
[183] 1803 1804 1831 1834 1836 1838 1845 1847 1848 1850 1855 1857 1865
[196] 1866 1869 1870 1875 1878 1881 1882 1883 1937 1938 1939 1940 1941
[209] 1942 1943 1944 1966 1967 1968 1969 2034 2035 2036 2037 2038 2039
[222] 2040 2051 2053 2058 2063 2066 2081 2093 2094 2095 2098 2102 2127
[235] 2134 2137 2150 2153 2154 2159 2164 2172 2182 2183 2202 2218 2235
[248] 2236 2237 2238 2240 2248 2268 2300 2302 2330 2334 2336 2338 2339
[261] 2365 2367 2372 2375 2378 2379 2380 2412 2413 2419 2452 2503 2560
[274] 2586 2591 2592 2637 2656 2659 2666 2667 2675 2697 2703 2711 2712
:
[664] 6131 6148 6161 6182 6224 6259 6267 6268 6309 6313 6318 6322 6327
[677] 6328 6349 6361 6362