-------[p.182]-----------------------------
Association tests with CNV data
Copy Number Variation(CNV) : コピー数多型、集団中でコピー数が個人間で異なるゲノム領域を指す。
*通常:2コピー(1個の細胞で父方由来母方由来の2コピー存在す)
*少ない:1コピー(疾患関連に繋がる可能性)
*多い:3コピー以上
参考情報(油谷等JST報告書より)
*コピー数多型の概念図:http://www.jst.go.jp/pr/info/info361/zu1.html
*コピー数多型地図:http://www.jst.go.jp/pr/info/info361/zu2.html
参考CNVアレイ例
*http://www.tmd.ac.jp/mri/press/press31/index.html
*http://www.tokushima-u.ac.jp/_files/00222716/CytoScanWorkflow.pdf
参考CNV-NGS例 (Nord et al., 2011)
*http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3088570/figure/F1/
*http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3088570/figure/F3/
マニュアル
*https://cran.r-project.org/web/packages/CNVassocData/CNVassocData.pdf
*http://www.creal.cat/media/upload/arxius/jr/script_MLPA.R
> install.packages("CNVassoc")
Installing package into ‘C:/Users/ekaminuma/Documents/R/win-library/3.2’
(as ‘lib’ is unspecified)
also installing the dependencies ‘CNVassocData’, ‘mixdist’, ‘mclust’
downloaded 466 KB
package ‘CNVassocData’ successfully unpacked and MD5 sums checked
package ‘mixdist’ successfully unpacked and MD5 sums checked
package ‘mclust’ successfully unpacked and MD5 sums checked
package ‘CNVassoc’ successfully unpacked and MD5 sums checked
> library(CNVassoc)
要求されたパッケージ CNVassocData をロード中です
要求されたパッケージ mixdist をロード中です
要求されたパッケージ mclust をロード中です
__ ___________ __ _____________
/ |/ / ____/ / / / / / ___/_ __/
/ /|_/ / / / / / / / /\__ \ / /
/ / / / /___/ /___/ /_/ /___/ // /
/_/ /_/\____/_____/\____//____//_/ version 5.2
Type 'citation("mclust")' for citing this R package in publications.
要求されたパッケージ survival をロード中です
>
> data(dataMLPA)
<参考>
http://finzi.psych.upenn.edu/library/CNVassocData/html/dataMLPA.html
id The unique identifiers of individuals
casco Case-control stauts 0:control 1:case
Gene1 Intensities for Gene1
Gene2 Intensities for Gene2
PCR.Gene1 True copy number status for Gene1
PCR.Gene2 True copy number status for Gene2
quanti Simulated continuos variable.
cov Simulated continuous variable.
> str(dataMLPA)
'data.frame': 651 obs. of 8 variables:
$ id : Factor w/ 346 levels "H238","H239",..: 1 1 2 2 3 3 4 4 5 5 ...
$ casco : int 1 1 1 1 1 1 1 1 1 1 ...
$ Gene1 : num 0.51 0.45 0 0 0 0 0.23 0.26 0 0 ...
$ Gene2 : num 0.539 0.639 0.483 0.464 0 ...
$ PCR.Gene1: Factor w/ 3 levels "del","ht","wt": 3 3 1 1 1 1 2 2 1 1 ...
$ PCR.Gene2: Factor w/ 3 levels "del","ht","wt": 3 3 3 3 1 1 1 1 1 1 ...
$ quanti : num -0.61 -0.13 -0.57 -1.4 0.83 -2.07 -1.68 -1.4 1.09 0.55 ...
$ cov : num 10.83 10.69 9.63 9.87 10.25 ...
>
> head(dataMLPA)
id casco Gene1 Gene2 PCR.Gene1 PCR.Gene2 quanti cov
1 H238 1 0.51 0.5385080 wt wt -0.61 10.83
2 H238 1 0.45 0.6392029 wt wt -0.13 10.69
3 H239 1 0.00 0.4831572 del wt -0.57 9.63
4 H239 1 0.00 0.4640072 del wt -1.40 9.87
5 H276 1 0.00 0.0000000 del del 0.83 10.25
6 H276 1 0.00 0.0000000 del del -2.07 10.40
> dataMLPA
:
638 V962 0 0.05 0.42722907 del wt 1.31 9.18
639 V962 0 0.05 0.28655589 del ht -0.34 10.55
640 V965 0 0.00 0.47746664 del wt 1.36 8.52
641 V965 0 0.00 0.61591204 del wt 1.39 10.41
642 V966 0 0.00 0.53093585 del wt -1.45 10.04
643 V966 0 0.00 0.63121841 del wt 1.18 9.07
644 V970 0 0.26 0.46568148 ht wt -1.13 8.49
645 V970 0 0.26 0.36433636 ht ht 0.85 9.51
646 X067 1 0.22 0.00000000 ht del 2.13 10.27
647 X067 1 0.24 0.00000000 ht del -1.71 9.51
648 X105 1 0.00 0.35324276 del ht 1.48 8.68
649 X105 1 0.00 0.51050351 del wt 0.28 10.73
650 X106 1 0.23 0.19154860 ht ht 1.89 8.08
651 X106 1 0.23 0.22994709 ht ht -0.81 10.44
>
> dataMLPA$Gene1
[1] 0.51 0.45 0.00 0.00 0.00 0.00 0.23 0.26 0.00 0.00 0.00 0.00 0.00 0.00 0.00
[16] 0.00 0.22 0.24 0.00 0.00 0.00 0.00 0.00 0.00 0.27 0.28 0.26 0.23 0.00 0.00
[31] 0.00 0.00 0.00 0.00 0.00 0.00 0.22 0.27 0.00 0.00 0.00 0.00 0.00 0.00 0.00
[46] 0.00 0.00 0.29 0.26 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.25 0.26 0.00
> plotSignal(dataMLPA$Gene1,case.control=dataMLPA$casco)
> par(mfrow=c(1,2),mar=c(3,4,3,1))
> hist(dataMLPA$Gene1,main="gene 1 histogram",xlab="",ylab="freq")
> hist(dataMLPA$Gene2,main="gene 2 histogram",xlab="",ylab="freq")
> ?cnv
> myCNV <- cnv(x=dataMLPA$Gene2, threshold.0 = 0.01, mix.method ="mixdist" )
phenotypeとassociation解析
> mod <- CNVassoc(formula=casco~myCNV, data=dataMLPA, model="mul")
>
> mod
Call: CNVassoc(formula = casco ~ myCNV, data = dataMLPA, model = "mul")
Coefficients:
CNV0 CNV1 CNV2
CNVmult 1.0520923 0.3122567 -0.0970782
Number of individuals: 651
Number of estimated parameters: 3
Deviance: 876.396
>
> summary(mod)
Call:
CNVassoc(formula = casco ~ myCNV, data = dataMLPA, model = "mul")
Deviance: 876.396
Number of parameters: 3
Number of individuals: 651
Coefficients:
OR lower.lim upper.lim SE stat pvalue
CNV0 1.0000
CNV1 0.4772 0.2742 0.8304 0.2827 -2.6172 0.009
CNV2 0.3169 0.1834 0.5477 0.2791 -4.1169 0.000
(Dispersion parameter for binomial family taken to be 1 )
Covariance between coefficients:
CNV0 CNV1 CNV2
CNV0 0.0613 0.0000 0.0000
CNV1 0.0186 -0.0032
CNV2 0.0166
> mod2 <- CNVassoc(formula=casco~myCNV, data=dataMLPA, model="add")
>
>
> anova(mod,mod2)
--- Likelihood ratio test comparing 2 CNVassoc models:
Model 1 call: CNVassoc(formula = casco ~ myCNV, data = dataMLPA, model = "mul")
Model 2 call: CNVassoc(formula = casco ~ myCNV, data = dataMLPA, model = "add")
Chi= 0.6645798 (df= 1 ) p-value= 0.4149477
Note: the 2 models must be nested, and this function doesn't check this!
>
> logLik(mod)
logLik df
-438.198 3.000
CNV associationが有意か検定
LRT: 尤度比検定
Wald: Wald検定
> CNVtest(mod,type="LRT")
----CNV Likelihood Ratio Test----
Chi= 18.75453 (df= 2 ) , pvalue= 8.462633e-05
> CNVtest(mod,type=c("Wald","LRT"))
----CNV Wald test----
Chi= 17.32966 (df= 2 ) , pvalue= 0.0001725492
Warning messages:
1: In if (type.test == 1) { :
the condition has length > 1 and only the first element will be used
2: In if (x$type == 1) cat("----CNV Wald test----\n") else cat("----CNV Likelihood Ratio Test----\n") :
the condition has length > 1 and only the first element will be used
>
> CNVtest(mod,type="Wald")
----CNV Wald test----
Chi= 17.32966 (df= 2 ) , pvalue= 0.0001725492
> plotSignal(dataMLPA$Gene2, caes.control=dataMLPA$casco)
*その他のCNV package
patchwork package:
allele-specific copy number analysis
https://genomebiology.biomedcentral.com/articles/10.1186/gb-2013-14-3-r24
======[p.185]================
Visualizations in GWAS studies
> library(GWASTools)
> library(SNPassoc)
>
> data(SNPs)
>
> mySNP<-setupSNP(data=SNPs,colSNPs=6:40,sep="")
>
> myres <- WGassociation(protein, data=mySNP, model="all")
> pvals <- dominant(myres)
>
> qqPlot(pvals)
Error: could not find function "qqPlot"
----------------------
> install.packages("qqman")
Installing package into ‘C:/Users
> library(qqman)
> head(gwasResults)
SNP CHR BP P
1 rs1 1 1 0.9148060
2 rs2 1 2 0.9370754
3 rs3 1 3 0.2861395
4 rs4 1 4 0.8304476
5 rs5 1 5 0.6417455
6 rs6 1 6 0.5190959
>
>
>
> manhattan(gwasResults)
>
>
>
> qq(gwasResults$P, main="Q-Q plot of P-values")
qqplot参考
http://qiita.com/kenmatsu4/items/59605dc745707e8701e0