Today's scope: p.17, How to do it... (6) onward
Prepare today's data, iris
> data(iris)
> dim(iris)
[1] 150 5
----p.17: How to do it ... (6)--------------------------------
* Understand the rnorm function, which generates random samples from a normal distribution.
> r1 <- rnorm(n=30, mean=1, sd=0.1)
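Generates 30 samples from a normal distribution with mean 1 and standard deviation 0.1.
> hist(r1)   # a histogram appears in the Plots pane on the right
> r2 <- rnorm(30,1,0.1)   # the argument names can be omitted (positional order: n, mean, sd)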
> r2
[1] 1.1178794 1.0397319 1.1823451 1.0711442 0.8951000 1.0480523
[7] 1.0575540 0.8427557 1.0371548 0.9453643 1.0241647 0.9978094
[13] 0.9937166 1.1514011 1.1076591 0.9793242 0.8742485 0.7217720
[19] 0.7526474 1.0042627 1.0630212 1.0012110 0.7663690 0.8845721
[25] 0.8831569 0.9861942 1.0378971 1.0402650 0.9569134 1.0752101
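Because rnorm draws random numbers, r1 and r2 above come out different on every run. A minimal sketch of making the draws reproducible with set.seed() (not used in the original notes; the seed 42 is an arbitrary choice), which also confirms that the named and positional calls are the same call:

```r
set.seed(42)                                # arbitrary seed: fixes the random number stream
r_named <- rnorm(n = 30, mean = 1, sd = 0.1)

set.seed(42)                                # reset to the same point in the stream
r_pos <- rnorm(30, 1, 0.1)                  # positional order is n, mean, sd

identical(r_named, r_pos)                   # TRUE
length(r_pos)                               # 30
```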
* Understand how to compute basic statistics
> mean(r1)   # mean of the data
[1] 0.9846299
> sd(r1)   # standard deviation of the data
[1] 0.1142747
> var(r1)   # variance of the data
[1] 0.01305872
Review from last time
> summary(r1)   # summary of the basic statistics
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.7218 0.9077 1.0030 0.9846 1.0550 1.1820
How do you get the minimum and maximum individually?
> max(r1)   # maximum
[1] 1.182345
> min(r1)   # minimum
[1] 0.721772
> median(r1)   # median
[1] 1.002737
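The same statistics can be computed from their definitions, which shows how they relate: sd is the square root of var, and var divides by n - 1 (the sample variance). A sketch on freshly generated data, assuming an arbitrary seed:

```r
set.seed(1)                                   # arbitrary seed, for reproducibility
x <- rnorm(30, mean = 1, sd = 0.1)
n <- length(x)

m_manual <- sum(x) / n                        # mean: sum divided by n
v_manual <- sum((x - m_manual)^2) / (n - 1)   # sample variance: divide by n - 1
s_manual <- sqrt(v_manual)                    # standard deviation: square root of the variance

all.equal(m_manual, mean(x))                  # TRUE
all.equal(v_manual, var(x))                   # TRUE
all.equal(s_manual, sd(x))                    # TRUE
range(x)                                      # min(x) and max(x) in one call
```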
---------------------------------------------------------------------------
Back to p.17: generate the Stalk.Length data
> Stalk.Length <- c(rnorm(30,1,0.1),rnorm(30,1.3,0.1),rnorm(30,1.5,0.1),rnorm(30,1.8,0.1),rnorm(30,2,0.1))   # 30 x 5 = 150 values: five groups with different means and the same variance
> Stalk.Length   # print the generated data to check it
[1] 0.8921421 1.0550530 1.1308808 1.0796513 0.9369771 1.2047349
[7] 1.0407067 1.1382617 0.8394090 1.0031632 0.9353475 0.9985884
[13] 1.0261368 0.9268915 1.1213712 0.9048967 0.9542164 0.7478351
[19] 0.9467987 0.9610430 0.9189688 0.9963100 1.0728138 0.8803209
[25] 0.8175551 1.0820303 1.0246283 0.9456785 1.0258660 1.0243214
[31] 1.3204919 1.4286443 1.3333357 1.3111258 1.3336106 1.3329345
[37] 1.5286558 1.4232648 1.2486023 1.2125026 1.4306571 1.3297766
[43] 1.2885587 1.1378017 1.1925756 1.2586319 1.1912726 1.3750280
[49] 1.5523592 1.3202334 1.1852053 1.3818343 1.2386627 1.3716788
[55] 1.1625793 1.3611949 1.1842638 1.2880419 1.1900505 1.4750548
[61] 1.4866536 1.3839021 1.4773459 1.6433605 1.3753647 1.5001304
[67] 1.5762595 1.5509684 1.4047298 1.4473762 1.4541600 1.5092810
[73] 1.6442589 1.3599851 1.4266139 1.5629679 1.5051306 1.5866319
[79] 1.5243248 1.2778475 1.5087839 1.6571711 1.5519017 1.4490116
[85] 1.5304438 1.3881790 1.2822249 1.4548542 1.5734907 1.3877803
[91] 1.8044793 1.7866928 1.8376624 1.7233526 1.6510363 1.8221745
[97] 1.6071475 1.8699298 1.5356849 1.8635871 1.6344142 1.9058911
[103] 1.7820783 1.7989895 1.8578040 1.6702627 1.6751173 1.5967798
[109] 1.7116072 1.9686936 1.6109489 1.9218557 2.2085703 1.6851783
[115] 1.7459801 1.7708710 1.7672610 1.8109322 1.6588718 1.6699241
[121] 2.0098012 1.9673040 2.0349714 2.0115345 2.1436555 2.0974409
[127] 2.0470874 1.9747315 2.0510872 1.8614584 1.9767346 1.9725971
[133] 1.9572242 1.8859643 1.9810844 2.1300231 1.9238299 2.0252168
[139] 1.9009679 1.9859873 1.9643309 1.9414266 2.0996803 1.9953504
[145] 2.0346951 1.9975663 2.0405251 2.0290622 2.1631461 1.9990700
> myiris4 <- cbind(iris, Stalk.Length)   # appended as the last column
> myiris4
Sepal.Length Sepal.Width Petal.Length Petal.Width Species Stalk.Length
1 5.1 3.5 1.4 0.2 setosa 1.1377273
2 4.9 3.0 1.4 0.2 setosa 0.8413762
3 4.7 3.2 1.3 0.2 setosa 1.0871570
4 4.6 3.1 1.5 0.2 setosa 1.1239159
5 5.0 3.6 1.4 0.2 setosa 0.9876178
6 5.4 3.9 1.7 0.4 setosa 1.0413773
7 4.6 3.4 1.4 0.3 setosa 0.9635390
8 5.0 3.4 1.5 0.2 setosa 1.1410253
9 4.4 2.9 1.4 0.2 setosa 1.0697473
> dim(iris)
[1] 150 5
> dim(myiris4)
[1] 150 6
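The five rnorm calls can also be written as a single call, because rnorm accepts a vector for its mean argument; rep(..., each = 30) builds the 150 means. A sketch (the seed and the names means, stalk_a and stalk_b are illustrative) that checks both forms give the same values and attaches the column under an explicit name:

```r
data(iris)
means <- c(1, 1.3, 1.5, 1.8, 2)

set.seed(123)                                  # arbitrary seed
stalk_a <- c(rnorm(30, 1, 0.1), rnorm(30, 1.3, 0.1), rnorm(30, 1.5, 0.1),
             rnorm(30, 1.8, 0.1), rnorm(30, 2, 0.1))

set.seed(123)                                  # same seed, same random stream
stalk_b <- rnorm(150, mean = rep(means, each = 30), sd = 0.1)   # one mean per draw

all.equal(stalk_a, stalk_b)                    # TRUE

# Naming the argument sets the column name explicitly
myiris4 <- cbind(iris, Stalk.Length = stalk_b)
dim(myiris4)                                   # 150 6
names(myiris4)[6]                              # "Stalk.Length"
```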
=============[p18]==========================================
--------(7)-----------------------------------------------
Add the Stalk.Length column in a single step, without cbind
> myiris5 <- iris   # start from a copy of iris
> dim(myiris5)
[1] 150 5
> myiris5$Stalk.Length <- c(rnorm(30,1,0.1),rnorm(30,1.3,0.1),rnorm(30,1.5,0.1),rnorm(30,1.8,0.1),rnorm(30,2,0.1))
--------(8)-----------------------------------------------
Check the data with the added column
> dim(myiris5)   # confirm there is now a 6th column
[1] 150 6
> myiris5   # display the contents
Sepal.Length Sepal.Width Petal.Length Petal.Width Species Stalk.Length
1 5.1 3.5 1.4 0.2 setosa 1.0633843
2 4.9 3.0 1.4 0.2 setosa 1.1781134
3 4.7 3.2 1.3 0.2 setosa 1.0114039
4 4.6 3.1 1.5 0.2 setosa 0.9795390
5 5.0 3.6 1.4 0.2 setosa 0.9872895
> colnames(myiris5)   # check the column names
[1] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width" "Species" "Stalk.Length"
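A sketch of two ways to add a column by name, plus the length check R applies to the new values. The seed, the column names Stalk.Length2 and Bad, and the times-10 transformation are illustrative only:

```r
data(iris)
myiris5 <- iris
set.seed(7)                                    # arbitrary seed
# `$<-` creates the column because the name does not exist yet
myiris5$Stalk.Length <- rnorm(150, mean = rep(c(1, 1.3, 1.5, 1.8, 2), each = 30), sd = 0.1)

# `[[<-` does the same, and also accepts a name stored in a variable
col <- "Stalk.Length2"                         # hypothetical column name
myiris5[[col]] <- myiris5$Stalk.Length * 10

# The value needs one entry per row (a length that divides the row count
# evenly is recycled instead); 149 values are rejected with an error.
err <- tryCatch({ myiris5$Bad <- rnorm(149); "no error" },
                error = function(e) "error")
err                                            # "error"
dim(myiris5)                                   # 150 7
```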
-------(9)------------------------------------------------
In the same way, try rbind, the command for adding rows
> newdat <- data.frame(Sepal.Length=10.1, Sepal.Width=0.5, Petal.Length=2.5, Petal.Width=0.9, Species="myspecies")
> newdat
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 10.1 0.5 2.5 0.9 myspecies
> myiris6 <- rbind(iris,newdat)   # append the newdat row
> dim(myiris6)
[1] 151 5
> myiris6[151,]   # display row 151 to check it
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
151 10.1 0.5 2.5 0.9 myspecies
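For data frames, rbind lines columns up by their names rather than their positions, so the new row only needs the same column names. A sketch with a hypothetical row newdat2 whose columns are deliberately in a different order:

```r
data(iris)
# Hypothetical row with its columns deliberately in a different order
newdat2 <- data.frame(Species = "myspecies", Petal.Width = 0.9, Petal.Length = 2.5,
                      Sepal.Width = 0.5, Sepal.Length = 10.1)
myiris7 <- rbind(iris, newdat2)    # columns are matched by name, not by position

dim(myiris7)                       # 151 5
myiris7[151, "Sepal.Length"]       # 10.1
myiris7[151, "Petal.Width"]        # 0.9
```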
-------(10)----------------------------------------------
Extracting rows that match a condition
> mynew.iris1 <- subset(myiris6,Sepal.Length == 10.1)
> mynew.iris2 <- myiris6[myiris6$Sepal.Length == 10.1,]
> mynew.iris1
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
151 10.1 0.5 2.5 0.9 myspecies
> mynew.iris2
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
151 10.1 0.5 2.5 0.9 myspecies
> mynew.iris3 <- subset(iris, Species == "setosa")
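The two extraction forms above agree on iris, but they treat NA in the condition differently: subset() drops rows where the condition is NA, while [ ] returns an all-NA row for each of them. A sketch; the small data frame df is made up for illustration:

```r
data(iris)
# On iris (no NAs) both forms select the same 50 setosa rows
s1 <- subset(iris, Species == "setosa")
s2 <- iris[iris$Species == "setosa", ]
all.equal(s1, s2)                       # TRUE
nrow(s1)                                # 50

# They differ when the condition contains NA
df <- data.frame(id = 1:4, len = c(5.0, NA, 10.1, 10.1))
a <- subset(df, len == 10.1)            # the NA row is dropped: rows 3 and 4
b <- df[df$len == 10.1, ]               # an extra all-NA row appears
w <- df[which(df$len == 10.1), ]        # which() drops NA, like subset()
c(nrow(a), nrow(b), nrow(w))            # 2 3 2
```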
-------(11)------------------------------------------------
Checking the extracted data
> mynew.iris3[1,]
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 5.1 3.5 1.4 0.2 setosa
> head(mynew.iris3)
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5.0 3.6 1.4 0.2 setosa
6 5.4 3.9 1.7 0.4 setosa
Using logical operators in conditions (AND &, OR |, NOT !)
> iris_2species <- subset(iris, (Species == "setosa") | (Species == "versicolor") )
> iris_2species
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5.0 3.6 1.4 0.2 setosa
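A short sketch of all three operators on iris. Inside subset() and [ ], use the vectorized & and |, not && and ||, which work on single values only. The 5.5 cutoff is an arbitrary example:

```r
data(iris)
two <- subset(iris, Species == "setosa" | Species == "versicolor")   # OR
nrow(two)                                                            # 100

not_setosa <- subset(iris, !(Species == "setosa"))                   # NOT
nrow(not_setosa)                                                     # 100

# AND: setosa flowers with a sepal longer than 5.5 (cutoff chosen for illustration)
big_setosa <- subset(iris, Species == "setosa" & Sepal.Length > 5.5)
all(big_setosa$Species == "setosa" & big_setosa$Sepal.Length > 5.5)  # TRUE
```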
--[p.19 There's more]--------------------------------------------------------------
Extracting rows whose value matches any element of a vector, with the %in% operator
> mylength <- c(4:7,7.2)   # create a vector
> mylength
[1] 4.0 5.0 6.0 7.0 7.2
> mynew.iris4 <- iris[iris[,1] %in% mylength,]
> mynew.iris4
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
5 5.0 3.6 1.4 0.2 setosa
8 5.0 3.4 1.5 0.2 setosa
26 5.0 3.0 1.6 0.2 setosa
27 5.0 3.4 1.6 0.4 setosa
36 5.0 3.2 1.2 0.2 setosa
41 5.0 3.5 1.3 0.3 setosa
44 5.0 3.5 1.6 0.6 setosa
50 5.0 3.3 1.4 0.2 setosa
51 7.0 3.2 4.7 1.4 versicolor
61 5.0 2.0 3.5 1.0 versicolor
63 6.0 2.2 4.0 1.0 versicolor
79 6.0 2.9 4.5 1.5 versicolor
84 6.0 2.7 5.1 1.6 versicolor
86 6.0 3.4 4.5 1.6 versicolor
94 5.0 2.3 3.3 1.0 versicolor
110 7.2 3.6 6.1 2.5 virginica
120 6.0 2.2 5.0 1.5 virginica
126 7.2 3.2 6.0 1.8 virginica
130 7.2 3.0 5.8 1.6 virginica
139 6.0 3.0 4.8 1.8 virginica
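Why %in% rather than ==? With ==, the five-element mylength is recycled along the 150 values, so each row is compared with only one element of mylength, and because 150 is a multiple of 5, R does not even warn. A sketch contrasting the two (the names hit_in and hit_eq are illustrative):

```r
data(iris)
mylength <- c(4:7, 7.2)

hit_in <- iris$Sepal.Length %in% mylength   # TRUE if the value equals any element
hit_eq <- iris$Sepal.Length == mylength     # elementwise with recycling: a different test
sum(hit_in)                                 # 20, the rows listed above
identical(hit_in, hit_eq)                   # FALSE

# %in% is defined in base R in terms of match()
identical(hit_in, match(iris$Sepal.Length, mylength, nomatch = 0) > 0)   # TRUE
```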
------[p.21]----------------------------------------------------------
Handling missing values (NA)
> a <- c(1:4,NA,6)
> a
[1] 1 2 3 4 NA 6
> mean(a)   # the mean cannot be computed: a single NA makes the result NA
[1] NA
> mean(a, na.rm=TRUE)
[1] 3.2
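A few related tools, sketched on the same vector a: is.na() finds missing values (a == NA does not work, because any comparison with NA is NA), and many summary functions accept na.rm. The small data frame df is made up for illustration:

```r
a <- c(1:4, NA, 6)
is.na(a)                  # FALSE FALSE FALSE FALSE TRUE FALSE
sum(is.na(a))             # 1 missing value
mean(a[!is.na(a)])        # 3.2, the same as mean(a, na.rm = TRUE)
sum(a, na.rm = TRUE)      # 16; sum(), max(), sd() etc. accept na.rm too
a == NA                   # all NA: comparing with NA never gives TRUE or FALSE

# For data frames: find or drop rows that contain any NA
df <- data.frame(x = a, y = c(NA, 2:6))
complete.cases(df)        # FALSE TRUE TRUE TRUE FALSE TRUE
nrow(na.omit(df))         # 4
```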
---------[p.22]-------------------------------------------------------
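> n.data <- rnorm(100,1,0.1)   # 100 samples, mean 1, sd 0.1
> hist(n.data)   # histogram
> plot(density(n.data))   # kernel density estimate of the same data
> ?pnorm   # help page for the normal distribution (dnorm, pnorm, qnorm, rnorm)
> plot(y)   # y is not created anywhere in these notes; define it first or this line errors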
> plot(density(n.data))
> ?qnorm
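The help pages opened above (?pnorm, ?qnorm) document the normal distribution family: dnorm (density), pnorm (cumulative probability), qnorm (quantile) and rnorm (random draws). A sketch of how they relate, using the same mean 1 and sd 0.1 as n.data; the cutoff 1.2, the seed and the larger sample size are arbitrary choices:

```r
p <- pnorm(1.2, mean = 1, sd = 0.1)   # P(X <= 1.2) for X ~ N(1, 0.1^2), about 0.977
q <- qnorm(p, mean = 1, sd = 0.1)     # qnorm() inverts pnorm(): gives back 1.2
all.equal(q, 1.2)                     # TRUE
qnorm(0.975)                          # about 1.96 for the standard normal

set.seed(2024)                        # arbitrary seed
n.data <- rnorm(1e5, 1, 0.1)
mean(n.data <= 1.2)                   # empirical proportion, close to p

# dnorm() is the curve that plot(density(n.data)) estimates
dnorm(1, mean = 1, sd = 0.1)          # peak height 1 / (0.1 * sqrt(2 * pi)), about 3.99
```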