# Feature Selection Example
Feature Selection in R and Caret
```{r}
library(caret)
library(doParallel) # parallel processing
library(dplyr) # Used by caret
library(pROC) # plot the ROC curve
library(foreign) # read.arff
### Use the KC1 defect-prediction data set (ARFF format)
# Load the data; the indices that divide it into training and test sets are built in the next chunk.
# set.seed(10) # uncomment for a reproducible partition
kc1 <- read.arff("./datasets/defectPred/D1/KC1.arff")
```
```{r}
inTrain <- createDataPartition(y = kc1$Defective,
                               ## the outcome data are needed
                               p = .75,
                               ## the proportion of the data in the
                               ## training set
                               list = FALSE)
```
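The function `createDataPartition` produces a stratified partition: it samples within each level of the outcome, so the training and test sets keep roughly the same class proportions as the full data.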
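The stratification can be checked on a small example. The sketch below uses a made-up imbalanced outcome `y` (hypothetical, not the KC1 data) and compares class proportions before and after partitioning.

```{r}
library(caret)

set.seed(10)
# Hypothetical outcome: 80 non-defective ("N") and 20 defective ("Y") cases
y <- factor(c(rep("N", 80), rep("Y", 20)))

# Stratified 75% sample of the row indices
idx <- createDataPartition(y, p = 0.75, list = FALSE)

# Class proportions in the full outcome and in the training part
fullProp  <- prop.table(table(y))
trainProp <- prop.table(table(y[idx]))
fullProp
trainProp
```

The two tables should be close to each other, which is the point of sampling within each class rather than from the data as a whole.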
```{r}
training <- kc1[inTrain,]
nrow(training)
testing <- kc1[-inTrain, ]
nrow(testing)
```
The `train` function can be used to:

+ evaluate, using resampling, the effect of model tuning parameters on performance
+ choose the "optimal" model across these parameters
+ estimate model performance from a training set
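These three uses can be seen on a small, self-contained sketch. It assumes the built-in `iris` data and a k-nearest-neighbours model purely for illustration; neither is part of the KC1 analysis in this document.

```{r}
library(caret)

set.seed(10)
# 5-fold cross-validation keeps the example quick
ctrl <- trainControl(method = "cv", number = 5)

# Evaluate three candidate values of the tuning parameter k
knnFit <- train(Species ~ ., data = iris,
                method = "knn",
                tuneGrid = data.frame(k = c(3, 5, 7)),
                trControl = ctrl)

knnFit$results   # resampled performance for each candidate k
knnFit$bestTune  # the value of k that train() selected
```

`results` holds the resampling estimates for every candidate, and `bestTune` records which one was chosen for the final model.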
```{r}
fitControl <- trainControl(## 10-fold CV
                           method = "repeatedcv",
                           number = 10,
                           ## repeated ten times
                           repeats = 10)
```
```{r}
gbmFit1 <- train(Defective ~ ., data = training,
                 method = "gbm",
                 trControl = fitControl,
                 ## This last option is actually one
                 ## for gbm() that passes through
                 verbose = FALSE)
gbmFit1
```
```{r}
plsFit <- train(Defective ~ .,
                data = training,
                method = "pls",
                ## Center and scale the predictors for the training
                ## set and all future samples.
                preProc = c("center", "scale")
)
```
```{r}
# To fix
# testPred <- predict(plsFit, testing)
# postResample(testPred, testing$Defective)
# sensitivity(testPred, testing$Defective)
# confusionMatrix(testPred, testing$Defective)
```
When there are three or more classes, `confusionMatrix` reports the confusion matrix together with a set of "one-versus-all" results.
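A minimal sketch of the multi-class case, assuming the three-class `iris` data and a k-nearest-neighbours model as stand-ins (the KC1 outcome is not used here):

```{r}
library(caret)

set.seed(10)
# Stratified split of the illustrative data
idx <- createDataPartition(iris$Species, p = 0.75, list = FALSE)
irisTrain <- iris[idx, ]
irisTest  <- iris[-idx, ]

fit <- train(Species ~ ., data = irisTrain,
             method = "knn",
             trControl = trainControl(method = "cv", number = 5))

# Predicted and reference values must both be factors with the same levels
pred <- predict(fit, irisTest)
cm <- confusionMatrix(pred, irisTest$Species)

cm$table    # the 3 x 3 confusion matrix
cm$byClass  # one row of one-versus-all statistics per class
```

With three classes, `byClass` is a matrix with one row per class, each row treating that class as the positive case against the other two.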