-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathic-bps.qmd
599 lines (401 loc) · 21.3 KB
/
ic-bps.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
---
title: "Peptidomics analysis reveals changes in small urinary peptides in patients with interstitial cystitis/bladder pain syndrome"
author: "Md Shadman Ridwan Abid, **Haowen Qiu**, Bridget A. Tripp, Aline de Lima Leite, Heidi E. Roth, Jiri Adamec, Robert Powers & James W. Checco"
date: "2022-05-18"
abstract: "Interstitial cystitis/bladder pain syndrome (IC/BPS) is a chronic and debilitating pain disorder of the bladder and urinary tract with poorly understood etiology. A definitive diagnosis of IC/BPS can be challenging because many symptoms are shared with other urological disorders. An analysis of urine presents an attractive and non-invasive resource for monitoring and diagnosing IC/BPS. The antiproliferative factor (APF) peptide has been previously identified in the urine of IC/BPS patients and is a proposed biomarker for the disorder. Nevertheless, other small urinary peptides have remained uninvestigated in IC/BPS primarily because protein biomarker discovery efforts employ protocols that remove small endogenous peptides. The purpose of this study is to investigate the profile of endogenous peptides in IC/BPS patient urine, with the goal of identifying putative peptide biomarkers. Here, a non-targeted peptidomics analysis of urine samples collected from IC/BPS patients were compared to urine samples from asymptomatic controls. Our results show a general increase in the abundance of urinary peptides in IC/BPS patients, which is consistent with an increase in inflammation and protease activity characteristic of this disorder. In total, 71 peptides generated from 39 different proteins were found to be significantly altered in IC/BPS. Five urinary peptides with high variable importance in projection (VIP) coefficients were found to reliably differentiate IC/BPS from healthy controls by receiver operating characteristic (ROC) analysis. In parallel, we also developed a targeted multiple reaction monitoring method to quantify the relative abundance of the APF peptide from patient urine samples. Although the APF peptide was found in moderately higher abundance in IC/BPS relative to control urine, our results show that the APF peptide was inconsistently present in urine, suggesting that its utility as a sole biomarker of IC/BPS may be limited. Overall, our results revealed new insights into the profile of urinary peptides in IC/BPS that will aid in future biomarker discovery and validation efforts."
doi: "https://doi.org/10.1038/s41598-022-12197-2"
execute:
echo: true
#cache: true
format:
html:
toc: true
toc-location: left
reference-location: document
cold-fold: true
theme: flatly
self-contained: true
#cache: true
comments:
hypothesis: true
---
```{r message=FALSE, warning=FALSE, results='hide'}
suppressPackageStartupMessages(c(
library(tidyverse),
library(openxlsx),
library(janitor),
library(knitr),
library(impute),
library(cowplot),
library(proBatch),
library(pcaMethods),
library(pROC)
))
```
```{r message = FALSE, warning = FALSE, echo=FALSE}
script_folder = "../scripts/"
source(paste(script_folder, "normalization.R", sep = ""))
source(paste(script_folder, "multivariate.R", sep = ""))
source(paste(script_folder, "visual_functions.R", sep = ""))
# get start time
start_time <- Sys.time()
```
## Project and data background
In this study, we applied a non-targeted LC--MS and LC--MS/MS-based peptidomics approach to urine collected from IC/BPS patients and asymptomatic controls to explore differences in the profile of small urinary peptides in this disorder (@fig-analytical_workflow a). These experiments identified several peptides that can be used to differentiate IC/BPS urine from controls. In parallel, we also developed and applied a targeted LC-multiple reaction monitoring (MRM) method to compare the relative quantities of the APF peptide present in urine from both IC/BPS patients and asymptomatic controls (@fig-analytical_workflow b).
::: {#fig-analytical_workflow layout-ncol="1"}

Analytical workflow
:::
::: {.column-margin}
(a) Non-targeted LC--MS and LC--MS/MS peptidomics analysis used to identify urinary peptides that differ between IC/BPS patients and healthy controls.
(b) Targeted LC-MRM analysis used to determine the relative quantities of the APF peptide in urine from both IC/BPS patients and healthy controls.
:::
## Label-free peptidomics
A total of 995 individual peptides were identified (fragments from 149 different proteins) from the LC--MS and LC--MS/MS datasets using PEAKS Studio proteomics software with a 1% false discovery rate (FDR) threshold. Only 212 peptides with unambiguous sequence assignments were detected in at least 50% of the samples.
```{r}
FDR <- 0.05
LOG2FC <- 0.6
fh <- openxlsx::read.xlsx("https://static-content.springer.com/esm/art%3A10.1038%2Fs41598-022-12197-2/MediaObjects/41598_2022_12197_MOESM1_ESM.xlsx", sheet = 2, colNames = FALSE) %>%
as_tibble() %>%
select(-c(2:8))
df = fh %>%
janitor::row_to_names(row_number = 2, remove_rows_above = FALSE) %>%
janitor::clean_names() %>%
as.data.frame()
labels_d1 <- as.matrix(df[1,-1]) %>%
`rownames<-`("Label")
d1 = df[-1,] %>%
as_tibble() %>%
column_to_rownames(., var = "peptide_sequence") %>%
mutate(across(everything(), as.numeric))
anno = data.frame(Label = as.factor(t(df)[-1, 1]))
#sum(d1==0) ## Total missing values
d1[d1 == 0] <- NA ## Change zeros to NA
#sum(is.na(d1))
```
Data preview
::: {.panel-tabset .column-page-inset-right}
```{r message=FALSE, warning=FALSE}
# display table
d1 %>%
knitr::kable(., digits = 3, "html") %>%
kableExtra::kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive")) %>%
kableExtra::scroll_box(width = "100%", height = "500px")
```
:::
Three step data processing:
* Missing value imputation
* Log2 transformation
* EigenMS normalization
::: {.panel-tabset}
### Missing value imputation
There is a fair amount of missing values in the dataset, which is typical for MS data in general and untargeted LC-MS proteomics. Missing value imputation was accomplished using KNN.
::: {.column-margin}
There are potentially three main reasons behind missing values in mass spec data:
* Feature is not present in the biological sample
* Feature is present in the sample but the concentration is below limit of detection
* Feature is present in the sample and above limit of detection but was not annotated as a peak during deconvolution process
:::
```{r message=FALSE, warning=FALSE}
### Use this to count the missing signal values by feature
d1$nacount <- as.matrix(apply(d1, 1, function(x) sum(is.na(x))))
#write.csv(d1, "Unnormalized peak areas/missingtotals.csv")
d2 <- d1[which(d1$nacount < round((length(d1)-1)*.5)), ]
d2 <- as.data.frame(d2[,1:length(d2)-1])
#sum(is.na(d2))
#[1] 5084
imputed_meta_d1 <- impute::impute.knn(as.matrix(d2),k = 7, rowmax = 0.5,
colmax = 0.8, maxp = 1500)
imputed_d1 <- as.data.frame(imputed_meta_d1[["data"]], colnames = TRUE)
hist_impute <- ggplot_truehist(unlist(imputed_d1), "After imputation")
qq_impute <- ggplot_carqq(unlist(imputed_d1), "After imputation")
#pca_impute <- ggplot_pca(imputed_d1, labels_d1, "class","After imputation")
```
```{r fig.dim=c(12, 6), out.width= "100%", message = FALSE, warning = FALSE, results = "hide"}
plot_grid(hist_impute, qq_impute, nrow = 1)
```
As is shown above, data after imputation is not normally distributed, therefore log2 transformation is the next step.
### Log2 transformation
Data transformation applies a mathematical transformation on individual values themselves. For mass spec data, log transformation is a good choice, as it reduces or removes the skewness of mass spec data.
::: {.column-page-inset-right}
```{r message=FALSE, warning=FALSE}
log2_d1 <- log2(type.convert(imputed_d1)) ## log2 transformation
# save data
#write.csv(log2_d1, file.path(WORKING_DIR, "log2_transformed.csv"))
# draw histogram
hist_log2 <- ggplot_truehist(unlist(log2_d1), "Log2 Transformed")
qq_log2 <- ggplot_carqq(unlist(log2_d1), "Log2 Transformed")
pca_log2 <- ggplot_pca(log2_d1, anno, "Label","Log2 Transformed")
```
```{r fig.dim=c(18, 6), out.width= "100%", message = FALSE, warning = FALSE, results = "hide"}
plot_grid(hist_log2, qq_log2, pca_log2, nrow = 1)
```
```{r message=FALSE, warning=FALSE}
# display table
log2_d1 %>%
mutate(across(everything(), round, digit=2)) %>%
knitr::kable(., digits = 3, "html") %>%
kableExtra::kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive")) %>%
kableExtra::scroll_box(width = "100%", height = "500px")
```
:::
### EigenMS normalization
After imputation and log transformation, the data was normalized using [EigenMS](https://doi.org/10.1371/journal.pone.0116221) to account for sample-to-sample variability. EigenMS normalization preserves the treatment group differences in the data by estimating treatment effects with an ANOVA model, then uses singular value decomposition on the model residual matrix to identify and remove the bias.
::: {.column-page-inset-right}
```{r message=FALSE, warning=FALSE, eval=FALSE}
norm_d1 <- do_normalization_short(log2_d1, labels_d1)
hist_eigenms <- ggplot_truehist(unlist(norm_d1[-1,]), "log2-EigenMS")
qq_eigenms <- ggplot_carqq(unlist(norm_d1[-1,]), "log2-EigenMS")
pca_eigenms <- ggplot_pca(norm_d1[-1,], anno, "Label","log2-EigenMS")
```
```{r message=FALSE, warning=FALSE, include=FALSE}
norm_d1 <- do_normalization_short(log2_d1, labels_d1)
hist_eigenms <- ggplot_truehist(unlist(norm_d1[-1,]), "log2-EigenMS")
qq_eigenms <- ggplot_carqq(unlist(norm_d1[-1,]), "log2-EigenMS")
pca_eigenms <- ggplot_pca(norm_d1[-1,], anno, "Label","log2-EigenMS")
```
```{r fig.dim=c(18, 6), out.width="100%"}
plot_grid(hist_eigenms, qq_eigenms, pca_eigenms, nrow = 1)
```
```{r}
norm_d1_mod = norm_d1[-1,] %>%
as.data.frame() %>%
rownames_to_column(., var = "rowname") %>%
mutate(across(-rowname, as.numeric)) %>%
column_to_rownames(., var = "rowname")
#write.csv(norm_d1_mod, file = "after_norm_data.csv")
# display table
norm_d1_mod %>%
mutate(across(everything(), round, digit=2)) %>%
knitr::kable(., digits = 3, "html") %>%
kableExtra::kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive")) %>%
kableExtra::scroll_box(width = "100%", height = "500px")
```
:::
### Peak intensity plots
EigenMS removes sample-to-sample variability.
::: {.panel-tabset .column-page-inset-right}
### Intensity after log2 transformation
```{r message=FALSE, warning=FALSE}
Label = as.matrix(df[1,-1]) %>%
t(.) %>%
as.data.frame(.) %>%
`colnames<-`(., "Label") %>%
rownames_to_column(., var = "sample")
color_list = sample_annotation_to_colors(Label,sample_id_col = "sample", factor_columns = c("Label"), numeric_columns = NULL)
# after log2 transformation
log2_d1_long = matrix_to_long(log2_d1, sample_id_col = "sample")
plot_boxplot(log2_d1_long, Label, sample_id_col = "sample", batch_col = "Label", color_scheme = color_list[["Label"]], ylimits = c(-15,30)) + ylab("Intensity (log2 scale)") # y-range = 45 for comparison purpose
```
### Intensity after EigenMS normalization
```{r message=FALSE, warning=FALSE}
#after normalization
norm_d1_long = matrix_to_long(norm_d1_mod, sample_id_col = "sample")
plot_boxplot(norm_d1_long, Label, sample_id_col = "sample", batch_col = "Label", color_scheme = color_list[["Label"]], ylimits = c(-15,30)) + ylab("Intensity (log2 scale, after normalization)") # y-range = 45 for comparison purpose
```
:::
:::
<!-- ::: {.column-margin} -->
<!-- | Name | Abbreviation | -->
<!-- |:----------:|:-------------------------------------:| -->
<!-- | pT | p-value from t-test | -->
<!-- | BHT | adjusted p-value for t-test | -->
<!-- | pW | p-value from Wilcoxon rank-sum test | -->
<!-- | BHW | adjusted p-value for Wilcoxon rank-sum test | -->
<!-- | FC(lin) | linear fold-change | -->
<!-- | FC(log2) | log2 fold-change | -->
<!-- | padj | minimum of BHT and BHW | -->
<!-- | -log10padj | -log10(padj) | -->
<!-- | Status | (up or down)-regulation | -->
<!-- : Result table column name abbreviation -->
<!-- ::: -->
## Univariate analysis
Univariate statistic analysis will proceed for each pairs of comparison. For each pair,
Step 1, differential analysis;
* t-test
* Wilcoxon rank-sum test
For all statistical tests, the Benjamini-Hochberg (BH) procedure was applied to correct for multiple hypothesis testing.
Step 2, fold change, both linear and log2;
Step 3, regulation shown in volcano plot.
* A feature is considered unregulated when `FC(log2)` > 0.6 (or `FC(lin)` > 1.5);
* A feature is considered downregulated when `FC(log2)` < -0.6 (or `FC(lin)` < -1.5).
```{r}
d5 <- rbind.data.frame(labels_d1, norm_d1_mod)
d5_mod <- t(d5) %>%
as.data.frame(.) %>%
rownames_to_column(., var = "rowname") %>%
as_tibble() %>%
#dplyr::rename(., Label = class) %>%
mutate(across(-c(Label,rowname), as.numeric)) %>%
#rename_with(str_trim)
column_to_rownames(., var = "rowname")
grps = as.factor(t(labels_d1))
```
::: {.panel-tabset .column-page-inset-right}
### Full stat
```{r message=FALSE, warning=FALSE}
temp_d <- d5_mod %>%
mutate(across(-Label, ~ 2^(.) )) %>% # d3_mod is log-scale number, this step will de-log and change data back to linear number
group_by(Label) %>%
summarise(across(everything(), mean))
print(paste("Order of Fold Change:", temp_d$Label[2], "over", temp_d$Label[1], sep = " "))
uni_res <- do_univariate(d5_mod)
uni_res <- uni_res %>%
rowwise() %>%
mutate(padj = min(c(BHT, BHW))) %>%
# get the lowest padj
ungroup() %>%
mutate(`-log10padj` = -log10(padj))
#uni_res_annotation = uni_res %>% left_join(., feature_meta, by = c("variable" = "Feature"))
#uni_res_annotation %>% create_dt()
uni_res %>%
knitr::kable(., digits = 3, "html") %>%
kableExtra::kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive")) %>%
kableExtra::scroll_box(width = "100%", height = "500px")
```
### Significant stat
Univariate analysis identified 71 peptides with a p-value less than 0.05 from both the BH-corrected t test and Wilcoxon rank-sum tests. These peptides exhibited a statistically significant differential abundance between IC/BPS and healthy controls. The 71 differential peptides were degradation products from a variety of proteins, including several previously associated with IC/BPS from prior proteomics studies.
```{r}
# get DE features only tibble
uni_res_filt <- uni_res %>%
dplyr::filter(BHT < FDR & BHW < FDR) %>%
mutate(status = case_when(`FC(log2)` < - 0.6 ~ "Down",
`FC(log2)` > 0.6 ~ "Up") )
if (nrow(uni_res_filt) == 0){
print(paste("There is no significant differentiated features between",
temp_d$Label[2], "and", temp_d$Label[1], sep = " "))
} else {
uni_res_filt = uni_res_filt #%>%
#left_join(., feature_meta, by = c("variable" = "Feature"))
#write.csv(uni_res_filt, file = paste(combo_list[[i]][1], "vs", combo_list[[i]][2], ".csv", sep = ""))
uni_res_filt %>%
knitr::kable(., digits = 3, "html") %>%
kableExtra::kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive")) %>%
kableExtra::scroll_box(width = "100%", height = "500px")
}
```
### Volcano plot
Interestingly, all of the differential peptides were found in a higher abundance in IC/BPS relative to control samples.
```{r message=FALSE, warning=FALSE, fig.dim=c(6, 6)}
# volcano plot
volcano_plot(uni_res, feature_col = "variable", fdr = FDR, log2fc = LOG2FC, save = FALSE)
```
### Heatmap
```{r message=FALSE, warning=FALSE, fig.dim=c(15, 10)}
# heatmap
anno <- data.frame(Label = as.factor(t(d5)[, "Label"]))
hm <- d5[-1, d5[1,] %in% levels(grps) ] %>%
rownames_to_column("variable") %>%
mutate(across(-variable, as.numeric)) %>%
dplyr::filter(., variable %in% uni_res_filt$variable)
heatmap(hm, feature_col = "variable", sample_anno = anno, rowname_switch = TRUE, save = FALSE)
```
:::
## Multivariate analysis
Supervised PLS-DA is included to assist feature selection, as one of the goal of this study is to identifying putative peptide biomarkers to distinguish between IC/BPS patient urine samples from control urine samples.
::: {.panel-tabset .column-page-inset-right}
### PLS-DA
```{r}
label = as.character(norm_d1[1, ])
plsda_full <- pls_da(norm_d1_mod, label, WORKING_DIR=getwd())
plsda_full_res <- plsda_full[[1]]
plsda_full_plot <- plsda_full[[2]]
plsda_full_vip <- plsda_full[[3]]
```
```{r message=FALSE, warning=FALSE}
pls_da_score = plsda_full_res@scoreMN %>%
as.data.frame(.) %>%
rownames_to_column(., var = "name") %>%
merge(rownames_to_column(as.data.frame(t(labels_d1)), var = "name"), by = "name")
#write.csv(pls_da_score, file = "after_norm_plsda_score.csv")
#plsda_merged <- merge(t(rbind.data.frame(labels_d1, norm_d1_mod)),
# plsda_full_res@scoreMN, by=0)
```
### VIP table
```{r}
vip = tibble("VIP" = names(plsda_full_vip),
"score" = plsda_full_vip ) %>%
mutate(across(c("score"), ~round(.x, 4)))
vip %>% knitr::kable(., digits = 3, "html") %>%
kableExtra::kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive")) %>%
kableExtra::scroll_box(width = "100%", height = "500px")
#write.csv(vip, file = "after_norm_plsda_vip.csv")
```
:::
## AUROC
PLS-DA identified five peptides with a VIP coefficient greater than 2.00 and a p-value less than 0.05 from both a BH-corrected t test and a Wilcoxon rank-sum test. These five peptides were used as input to calculate a ROC curve, which yielded an AUC of 97.0% from a logistic regression model or an AUC of 92.4% from a random forest model.
::: {.column-page-inset-right}
```{r message=FALSE, warning=FALSE, include=FALSE}
# First take top variable based on VIP list
top.vip = dplyr::filter(vip, score>=2)
d6 = data.frame(matrix(ncol = ncol(norm_d1_mod), nrow = 0))
colnames(d6) = colnames(norm_d1_mod)
for (i in 1:nrow(norm_d1_mod)) {
if (is.element(rownames(norm_d1_mod)[i], top.vip$VIP )){ #rownames(top.vip)
d6 = rbind(d6, norm_d1_mod[i,])
}
}
d6 = rbind(labels_d1,d6)
# Random forest
RF = as.data.frame(t(d6)) %>%
rownames_to_column(.,var = "rowname")%>%
mutate(across(-c(rowname,Label), as.numeric)) %>%
column_to_rownames(., var = "rowname")%>%
mutate_at(vars(Label), factor)
set.seed(42)
RF.model = randomForest::randomForest(Label ~ ., data = RF, importance=TRUE, proximity=TRUE)
# Logistic regression
logistic.model <- glm(Label ~ . , data = RF, family="binomial")
```
```{r message=FALSE, warning=FALSE, fig.dim=c(6,6)}
par(pty="s")
logistic_roc = roc(RF$Label,logistic.model$fitted.values, plot=TRUE, legacy.axes=TRUE, percent=TRUE, xlab="False Positive Percentage", ylab="True Postive Percentage", col="#377eb8", lwd=4, print.auc=TRUE,print.auc.cex=1.5 )
plot.roc(RF$Label, RF.model$votes[,1],percent=TRUE, col="#4daf4a", lwd=4, print.auc=TRUE, add=TRUE, print.auc.y=40, print.auc.cex=1.5)
legend(50, 20, legend=c("Logisitic Regression", "Random Forest"), col=c("#377eb8", "#4daf4a"), lwd=4, bty = 'n' )
par(pty="m")
```
These results indicate that the five peptides provide a high predictive capability to distinguish urine from IC/BPS patients and healthy controls. ROC curves using the individual peptides generally showed a reduced predictive capability, demonstrating that the combination of peptide abundances is required for high predictive ability.
```{r}
# create list of combinations of all length from top.vip
list_top_vip = top.vip$VIP #rownames(top.vip)
roc_combo_list = do.call(c, lapply(seq_along(list_top_vip), combn, x = list_top_vip, simplify = FALSE))
```
```{r message=FALSE, warning=FALSE, fig.dim=c(18, 12)}
par(mfrow=c(2,3))
for (a in 1:5) {
temp = data.frame(matrix(ncol = ncol(norm_d1), nrow = 0))
colnames(temp) = colnames(norm_d1)
for (i in 1:nrow(norm_d1)) {
if (rownames(norm_d1)[i] == roc_combo_list[[a]]){
temp = rbind(temp, norm_d1[i,])
}
i = i+1
}
temp = rbind(labels_d1,temp)
# Random forest
RF = as.data.frame(t(temp)) %>%
rownames_to_column(.,var = "rowname")%>%
mutate(across(-c(rowname,Label), as.numeric)) %>%
column_to_rownames(., var = "rowname")%>%
mutate_at(vars(Label), factor)
set.seed(42)
RF.model = randomForest::randomForest(Label ~ ., data = RF, importance=TRUE, proximity=TRUE)
# Logistic regression
logistic.model <- glm(Label ~ . , data = RF, family="binomial")
#par(pty="s")
roc(RF$Label,logistic.model$fitted.values,plot=TRUE, legacy.axes=TRUE, percent=TRUE, xlab="False Positive Percentage", ylab="True Postive Percentage", main = roc_combo_list[[a]], col="#377eb8", lwd=4, print.auc=TRUE, print.auc.cex=1.5)
plot.roc(RF$Label, RF.model$votes[,1],percent=TRUE, col="#4daf4a", lwd=4, print.auc=TRUE, add=TRUE, print.auc.y=40, print.auc.cex=1.5)
legend(50, 20, legend=c("Logisitic Regression", "Random Forest"), col=c("#377eb8", "#4daf4a"), lwd=4, bty = 'n' )
#par(pty="m")
}
```
:::
## Reproducibility
The amount of time took to generate the report:
```{r time_spend}
Sys.time() - start_time
```
*R* session information:
```{r R_session}
sessionInfo()
```