Training ranger model fails when setting write.forest #1177

Closed
holgerbrandl opened this issue Oct 6, 2020 · 4 comments

@holgerbrandl

When building a simple ranger model for iris (the same as in the official docs), caret fails to:

  1. build the model in the presence of write.forrest=T
  2. write the forest by default (which is the default setting in ranger), so that the resulting model can be properly persisted (e.g. via carrier for model serving via mlflow)

require(ranger)
require(caret)
require(dplyr)
require(e1071)
require(carrier)

rf <- ranger(Species ~ ., data = iris, num.trees = 5, write.forest = TRUE)
predict(rf, iris)$predictions
# --> works

rfSimple = train(
Species ~ .,
data = iris,
method = "ranger",
num.trees = 5,
write.forrest=True
)
# --> fails with
# Something is wrong; all the Accuracy metric values are missing:


# train without write.forrest argument
rfNoWF = train(
Species ~ .,
data = iris,
method = "ranger",
num.trees = 5
)
# --> seems fine...

## test presence with carrier
predictor <- crate(~ ranger:::predict.ranger(!!rfNoWF, .x), !!rfNoWF)
predictor(iris)
## -->...but fails with
## Error in ranger:::predict.ranger(list(method = "ranger", modelInfo = list( :
## Error: No saved forest in ranger object. Please set write.forest to TRUE when calling ranger.

Session Info:

R version 3.6.1 (2019-07-05)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19041)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] carrier_0.1.0   e1071_1.7-3     dplyr_1.0.0     caret_6.0-86
[5] ggplot2_3.2.1   lattice_0.20-38 ranger_0.12.1

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.3           pillar_1.4.3         compiler_3.6.1
 [4] gower_0.2.1          plyr_1.8.4           tools_3.6.1
 [7] iterators_1.0.12     class_7.3-15         rpart_4.1-15
[10] ipred_0.9-9          lubridate_1.7.4      lifecycle_0.2.0
[13] tibble_2.1.3         gtable_0.3.0         nlme_3.1-140
[16] pkgconfig_2.0.3      rlang_0.4.6          Matrix_1.2-17
[19] foreach_1.4.7        prodlim_2019.11.13   stringr_1.4.0
[22] withr_2.1.2          pROC_1.16.2          generics_0.0.2
[25] vctrs_0.3.1          recipes_0.1.10       stats4_3.6.1
[28] nnet_7.3-12          grid_3.6.1           tidyselect_1.1.0
[31] data.table_1.12.8    glue_1.4.1           R6_2.4.1
[34] survival_2.44-1.1    lava_1.6.7           reshape2_1.4.3
[37] purrr_0.3.3          magrittr_1.5         ModelMetrics_1.2.2.2
[40] scales_1.1.0         codetools_0.2-16     MASS_7.3-51.4
[43] splines_3.6.1        timeDate_3043.102    colorspace_1.4-1
[46] stringi_1.4.3        lazyeval_0.2.2       munsell_0.5.0
[49] crayon_1.3.4

FYI @mnwright

@mnwright
Contributor

mnwright commented Oct 6, 2020

It's write.forest = TRUE (one r in forest and capitalized TRUE). However, caret already sets write.forest = TRUE, so passing it again will give you a "matched by multiple actual arguments" error.

@holgerbrandl
Author

Thanks @mnwright for pointing out my mistake here. I've corrected it and also noticed that when using caret I have to use `caret:::predict.train` when packaging the model with carrier. This fixes the packaging when using ranger along with caret.

The revised code works unless I specify write.forest when building rfSimple. This fails (as pointed out by @mnwright ) with:

model fit failed for Resample09: mtry=3, min.node.size=1, splitrule=gini Error in ranger::ranger(dependent.variable.name = ".outcome", data = x,  :
  formal argument "write.forest" matched by multiple actual arguments

So the minor bug is that the user cannot set write.forest when using caret as a wrapper API.

Corrected code:

require(ranger)
require(caret)
require(dplyr)
require(e1071)
require(carrier)

rf <- ranger(Species ~ ., data = iris, num.trees = 5, write.forest = TRUE)
predict(rf, iris)$predictions
# --> works

predictor <- crate(~ ranger:::predict.ranger(!!rf, .x)$predictions, !!rf)
predictor(iris)
# --> works fine

## test presence with carrier

rfSimple = train(
Species ~ .,
data = iris,
method = "ranger",
num.trees = 5,
write.forest=TRUE
)
# --> fails with
# Something is wrong; all the Accuracy metric values are missing:


# train without write.forest argument
rfNoWF = train(
Species ~ .,
data = iris,
method = "ranger",
num.trees = 5
)
# --> seems fine..

## test presence with carrier
predictor <- crate(~ caret:::predict.train(!!rfNoWF, .x), !!rfNoWF)
predictor(iris)
## --> ..and works now!

@holgerbrandl changed the title from "training ranger fails when setting write.forrest" to "Training ranger model fails when setting write.forest" on Oct 6, 2020
@topepo
Owner

topepo commented Feb 10, 2021

The initial issue was that you used True instead of TRUE, as shown in the warnings:

49: model fit failed for Resample03: mtry=2, min.node.size=1, splitrule=gini Error in method$fit(x = x, y = y, wts = wts, param = tuneValue, lev = obsLevels, : object 'True' not found

This code:

rfSimple = train(
    Species ~ .,
    data = iris,
    method = "ranger",
    num.trees = 5,
    write.forest=TRUE
)

failed because caret already sets that option. Also shown in the warnings:

50: model fit failed for Resample09: mtry=3, min.node.size=1, splitrule=gini Error in ranger::ranger(dependent.variable.name = ".outcome", data = x, : formal argument "write.forest" matched by multiple actual arguments
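
The mechanism behind that second warning can be reproduced in base R without caret or ranger. The sketch below is only an illustration: `fit_model` and `wrapper_fit` are hypothetical stand-ins for `ranger::ranger()` and for a wrapper's fit step, not caret's actual code. It assumes the wrapper hard-codes `write.forest = TRUE` and also forwards the caller's extra arguments through `...`.

```r
# Hypothetical stand-in for ranger::ranger(): it just returns the flag.
fit_model <- function(data, write.forest = TRUE) {
  write.forest
}

# Hypothetical stand-in for a wrapper's fit step: it hard-codes
# write.forest and also forwards the caller's extra arguments via `...`.
wrapper_fit <- function(data, ...) {
  fit_model(data, write.forest = TRUE, ...)
}

# Fine: write.forest reaches fit_model() exactly once.
wrapper_fit(iris)

# Passing write.forest again through `...` supplies the same formal
# argument twice, which R rejects when the inner call is matched.
msg <- tryCatch(
  wrapper_fit(iris, write.forest = TRUE),
  error = function(e) conditionMessage(e)
)
print(msg)
```

The captured message names the duplicated formal argument, which is the same wording that appears in the resampling warning above.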

@topepo closed this as completed on Feb 10, 2021
@odgersn

odgersn commented Mar 7, 2025

@topepo naive question, but how do we know that caret already sets write.forest = TRUE? I am running into a similar issue today with ranger's probability argument. caret is apparently setting its value to FALSE but I would like to set it to TRUE.
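
One way to check this, sketched here rather than taken from the thread, is to print the fit function that caret stores for the model and read its call to `ranger::ranger()`. The snippet uses `caret::getModelInfo()`; the variable names `ranger_info` and `fit_src` are just illustrative, and what the printed source contains depends on the installed caret version.

```r
library(caret)

# Look up caret's model definition for method = "ranger".
# regex = FALSE asks for an exact match on the model code.
ranger_info <- getModelInfo("ranger", regex = FALSE)[["ranger"]]

# Print the fit function. Arguments written out explicitly in its call to
# ranger::ranger() are set by caret and will clash if also passed to train().
print(ranger_info$fit)

# Optional: search the deparsed source for an argument of interest.
fit_src <- paste(deparse(ranger_info$fit), collapse = "\n")
grepl("write.forest", fit_src, fixed = TRUE)
grepl("probability", fit_src, fixed = TRUE)
```

If the printed source shows `probability` being filled from a `classProbs` value, then `trainControl(classProbs = TRUE)` would be the place to change it. Treat that as something to verify against your installed version rather than a given.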
