Problem with names of columns like probability_* #121

Open
juliasilge opened this issue Jul 6, 2023 · 1 comment

@juliasilge
Member

juliasilge commented Jul 6, 2023

As reported in this Stack Overflow question, the following does not work:

library(tidyverse)
library(probably)
#> 
#> Attaching package: 'probably'
#> The following objects are masked from 'package:base':
#> 
#>     as.factor, as.ordered

set.seed(100)
test_df <- tibble(
  probability_x = runif(100),
  Label = as.factor(case_when(probability_x > 0.5 ~ "x", TRUE ~ "y"))
)

cal_plot_breaks(test_df, Label, probability_x)
#> Error in `purrr::map()`:
#> ℹ In index: 2.
#> Caused by error in `estimate_str[[.x]]`:
#> ! subscript out of bounds
#> Backtrace:
#>      ▆
#>   1. ├─probably::cal_plot_breaks(test_df, Label, probability_x)
#>   2. ├─probably:::cal_plot_breaks.data.frame(test_df, Label, probability_x)
#>   3. │ └─probably:::cal_plot_breaks_impl(...)
#>   4. │   ├─probably::.cal_table_breaks(...)
#>   5. │   └─probably:::.cal_table_breaks.data.frame(...)
#>   6. │     └─probably:::.cal_table_breaks_impl(...)
#>   7. │       └─probably:::truth_estimate_map(...)
#>   8. │         └─purrr::map(seq_along(truth_levels), ~sym(estimate_str[[.x]]))
#>   9. │           └─purrr:::map_("list", .x, .f, ..., .progress = .progress)
#>  10. │             ├─purrr:::with_indexed_errors(...)
#>  11. │             │ └─base::withCallingHandlers(...)
#>  12. │             ├─purrr:::call_with_cleanup(...)
#>  13. │             └─probably (local) .f(.x[[i]], ...)
#>  14. │               └─rlang::sym(estimate_str[[.x]])
#>  15. │                 └─rlang::is_symbol(x)
#>  16. └─purrr (local) `<fn>`(`<sbscOOBE>`)
#>  17.   └─cli::cli_abort(...)
#>  18.     └─rlang::abort(...)

Created on 2023-07-06 with reprex v2.0.2
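
Frame 8 of the backtrace hints at the cause: the estimate columns appear to be matched to the truth levels by position, so a single column cannot cover both levels. Below is a minimal base R sketch of that indexing. The names `estimate_str` and `truth_levels` come from the backtrace, but their values are assumptions reconstructed from this reprex, and `lapply()` stands in for `purrr::map()`.

```r
# Assumed values, reconstructed from the reprex above (not read from probably itself)
estimate_str <- "probability_x"   # the single selected estimate column
truth_levels <- c("x", "y")       # levels of Label

# Same positional indexing as frame 8 of the backtrace;
# as.name() plays the role of rlang::sym() here
result <- tryCatch(
  lapply(seq_along(truth_levels), function(i) as.name(estimate_str[[i]])),
  error = function(e) conditionMessage(e)
)

# The second index has no matching column, so the error message is captured
print(result)
```

If this reading is right, the failure comes from the number of selected columns rather than from the name `probability_x` as such.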

But the same code works if we change the column name to .pred_x:

library(tidyverse)
library(probably)
#> 
#> Attaching package: 'probably'
#> The following objects are masked from 'package:base':
#> 
#>     as.factor, as.ordered

set.seed(100)
test_df <- tibble(
  .pred_x = runif(100),
  Label = as.factor(case_when(.pred_x > 0.5 ~ "x", TRUE ~ "y"))
)

cal_plot_breaks(test_df, Label, .pred_x)

Created on 2023-07-06 with reprex v2.0.2

I see that the docs say:

A vector of column identifiers, or one of dplyr selector functions to choose which variables contains the class probabilities. It defaults to the prefix used by tidymodels (.pred_). The order of the identifiers will be considered the same as the order of the levels of the truth variable.

But the docs don't make it clear that the columns have to be named .pred_x and similar.

@sworland-thyme

I also ran into this issue. It appears that the columns are identified either by the default prefix (.pred_x) or by position. A more informative error message, such as cli::cli_abort("{.arg truth} please use the .pred_x naming convention."), would help, but that doesn't seem sufficient for a PR. For now I just changed my code (the column was named prob).
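
As a rough illustration of the kind of early check that could produce a clearer message, here is a base R sketch. `check_estimate_cols()` is a hypothetical helper, not part of probably, and the rule it enforces (either every column carries the `.pred_` prefix, or there is one column per truth level) is an assumption based on the behaviour described in this thread.

```r
# Hypothetical helper, not part of probably: fail early with a readable message.
# Assumed rule: columns either all use the .pred_ prefix, or there is one per level.
check_estimate_cols <- function(estimate_cols, truth_levels) {
  has_prefix <- all(startsWith(estimate_cols, ".pred_"))
  if (!has_prefix && length(estimate_cols) != length(truth_levels)) {
    stop(
      "`estimate` selected ", length(estimate_cols), " column(s) but `truth` has ",
      length(truth_levels), " levels. ",
      "Use the `.pred_` prefix or supply one column per level.",
      call. = FALSE
    )
  }
  invisible(TRUE)
}

# A prefixed column passes silently
check_estimate_cols(".pred_x", c("x", "y"))

# A single unprefixed column is rejected with an explanatory message
msg <- tryCatch(
  check_estimate_cols("prob", c("x", "y")),
  error = function(e) conditionMessage(e)
)
print(msg)
```

Inside the package the same condition could be raised with cli::cli_abort(), as suggested above; base `stop()` is used here only to keep the sketch free of dependencies.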
