Problem with names of columns like `probability_*` #121

juliasilge · 2023-07-06T18:01:54Z

As reported in a Stack Overflow question, the following code fails:

library(tidyverse)
library(probably)
#> 
#> Attaching package: 'probably'
#> The following objects are masked from 'package:base':
#> 
#>     as.factor, as.ordered

set.seed(100)
test_df <- tibble(
  probability_x = runif(100),
  Label = as.factor(case_when(probability_x > 0.5 ~ "x", TRUE ~ "y"))
)

cal_plot_breaks(test_df, Label, probability_x)
#> Error in `purrr::map()`:
#> ℹ In index: 2.
#> Caused by error in `estimate_str[[.x]]`:
#> ! subscript out of bounds
#> Backtrace:
#>      ▆
#>   1. ├─probably::cal_plot_breaks(test_df, Label, probability_x)
#>   2. ├─probably:::cal_plot_breaks.data.frame(test_df, Label, probability_x)
#>   3. │ └─probably:::cal_plot_breaks_impl(...)
#>   4. │   ├─probably::.cal_table_breaks(...)
#>   5. │   └─probably:::.cal_table_breaks.data.frame(...)
#>   6. │     └─probably:::.cal_table_breaks_impl(...)
#>   7. │       └─probably:::truth_estimate_map(...)
#>   8. │         └─purrr::map(seq_along(truth_levels), ~sym(estimate_str[[.x]]))
#>   9. │           └─purrr:::map_("list", .x, .f, ..., .progress = .progress)
#>  10. │             ├─purrr:::with_indexed_errors(...)
#>  11. │             │ └─base::withCallingHandlers(...)
#>  12. │             ├─purrr:::call_with_cleanup(...)
#>  13. │             └─probably (local) .f(.x[[i]], ...)
#>  14. │               └─rlang::sym(estimate_str[[.x]])
#>  15. │                 └─rlang::is_symbol(x)
#>  16. └─purrr (local) `<fn>`(`<sbscOOBE>`)
#>  17.   └─cli::cli_abort(...)
#>  18.     └─rlang::abort(...)

Created on 2023-07-06 with reprex v2.0.2

But the same code works if we change the column name to .pred_x:

library(tidyverse)
library(probably)
#> 
#> Attaching package: 'probably'
#> The following objects are masked from 'package:base':
#> 
#>     as.factor, as.ordered

set.seed(100)
test_df <- tibble(
  .pred_x = runif(100),
  Label = as.factor(case_when(.pred_x > 0.5 ~ "x", TRUE ~ "y"))
)

cal_plot_breaks(test_df, Label, .pred_x)

Created on 2023-07-06 with reprex v2.0.2

I see that the docs say:

A vector of column identifiers, or one of dplyr selector functions to choose which variables contains the class probabilities. It defaults to the prefix used by tidymodels (.pred_). The order of the identifiers will be considered the same as the order of the levels of the truth variable.

But it isn't clear from this that the columns have to be named `.pred_x` and similar.
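The backtrace points at `purrr::map(seq_along(truth_levels), ~sym(estimate_str[[.x]]))`, which suggests the estimate columns are being matched to the truth levels by position. Here is a simplified base R sketch of that lookup, not the package's actual code: the two vectors are stand-ins I am assuming for what the internals hold when `Label` has two levels but only one estimate column is supplied.

```r
# Stand-ins (assumed) for the values inside truth_estimate_map()
truth_levels <- c("x", "y")          # two levels in the truth column
estimate_str <- c("probability_x")   # only one estimate column was selected

# Look up one estimate column per truth level, by position.
# The second lookup asks for an element that does not exist.
result <- tryCatch(
  lapply(seq_along(truth_levels), function(i) estimate_str[[i]]),
  error = function(e) conditionMessage(e)
)

# `result` holds the error message instead of a list of column names
print(result)
```

If this reading is right, the failure comes from supplying fewer estimate columns than there are truth levels, rather than from the `probability_` prefix as such.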

sworland-thyme · 2023-07-17T17:21:43Z

I also ran into this issue. It appears that the columns are identified using either the defaults (`.pred_x`) or their position. A more informative error message, such as `cli::cli_abort("{.arg truth} please use the .pred_x naming convention.")`, would help, but that doesn't seem sufficient for a PR. For now I just changed my code (the column was named `prob`).
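Until the behaviour or the error message changes, one possible workaround is to rename the columns to the `.pred_` convention before plotting, since the reprex above shows that `.pred_x` works. A minimal sketch using `dplyr::rename_with()`; the name `renamed_df` is just for illustration, and the data mirrors the original reprex:

```r
library(dplyr)
library(probably)

set.seed(100)
test_df <- tibble(
  probability_x = runif(100),
  Label = factor(if_else(probability_x > 0.5, "x", "y"))
)

# Swap the `probability_` prefix for `.pred_` on every matching column
renamed_df <- rename_with(
  test_df,
  ~ sub("^probability_", ".pred_", .x),
  starts_with("probability_")
)

# Same call as in the working reprex, now on the renamed column
cal_plot_breaks(renamed_df, Label, .pred_x)
```

This leaves the original data frame untouched and only changes the names handed to `cal_plot_breaks()`.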
