Where we have class imbalance, one way of addressing it is to use importance weights. Not all parsnip models support them, but many do. However, when calibrating predictions with {probably} there is no option to supply importance weights, so the calibrated probabilities shift back towards the majority class. I know {betacal}, the package underlying Beta calibration, doesn't take weights, but it should be possible for the other calibration models.
Here's an example illustrating the problem:
library(tidyverse)
library(tidymodels)

set.seed(100)

# Create imbalanced data, add weight column giving roughly equal overall weight
df <- caret::twoClassSim(n = 1000, intercept = -12) |>
  dplyr::mutate(weights = dplyr::if_else(Class == "Class1", 0.15, 0.85)) |>
  dplyr::mutate(weights = hardhat::importance_weights(weights),
                Class = case_match(Class,
                                   "Class1" ~ "Majority",
                                   "Class2" ~ "Minority"))

# Create recipe and logistic regression specification
glm_recipe <- recipes::recipe(x = df, formula = Class ~ .)
glm_spec <- parsnip::logistic_reg(mode = "classification",
                                  engine = "glm")

# Combine into a workflow with case weights
glm_wf <- workflows::workflow(preprocessor = glm_recipe,
                              spec = glm_spec) |>
  workflows::add_case_weights(col = weights)

# Create resamples for model fitting
resamples <- rsample::vfold_cv(data = df,
                               v = 5,
                               strata = Class)

# Fit the model across the resamples
wf_fit <- tune::fit_resamples(object = glm_wf,
                              resamples = resamples,
                              control = control_resamples(save_pred = TRUE,
                                                          save_workflow = TRUE))

# Collect the out-of-sample predictions
predictions <- tune::collect_predictions(wf_fit)

# Visualise the predictions
predictions |> pull(.pred_Minority) |> hist()

# Find the median prediction
predictions |> summarise(median = median(.pred_Minority))
The median prediction pre-calibration is around 36%. We would expect this to change a bit with calibration.
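As an aside, a reliability plot makes it easier to judge how much the raw predictions actually need calibrating. A minimal sketch, assuming {probably}'s cal_plot_breaks() data-frame method and its event_level argument behave as documented (the choice of .pred_Minority with event_level = "second" is mine, not part of the original example):
# Sketch: reliability plot of the uncalibrated out-of-sample predictions
predictions |>
  probably::cal_plot_breaks(truth = Class,
                            estimate = .pred_Minority,
                            event_level = "second")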
# Now we want to calibrate the probabilities
cal <- probably::cal_estimate_logistic(.data = predictions, truth = Class)
predictions_calibrated <- probably::cal_apply(.data = predictions, object = cal)
# Visualise the predictions
predictions_calibrated |> pull(.pred_Minority) |> hist()
# Find median prediction
predictions_calibrated |> summarise(median = median(.pred_Minority))
The median prediction after calibration is around 10%, which is much lower than pre-calibration: because the calibration model ignores the importance weights, the probabilities are pulled back towards the majority class.
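For comparison, here is a sketch of a weighted logistic (Platt-style) calibration fitted by hand with stats::glm(), which is roughly the kind of model that could accept importance weights. This is not {probably}'s actual implementation, and the names (predictions_wt, wt, cal_fit, .pred_Minority_cal) are just for illustration; the weights are recreated from the class labels rather than assuming the weights column survives collect_predictions().
# Sketch: manual weighted calibration (not the {probably} API)
predictions_wt <- predictions |>
  dplyr::mutate(wt = dplyr::if_else(Class == "Majority", 0.15, 0.85))

# Regress the true class on the logit of the predicted minority-class
# probability, weighting each row; quasibinomial() avoids the
# non-integer-successes warning that fractional weights trigger
cal_fit <- glm(as.integer(Class == "Minority") ~ qlogis(.pred_Minority),
               data = predictions_wt,
               family = quasibinomial(),
               weights = wt)

# Apply the weighted calibration model to the same predictions
predictions_wt |>
  dplyr::mutate(.pred_Minority_cal = predict(cal_fit,
                                             newdata = predictions_wt,
                                             type = "response")) |>
  dplyr::summarise(median = median(.pred_Minority_cal))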