add ignore_step function #1324

Open
wants to merge 2 commits into
base: main
1 change: 1 addition & 0 deletions NAMESPACE
@@ -582,6 +582,7 @@ export(get_case_weights)
export(get_keep_original_cols)
export(has_role)
export(has_type)
export(ignore_step)
export(imp_vars)
export(importance_weights)
export(is_trained)
2 changes: 2 additions & 0 deletions NEWS.md
@@ -20,6 +20,8 @@

* `step_mutate()` gained `.pkgs` argument to specify what packages need to be loaded for step to work. (#1282)

* Added `ignore_step()` to modify untrained recipes by removing steps from them. (#887)

* Added more documentation in `?selections` about how `tidyselect::everything()` works in recipes. (#1259)

* Improved error message for misspelled argument in step functions. (#1318)
71 changes: 71 additions & 0 deletions R/ignore_step.R
@@ -0,0 +1,71 @@
#' Remove steps from recipe
#'
#' `ignore_step` will return a recipe without steps specified by the `number` or
#' `id` argument.
#'
#' @param x A `recipe` object.
Member

Suggested change
- #' @param x A `recipe` object.
+ #' @param x An untrained `recipe` object.

#' @param number An integer vector, Denoting the positions of the steps that
#' should be removed.
#' @param id A character string. Denoting the `id` of the steps that should be
#' removed.
Comment on lines +7 to +10
Member

Suggested change
- #' @param number An integer vector, Denoting the positions of the steps that
- #' should be removed.
- #' @param id A character string. Denoting the `id` of the steps that should be
- #' removed.
+ #' @param number An integer vector denoting the positions of the steps that
+ #' should be removed.
+ #' @param id A character string denoting the `id` of the steps that should be
+ #' removed.

#'
#' @details
#' `number` or `id` must be specified. Specifying neither or both will result
#' in an error.
#'
#' @return a `recipe` object.
#'
#' @examples
#' rec <- recipe(mpg ~ ., data = mtcars) %>%
#'   step_dummy(all_nominal_predictors()) %>%
#'   step_impute_mean(all_numeric_predictors()) %>%
#'   step_normalize(all_numeric_predictors()) %>%
#'   step_pca(all_numeric_predictors(), id = "PCA")
#'
#' ignore_step(rec, number = 1)
#'
#' ignore_step(rec, number = 1:2)
#'
#' ignore_step(rec, id = "PCA")
#' @export
ignore_step <- function(x, number, id) {
Member

I would give number and id some default values since only one of the two is required. tidy() uses NA.
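
One way the suggestion above could look is sketched here in base R. This is only an illustration under stated assumptions: `which_selector()` is a hypothetical helper, not part of recipes, and it treats an all-`NA` value as "not supplied", which is what `NA` defaults would require instead of a missingness-based check.

```r
# Hypothetical helper (not part of recipes): decide which selector was
# supplied when both `number` and `id` default to NA.
which_selector <- function(number = NA, id = NA) {
  has_number <- !all(is.na(number))
  has_id <- !all(is.na(id))

  # Reject the call when neither or both selectors were supplied.
  if (has_number == has_id) {
    stop("Exactly one of `number` or `id` must be supplied.", call. = FALSE)
  }

  if (has_number) "number" else "id"
}

which_selector(number = 1:2)
which_selector(id = "pca")
```

With defaults in place, a check that relies on missing arguments would no longer distinguish the cases, so the sketch compares against `NA` explicitly.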

  if (any(map_lgl(x$steps, is_trained))) {
    cli::cli_abort(
      "{.arg x} must not contain any trained steps."
    )
  }

  n_steps <- length(x$steps)

  if (n_steps == 0) {
    cli::cli_abort("{.arg x} doesn't contain any steps to remove.")
  }

  arg <- rlang::check_exclusive(number, id)
Comment on lines +40 to +44
Member

We could allow ignore_step(<recipes without steps>) and just have it do nothing. If we don't want that, I would swap these two checks. To me, it's more obvious then that we don't allow that because we always require you to provide what to remove (before we tell you that there is nothing to remove).
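
The swapped ordering described above might be sketched as follows. This is a base-R illustration only: `ignore_step_checks()` is a hypothetical name, `x` is any list with a `steps` element standing in for a recipe, and `missing()` replaces the exclusivity check used in the PR.

```r
# Hypothetical sketch: validate the selector before the step count.
ignore_step_checks <- function(x, number, id) {
  # 1. Require exactly one of `number` or `id` first.
  if (missing(number) == missing(id)) {
    stop("Exactly one of `number` or `id` must be supplied.", call. = FALSE)
  }

  # 2. Only then report that there is nothing to remove.
  if (length(x$steps) == 0) {
    stop("`x` doesn't contain any steps to remove.", call. = FALSE)
  }

  invisible(x)
}

empty <- list(steps = list())

# With no selector, the selector error is raised first.
tryCatch(ignore_step_checks(empty), error = conditionMessage)

# With a selector, the "nothing to remove" error is raised.
tryCatch(ignore_step_checks(empty, number = 1), error = conditionMessage)
```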


  if (arg == "number") {
    if (any(number < 1 | number > n_steps)) {
      offenders <- number[number < 1 | number > n_steps]
      cli::cli_abort(
        "{.arg number} must only contain values between 1 and {n_steps}. \\
        Not {offenders}."
      )
    }
  } else {
    step_ids <- vapply(x$steps, function(x) x$id, character(1))
    if (!(id %in% step_ids)) {
      cli::cli_abort(
        "Supplied {.arg id} ({.val {id}}) not found in the recipe."
      )
    }
    number <- which(id == step_ids)
  }

  x$steps <- x$steps[-number]

  if (length(x$steps) == 0) {
    x["steps"] <- list(NULL)
  }
hfrick marked this conversation as resolved.

  x
}
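
The last lines of `ignore_step()` reset the element with `x["steps"] <- list(NULL)` rather than `x$steps <- NULL`. A minimal base-R sketch of the difference follows; the plain lists are stand-ins for a recipe object, which is an assumption made purely for illustration.

```r
# Plain lists standing in for a recipe object (assumption for illustration).
x <- list(steps = list("dummy", "pca"), template = "data")
y <- x

# Dropping every step by negative index leaves an empty list, not NULL.
x$steps <- x$steps[-(1:2)]
length(x$steps)

# Single-bracket assignment of list(NULL) keeps the element and sets it to NULL.
x["steps"] <- list(NULL)
names(x)

# Assigning NULL with `$<-` deletes the element from the list instead.
y$steps <- NULL
names(y)
```

Keeping the `steps` element present but `NULL` is presumably what lets the result compare equal to a recipe that never had steps added, as the `number = 1:4` test expects.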
1 change: 1 addition & 0 deletions _pkgdown.yml
@@ -47,6 +47,7 @@ reference:
  - update_role_requirements
  - get_case_weights
  - case_weights
  - ignore_step
- title: Step Functions - Imputation
  contents:
  - starts_with("step_impute_")
41 changes: 41 additions & 0 deletions man/ignore_step.Rd

Some generated files are not rendered by default.

64 changes: 64 additions & 0 deletions tests/testthat/_snaps/ignore_step.md
@@ -0,0 +1,64 @@
# ignore_step() errors when needed

    Code
      ignore_step(rec)
    Condition
      Error in `ignore_step()`:
      ! `x` doesn't contain any steps to remove.

---

    Code
      ignore_step(rec1234)
    Condition
      Error in `ignore_step()`:
      ! One of `number` or `id` must be supplied.

---

    Code
      ignore_step(rec1234, number = 1, id = "pca")
    Condition
      Error in `ignore_step()`:
      ! Exactly one of `number` or `id` must be supplied.

---

    Code
      ignore_step(rec1234, number = 0)
    Condition
      Error in `ignore_step()`:
      ! `number` must only contain values between 1 and 4. Not 0.

---

    Code
      ignore_step(rec1234, number = 10)
    Condition
      Error in `ignore_step()`:
      ! `number` must only contain values between 1 and 4. Not 10.

---

    Code
      ignore_step(rec1234, id = "no id")
    Condition
      Error in `ignore_step()`:
      ! Supplied `id` ("no id") not found in the recipe.

---

    Code
      ignore_step(rec12)
    Condition
      Error in `ignore_step()`:
      ! `x` must not contain any trained steps.

---

    Code
      ignore_step(rec1234)
    Condition
      Error in `ignore_step()`:
      ! `x` must not contain any trained steps.

108 changes: 108 additions & 0 deletions tests/testthat/test-ignore_step.R
@@ -0,0 +1,108 @@
test_that("ignore_step() works correctly", {
  rec <- recipe(mpg ~ ., data = mtcars)

  rec1234 <- recipe(mpg ~ ., data = mtcars) %>%
    step_dummy(all_nominal_predictors(), id = "dummy") %>%
    step_impute_mean(all_numeric_predictors(), id = "impute_mean") %>%
    step_normalize(all_numeric_predictors(), id = "normalize") %>%
    step_pca(all_numeric_predictors(), id = "pca")

  rec234 <- recipe(mpg ~ ., data = mtcars) %>%
    step_impute_mean(all_numeric_predictors(), id = "impute_mean") %>%
    step_normalize(all_numeric_predictors(), id = "normalize") %>%
    step_pca(all_numeric_predictors(), id = "pca")

  rec34 <- recipe(mpg ~ ., data = mtcars) %>%
    step_normalize(all_numeric_predictors(), id = "normalize") %>%
    step_pca(all_numeric_predictors(), id = "pca")

  rec123 <- recipe(mpg ~ ., data = mtcars) %>%
    step_dummy(all_nominal_predictors(), id = "dummy") %>%
    step_impute_mean(all_numeric_predictors(), id = "impute_mean") %>%
    step_normalize(all_numeric_predictors(), id = "normalize")

  expect_equal(
    ignore_attr = TRUE,
    ignore_step(rec1234, number = 1),
    rec234
  )
Comment on lines +24 to +28
Member

I'm surprised by the argument order here, why do you put ignore_attr up front?
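
For context on why the unusual order still works: R matches named arguments before positional ones, so a named argument placed first does not shift the positional arguments. The sketch below uses a hypothetical stand-in, `compare()`, with two leading formals followed by `...`, on the assumption that `ignore_attr` is forwarded through `...` in the expectation being called.

```r
# `compare()` is a hypothetical stand-in with two positional formals
# followed by `...`, mimicking the shape of an expectation function.
compare <- function(object, expected, ...) {
  list(object = object, expected = expected, dots = list(...))
}

# Named argument first versus last: both bind the same values.
named_first <- compare(ignore_attr = TRUE, 1, 2)
named_last <- compare(1, 2, ignore_attr = TRUE)

identical(named_first, named_last)
```

Both calls are equivalent, so the question is one of readability: placing `ignore_attr = TRUE` after the object and expected value follows the more common convention.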


  expect_equal(
    ignore_attr = TRUE,
    ignore_step(rec1234, number = 1:2),
    rec34
  )

  expect_equal(
    ignore_attr = TRUE,
    ignore_step(rec1234, number = 1:4),
    rec
  )

  expect_equal(
    ignore_attr = TRUE,
    ignore_step(rec1234, number = 1),
    rec234
  )

  expect_equal(
    ignore_attr = TRUE,
    ignore_step(rec1234, id = "pca"),
    rec123
  )
})

test_that("ignore_step() errors when needed", {
  rec <- recipe(mpg ~ ., data = mtcars)

  rec1234 <- recipe(mpg ~ ., data = mtcars) %>%
    step_dummy(all_nominal_predictors(), id = "dummy") %>%
    step_impute_mean(all_numeric_predictors(), id = "impute_mean") %>%
    step_normalize(all_numeric_predictors(), id = "normalize") %>%
    step_pca(all_numeric_predictors(), id = "pca")

  expect_snapshot(
    error = TRUE,
    ignore_step(rec)
  )
  expect_snapshot(
    error = TRUE,
    ignore_step(rec1234)
  )
  expect_snapshot(
    error = TRUE,
    ignore_step(rec1234, number = 1, id = "pca")
  )
  expect_snapshot(
    error = TRUE,
    ignore_step(rec1234, number = 0)
  )
  expect_snapshot(
    error = TRUE,
    ignore_step(rec1234, number = 10)
  )
  expect_snapshot(
    error = TRUE,
    ignore_step(rec1234, id = "no id")
  )
})

test_that("ignore_step() errors when needed", {
  rec12 <- recipe(mpg ~ ., data = mtcars) %>%
    step_dummy(all_nominal_predictors(), id = "dummy") %>%
    step_impute_mean(all_numeric_predictors(), id = "impute_mean") %>%
    prep()

  rec1234 <- rec12 %>%
    step_normalize(all_numeric_predictors(), id = "normalize") %>%
    step_pca(all_numeric_predictors(), id = "pca")

  expect_snapshot(
    error = TRUE,
    ignore_step(rec12)
  )
  expect_snapshot(
    error = TRUE,
    ignore_step(rec1234)
  )
})