From 8315ed8818666307c4497fd3d988c82ef82d8e51 Mon Sep 17 00:00:00 2001 From: Julia Silge Date: Wed, 27 Sep 2023 09:49:04 -0600 Subject: [PATCH] Refresh README and vignettes (#267) * Refresh README and vignettes * Try different setup-r-deps for R-devel * Revert "Try different setup-r-deps for R-devel" This reverts commit d53f3d52699e024626c0e51f6dd3cf4d41bcea6e. * Set Bioconductor version * Only set env var for R-devel * Remove extraneous namespacing --- .github/workflows/R-CMD-check.yaml | 5 + README.Rmd | 54 +++++----- README.md | 132 ++++++++++++------------- vignettes/adding-models-to-butcher.Rmd | 28 +++--- vignettes/available-axe-methods.Rmd | 2 +- vignettes/butcher.Rmd | 90 +++++++---------- 6 files changed, 149 insertions(+), 162 deletions(-) diff --git a/.github/workflows/R-CMD-check.yaml b/.github/workflows/R-CMD-check.yaml index 67bb508a..3aa57ae5 100644 --- a/.github/workflows/R-CMD-check.yaml +++ b/.github/workflows/R-CMD-check.yaml @@ -54,6 +54,11 @@ jobs: http-user-agent: ${{ matrix.config.http-user-agent }} use-public-rspm: true + - name: Set bioc env var for R-devel + if: ${{ matrix.config.r == 'devel'}} + run: | + echo "R_BIOC_VERSION=3.17" >> $GITHUB_ENV + - uses: r-lib/actions/setup-r-dependencies@v2 with: extra-packages: diff --git a/README.Rmd b/README.Rmd index 2941211d..353af5f3 100644 --- a/README.Rmd +++ b/README.Rmd @@ -24,12 +24,12 @@ knitr::opts_chunk$set( ## Overview -Modeling pipelines in `R` occasionally result in fitted model objects that take up too much memory. There are two main culprits: +Modeling or machine learning in R can result in fitted model objects that take up too much memory. There are two main culprits: -1. Heavy dependencies on formulas and closures that capture the enclosing environment in the modeling process; and -2. Lack of selectivity in the construction of the model object itself. +1. Heavy usage of formulas and closures that capture the enclosing environment in model training +2. 
Lack of selectivity in the construction of the model object itself -As a result, fitted model objects carry over components that are often redundant and not required for post-fit estimation activities. `butcher` makes it easy to axe parts of the fitted output that are no longer needed, without sacrificing much functionality from the original model object. +As a result, fitted model objects contain components that are often redundant and not required for post-fit estimation activities. The butcher package provides tooling to "axe" parts of the fitted output that are no longer needed, without sacrificing prediction functionality from the original model object. ## Installation @@ -48,15 +48,7 @@ pak::pak("tidymodels/butcher") ## Butchering -To make the most of your memory available, this package provides five S3 generics for you to remove parts of a model object: - -- `axe_call()`: To remove the call object. -- `axe_ctrl()`: To remove controls associated with training. -- `axe_data()`: To remove the original training data. -- `axe_env()`: To remove environments. -- `axe_fitted()`: To remove fitted values. - -As an example, we wrap a `lm` model: +As an example, let's wrap an `lm` model so it contains a lot of unnecessary stuff: ```{r example} library(butcher) @@ -66,14 +58,14 @@ our_model <- function() { } ``` -The `lm` that exists in our modeling pipeline is: +This object is unnecessarily large: ```{r} library(lobstr) obj_size(our_model()) ``` -When, in fact, it should only require: +When, in fact, it should only be: ```{r} small_lm <- lm(mpg ~ ., data = mtcars) @@ -84,37 +76,45 @@ To understand which part of our original model object is taking up the most memo ```{r} big_lm <- our_model() -butcher::weigh(big_lm) +weigh(big_lm) ``` -The problem here is in the `terms` component of our `big_lm`. Because of how `lm` is implemented in the `stats` package, the environment (in which our model was made) was also carried along in the fitted output. 
To remove this (mostly) extraneous component, we can use `axe_env()`: +The problem here is in the `terms` component of our `big_lm`. Because of how `lm()` is implemented in the `stats` package, the environment in which our model was made is carried along in the fitted output. To remove the (mostly) extraneous component, we can use `butcher()`: ```{r} -cleaned_lm <- butcher::axe_env(big_lm, verbose = TRUE) +cleaned_lm <- butcher(big_lm, verbose = TRUE) ``` -Comparing it against our `small_lm`, we'll find: +Comparing it against our `small_lm`, we find: ```{r} -butcher::weigh(cleaned_lm) +weigh(cleaned_lm) ``` -...it now takes the same memory on disk as `small_lm`: +And now it will take up about the same memory on disk as `small_lm`: ```{r} -butcher::weigh(small_lm) +weigh(small_lm) ``` -Axing the environment is not the only functionality of `butcher`. We can also remove `call`, `ctrl`, `data` and `fitted_values`, or simply run `butcher()` to execute all of these axing functions at once. Any kind of axing on the object will append a butchered class to the current model object class(es) as well as a new attribute named `butcher_disabled` that lists any post-fit estimation functions that are disabled as a result. +To make the most of your memory available, this package provides five S3 generics for you to remove parts of a model object: + +- `axe_call()`: To remove the call object. +- `axe_ctrl()`: To remove controls associated with training. +- `axe_data()`: To remove the original training data. +- `axe_env()`: To remove environments. +- `axe_fitted()`: To remove fitted values. + +When you run `butcher()`, you execute all of these axing functions at once. Any kind of axing on the object will append a butchered class to the current model object class(es) as well as a new attribute named `butcher_disabled` that lists any post-fit estimation functions that are disabled as a result. 
## Model Object Coverage Check out the `vignette("available-axe-methods")` to see butcher's current coverage. If you are working with a new model object that could benefit from any kind of axing, we would love for you to make a pull request! You can visit the `vignette("adding-models-to-butcher")` for more guidelines, but in short, to contribute a set of axe methods: -1) Run `new_model_butcher(model_class = "your_object", package_name = "your_package")` -2) Use butcher helper functions `butcher::weigh()` and `butcher::locate()` to decide what to axe -3) Finalize edits to `R/your_object.R` and `tests/testthat/test-your_object.R` -4) Make a pull request! +1. Run `new_model_butcher(model_class = "your_object", package_name = "your_package")` +2. Use butcher helper functions `weigh()` and `locate()` to decide what to axe +3. Finalize edits to `R/your_object.R` and `tests/testthat/test-your_object.R` +4. Make a pull request! ## Contributing diff --git a/README.md b/README.md index c1b7f2ba..f0fa8785 100644 --- a/README.md +++ b/README.md @@ -16,18 +16,18 @@ stable](https://img.shields.io/badge/lifecycle-stable-brightgreen.svg)](https:// ## Overview -Modeling pipelines in `R` occasionally result in fitted model objects +Modeling or machine learning in R can result in fitted model objects that take up too much memory. There are two main culprits: -1. Heavy dependencies on formulas and closures that capture the - enclosing environment in the modeling process; and -2. Lack of selectivity in the construction of the model object itself. +1. Heavy usage of formulas and closures that capture the enclosing + environment in model training +2. Lack of selectivity in the construction of the model object itself -As a result, fitted model objects carry over components that are often -redundant and not required for post-fit estimation activities. 
`butcher` -makes it easy to axe parts of the fitted output that are no longer -needed, without sacrificing much functionality from the original model -object. +As a result, fitted model objects contain components that are often +redundant and not required for post-fit estimation activities. The +butcher package provides tooling to “axe” parts of the fitted output +that are no longer needed, without sacrificing prediction functionality +from the original model object. ## Installation @@ -46,16 +46,8 @@ pak::pak("tidymodels/butcher") ## Butchering -To make the most of your memory available, this package provides five S3 -generics for you to remove parts of a model object: - -- `axe_call()`: To remove the call object. -- `axe_ctrl()`: To remove controls associated with training. -- `axe_data()`: To remove the original training data. -- `axe_env()`: To remove environments. -- `axe_fitted()`: To remove fitted values. - -As an example, we wrap a `lm` model: +As an example, let’s wrap an `lm` model so it contains a lot of +unnecessary stuff: ``` r library(butcher) @@ -65,7 +57,7 @@ our_model <- function() { } ``` -The `lm` that exists in our modeling pipeline is: +This object is unnecessarily large: ``` r library(lobstr) @@ -73,7 +65,7 @@ obj_size(our_model()) #> 8.02 MB ``` -When, in fact, it should only require: +When, in fact, it should only be: ``` r small_lm <- lm(mpg ~ ., data = mtcars) @@ -86,7 +78,7 @@ most memory, we leverage the `weigh()` function: ``` r big_lm <- our_model() -butcher::weigh(big_lm) +weigh(big_lm) #> # A tibble: 25 × 2 #> object size #> @@ -104,39 +96,40 @@ butcher::weigh(big_lm) ``` The problem here is in the `terms` component of our `big_lm`. Because of -how `lm` is implemented in the `stats` package, the environment (in -which our model was made) was also carried along in the fitted output. 
-To remove this (mostly) extraneous component, we can use `axe_env()`: +how `lm()` is implemented in the `stats` package, the environment in +which our model was made is carried along in the fitted output. To +remove the (mostly) extraneous component, we can use `butcher()`: ``` r -cleaned_lm <- butcher::axe_env(big_lm, verbose = TRUE) +cleaned_lm <- butcher(big_lm, verbose = TRUE) #> ✔ Memory released: 8.03 MB +#> ✖ Disabled: `print()`, `summary()`, and `fitted()` ``` -Comparing it against our `small_lm`, we’ll find: +Comparing it against our `small_lm`, we find: ``` r -butcher::weigh(cleaned_lm) +weigh(cleaned_lm) #> # A tibble: 25 × 2 -#> object size -#> -#> 1 terms 0.00771 -#> 2 qr.qr 0.00666 -#> 3 residuals 0.00286 -#> 4 fitted.values 0.00286 -#> 5 effects 0.0014 -#> 6 coefficients 0.00109 -#> 7 call 0.000728 -#> 8 model.mpg 0.000304 -#> 9 model.cyl 0.000304 -#> 10 model.disp 0.000304 +#> object size +#> +#> 1 terms 0.00771 +#> 2 qr.qr 0.00666 +#> 3 residuals 0.00286 +#> 4 effects 0.0014 +#> 5 coefficients 0.00109 +#> 6 model.mpg 0.000304 +#> 7 model.cyl 0.000304 +#> 8 model.disp 0.000304 +#> 9 model.hp 0.000304 +#> 10 model.drat 0.000304 #> # ℹ 15 more rows ``` -…it now takes the same memory on disk as `small_lm`: +And now it will take up about the same memory on disk as `small_lm`: ``` r -butcher::weigh(small_lm) +weigh(small_lm) #> # A tibble: 25 × 2 #> object size #> @@ -153,13 +146,20 @@ butcher::weigh(small_lm) #> # ℹ 15 more rows ``` -Axing the environment is not the only functionality of `butcher`. We can -also remove `call`, `ctrl`, `data` and `fitted_values`, or simply run -`butcher()` to execute all of these axing functions at once. Any kind of -axing on the object will append a butchered class to the current model -object class(es) as well as a new attribute named `butcher_disabled` -that lists any post-fit estimation functions that are disabled as a -result. 
+To make the most of your memory available, this package provides five S3 +generics for you to remove parts of a model object: + +- `axe_call()`: To remove the call object. +- `axe_ctrl()`: To remove controls associated with training. +- `axe_data()`: To remove the original training data. +- `axe_env()`: To remove environments. +- `axe_fitted()`: To remove fitted values. + +When you run `butcher()`, you execute all of these axing functions at +once. Any kind of axing on the object will append a butchered class to +the current model object class(es) as well as a new attribute named +`butcher_disabled` that lists any post-fit estimation functions that are +disabled as a result. ## Model Object Coverage @@ -169,13 +169,13 @@ benefit from any kind of axing, we would love for you to make a pull request! You can visit the `vignette("adding-models-to-butcher")` for more guidelines, but in short, to contribute a set of axe methods: -1) Run +1. Run `new_model_butcher(model_class = "your_object", package_name = "your_package")` -2) Use butcher helper functions `butcher::weigh()` and - `butcher::locate()` to decide what to axe -3) Finalize edits to `R/your_object.R` and +2. Use butcher helper functions `weigh()` and `locate()` to decide what + to axe +3. Finalize edits to `R/your_object.R` and `tests/testthat/test-your_object.R` -4) Make a pull request! +4. Make a pull request! ## Contributing @@ -183,18 +183,18 @@ This project is released with a [Contributor Code of Conduct](https://contributor-covenant.org/version/2/0/CODE_OF_CONDUCT.html). By contributing to this project, you agree to abide by its terms. -- For questions and discussions about tidymodels packages, modeling, and - machine learning, please [post on RStudio - Community](https://community.rstudio.com/new-topic?category_id=15&tags=tidymodels,question). 
+- For questions and discussions about tidymodels packages, modeling, + and machine learning, please [post on RStudio + Community](https://community.rstudio.com/new-topic?category_id=15&tags=tidymodels,question). -- If you think you have encountered a bug, please [submit an - issue](https://github.com/tidymodels/butcher/issues). +- If you think you have encountered a bug, please [submit an + issue](https://github.com/tidymodels/butcher/issues). -- Either way, learn how to create and share a - [reprex](https://reprex.tidyverse.org/articles/articles/learn-reprex.html) - (a minimal, reproducible example), to clearly communicate about your - code. +- Either way, learn how to create and share a + [reprex](https://reprex.tidyverse.org/articles/articles/learn-reprex.html) + (a minimal, reproducible example), to clearly communicate about your + code. -- Check out further details on [contributing guidelines for tidymodels - packages](https://www.tidymodels.org/contribute/) and [how to get - help](https://www.tidymodels.org/help/). +- Check out further details on [contributing guidelines for tidymodels + packages](https://www.tidymodels.org/contribute/) and [how to get + help](https://www.tidymodels.org/help/). diff --git a/vignettes/adding-models-to-butcher.Rmd b/vignettes/adding-models-to-butcher.Rmd index f0a5f4d5..a2bc262a 100644 --- a/vignettes/adding-models-to-butcher.Rmd +++ b/vignettes/adding-models-to-butcher.Rmd @@ -18,9 +18,9 @@ knitr::opts_chunk$set( library(butcher) ``` -If you come across any model objects that should be subject to butchering, but does not exist in our current repository as listed [here](https://www.tidymodels.org/find/parsnip/), please consider becoming a contributor to this package! For any first-timers, this is great place to start as we've created templates that make this process as seamless as possible. 
+If you come across any model objects that should be subject to butchering but do not exist in our current repository as listed [here](https://www.tidymodels.org/find/parsnip/), please consider becoming a contributor to this package! For any first-time contributors, this is a great place to start as we've created templates that make this process as seamless as possible. -Let's say our new model object, of class `blob`, was generated from a `R` package called `blobber`. If you want to add axe methods for this class, first clone butcher onto your local computer and open up RStudio (see `usethis::create_from_github("tidymodels/butcher")` for an automated way to do this). After you have opened RStudio and are in the `butcher` RStudio Project, run: +Let's say our new model object, of class `blob`, was generated from an R package called "blobber". If you want to add axe methods for this class, first clone butcher onto your local computer and open up RStudio (use `usethis::create_from_github("tidymodels/butcher")` for an automated way to do this). After you have opened RStudio and are in the butcher RStudio project, run: ```{r, eval = FALSE} > new_model_butcher(model_class = "blob", package_name = "blobber") @@ -41,13 +41,13 @@ You'll get the following console messages: ● Modify 'tests/testthat/test-blob.R' ``` -`new_model_butcher()` leverages `usethis` to: +`new_model_butcher()` leverages usethis to: -1. Add the new `blobber` modeling package under `Suggests` in the `butcher` package description file. -2. Generate a skeleton file under the `R` directory with all possible axe methods for `blob`. -3. Generate an associated test file under `tests/testthat` to test new `blob` axe methods. +1. Add the new blobber modeling package under `Suggests` in the butcher package description file. +2. Generate a skeleton file under the `/R` directory with all possible axe methods for `blob`. +3. Generate an associated test file under `/tests/testthat` to test new `blob` axe methods. 
-As shown by the `R` scripts attached to other model objects that exist in this package, *not all* axe generics are used. In fact, if you take a look at the `elnet.R` script, the only component of the model object fit from the package `glmnet` that might be worth axing is the `call`. To help target what is worth removing from `blob`, we recommend first beginning with `butcher::weigh()` to identify which parts of the model object take up the most memory. +As you can see in the R scripts for other model objects in this package, *not all* axe generics are always used. In fact, if you take a look at the `elnet.R` script, the only component of the model object fit from the package `glmnet` that is worth axing is the `call`. To help target what is worth removing from `blob`, we recommend first beginning with `weigh()` to identify which parts of the model object take up the most memory. ```{r, eval = FALSE} > weigh(fitted_blob_object) @@ -67,9 +67,9 @@ As shown by the `R` scripts attached to other model objects that exist in this p # … with 15 more rows ``` -In this example, the fitted model objected generated from `blobber` has a `terms` component that is taking 4.01 Mb. From here, you can examine the structure of this terms component by leveraging `lobstr::sxp(fitted_blob_object$terms)` or simply running `utils::str(fitted_blob_object$terms)`. If you are looking to hunt for a specific component like the environment, fitted values, training data, controls or the call object, take a look at `butcher::locate()`. +In this example, the fitted model object generated from blobber has a `terms` component that is taking 4.01 Mb. From here, you can examine the structure of this terms component by leveraging `lobstr::sxp(fitted_blob_object$terms)` or simply running `utils::str(fitted_blob_object$terms)`. If you are looking to hunt for a specific component like the environment, fitted values, training data, controls, or the call object, take a look at `locate()`. 
-Perhaps from our model object, `blob`, we find that the `call` is the only piece worth axing (or replacing). The `R/blob.R` skeleton would be completed by putting a placeholder for the original call. +Perhaps for our `blob` model object, we find that the `call` is the only piece worth axing (replacing/removing). The `R/blob.R` skeleton would be completed by adding a placeholder for the original call. ```{r, eval = FALSE} #' Axing a blob. @@ -107,14 +107,14 @@ axe_call.blob <- function(x, verbose = TRUE, ...) { } ``` -Here we assign the current blob object `x` to the variable `old` as a means to evaluate the memory released once `axe_call()` is executed on the original model object. Next, we actually `exchange()` the current call with a dummy call of a (hopefully) smaller size. We also include `assess_object()` with the additional string parameter of `disabled` so console messages will be printed out, alerting users of any downstream functions would be affected by axing the call. Since the original model object is fundamentally different, we attach an additional `butcher_blob` class by calling `add_butcher_class()` at the end of each axe method. Once the axe methods are set, we then have a skeleton file `tests/testthat/test-blob.R` to aid in any unit testing. +Here we assign the current blob object `x` to the variable `old` as a means to evaluate the memory released once `axe_call()` is executed on the original model object. Next, we actually `exchange()` the current call with a dummy call of a (hopefully) smaller size. We also include `assess_object()` with the additional string parameter of `disabled` so console messages will be printed out, alerting users of any downstream functions that would be affected by axing the call. Since the original model object has different components than the new one, we add an additional `butcher_blob` class by calling `add_butcher_class()` at the end of each axe method. 
Once the axe methods are set, we then have a skeleton file `tests/testthat/test-blob.R` to aid in any unit testing. ## Recap Adding a new model object to butcher: -1) Run `new_model_butcher(model_class = "blob", package_name = "blobber")` -2) Use butcher helper functions `butcher::weigh()` and `butcher::locate()` to decide what to axe -3) Finalize edits to `R/blob.R` and `tests/testthat/test-blob.R` -4) Make a pull request! +1. Run `new_model_butcher(model_class = "blob", package_name = "blobber")` +2. Use butcher helper functions `weigh()` and `locate()` to decide what to axe +3. Finalize edits to `R/blob.R` and `tests/testthat/test-blob.R` +4. Make a pull request! diff --git a/vignettes/available-axe-methods.Rmd b/vignettes/available-axe-methods.Rmd index 58032ef7..2e2269f8 100644 --- a/vignettes/available-axe-methods.Rmd +++ b/vignettes/available-axe-methods.Rmd @@ -16,7 +16,7 @@ knitr::opts_chunk$set( ) ``` -The following axe methods are currently available in `butcher`: +The following axe methods are currently available in butcher: ```{r setup, echo = FALSE, warnings = FALSE, message = FALSE} suppressWarnings(library(butcher)) diff --git a/vignettes/butcher.Rmd b/vignettes/butcher.Rmd index daf99eb1..1742a720 100644 --- a/vignettes/butcher.Rmd +++ b/vignettes/butcher.Rmd @@ -20,103 +20,85 @@ library(butcher) library(parsnip) ``` -One of the beauties of working with `R` is the ease with which you can implement intricate models and make challenging data analysis pipelines seem almost trivial. Take, for example, the `parsnip` package; with the installation of a few associated libraries and a few lines of code, you can fit something as complex as a boosted tree: +One of the benefits of working in R is the ease with which you can implement complex models and implement challenging data analysis pipelines. 
Take, for example, the parsnip package; with the installation of a few associated libraries and a few lines of code, you can fit something as sophisticated as a boosted tree: -```{r, warning = F, message = F, eval = F} -library(rpart) - -fitted_model <- boost_tree(trees = 15) %>% - set_engine("C5.0") %>% - fit(as.factor(am) ~ disp + hp, data = mtcars) -``` - -Or, let’s say you’re working on petabytes of data, in which data are distributed across many nodes, just switch out the `parsnip` engine: - -```{r, warning = F, message = F, eval = F} -library(sparklyr) - -sc <- spark_connect(master = "local") - -mtcars_tbls <- sdf_copy_to(sc, mtcars[, c("am", "disp", "hp")]) - -fitted_model <- boost_tree(trees = 15) %>% - set_engine("spark") %>% - fit(am ~ disp + hp, data = mtcars_tbls) +```{r, eval = FALSE} +fitted_model <- boost_tree(mode = "regression") %>% + fit(mpg ~ ., data = mtcars) ``` -Yet, while our code may appear compact, the underlying fitted result may not be. Since `parsnip` works as a wrapper for many modeling packages, its fitted model objects inherit the same properties as those that arise from the original modeling package. A straightforward example is the popular `lm` function from the base `stats` package. Whether you leverage `parsnip` or not, you arrive at the same result: +Yet, while this code is compact, the underlying fitted result may not be. Since parsnip works as a wrapper for many modeling packages, its fitted model objects inherit the same properties as those that arise from the original modeling package. A straightforward example is the `lm()` function from the base `stats` package. 
Whether you leverage parsnip or not, you get the same result: -```{r, warning = F, message = F} +```{r} parsnip_lm <- linear_reg() %>% - set_engine("lm") %>% fit(mpg ~ ., data = mtcars) parsnip_lm ``` -Using just `lm`: +Using just `lm()`: -```{r, warning = F, message = F} +```{r} old_lm <- lm(mpg ~ ., data = mtcars) old_lm ``` -Let's say we take this familiar `old_lm` approach in building our in-house modeling pipeline. Such a pipeline might entail wrapping `lm()` in other function, but in doing so, we may end up carrying some junk. +Let's say we take this familiar `old_lm` approach in building a custom in-house modeling pipeline. Such a pipeline might entail wrapping `lm()` in another function, but in doing so, we may end up carrying around some unnecessary junk. -```{r, warning = F, message = F} +```{r} in_house_model <- function() { some_junk_in_the_environment <- runif(1e6) # we didn't know about lm(mpg ~ ., data = mtcars) } ``` -The linear model fit that exists in our pipeline is: +The linear model fit that exists in our custom modeling pipeline is then: -```{r, warning = F, message = F} +```{r} library(lobstr) obj_size(in_house_model()) ``` -When it is fundamentally the same as our `old_lm`, which only takes up: +But it is functionally the same as our `old_lm`, which only takes up: -```{r, warning = F, message = F} +```{r} obj_size(old_lm) ``` -Ideally, we want to avoid saving this new `in_house_model()` on disk, when we could have something like `old_lm` that takes up less memory. So, what the heck is going on here? We can examine possible issues with a fitted model object using the `butcher` package: +Ideally, we want to avoid saving this new `in_house_model()` on disk, when we could have something like `old_lm` that takes up less memory. But what the heck is going on here? 
We can examine possible issues with a fitted model object using the butcher package: -```{r, warning = F, message = F} +```{r} big_lm <- in_house_model() -butcher::weigh(big_lm, threshold = 0, units = "MB") +weigh(big_lm, threshold = 0, units = "MB") ``` -The problem here is in the `terms` component of `big_lm`. Because of how `lm` is implemented in the base `stats` package---relying on intermediate forms of the data from the `model.frame` and `model.matrix` output, the *environment* in which the linear fit was created *was carried along* in the model output. +The problem here is in the `terms` component of `big_lm`. Because of how `lm()` is implemented in the base `stats` package (relying on intermediate forms of the data from `model.frame` and `model.matrix`) the **environment** in which the linear fit was created is carried along in the model output. -We can see this with the `env_print` function from the `rlang` package: +We can see this with the `env_print()` function from the rlang package: -```{r, warning = F, message = F} +```{r} library(rlang) env_print(big_lm$terms) ``` -To avoid carrying possible junk in our production pipeline, whether it be associated with an `lm` model (or something more complex), we can leverage `axe_env()` within the `butcher` package. 
In other words, +To avoid carrying possible junk around in our production pipeline, whether it be associated with an `lm()` model (or something more complex), we can leverage `axe_env()` from the butcher package: -```{r, warning = F, message = F} -cleaned_lm <- butcher::axe_env(big_lm, verbose = TRUE) +```{r} +cleaned_lm <- axe_env(big_lm, verbose = TRUE) ``` Comparing it against our `old_lm`, we find: -```{r, warning = F, message = F} -butcher::weigh(cleaned_lm, threshold = 0, units = "MB") +```{r} +weigh(cleaned_lm, threshold = 0, units = "MB") ``` -...it now takes the same memory on disk: +And now it takes the same memory on disk: -```{r, warning = F, message = F} -butcher::weigh(old_lm, threshold = 0, units = "MB") +```{r} +weigh(old_lm, threshold = 0, units = "MB") ``` -Axing the environment, however, is not the only functionality of `butcher`. This package provides five S3 generics that include: +Axing the environment, however, is not the only functionality of butcher. This package provides five S3 generics that include: - `axe_call()`: Remove the call object. - `axe_ctrl()`: Remove the controls fixed for training. @@ -124,20 +106,20 @@ Axing the environment, however, is not the only functionality of `butcher`. This - `axe_env()`: Replace inherited environments with empty environments. - `axe_fitted()`: Remove fitted values. -In our case here with `lm`, if we are only interested in prediction as the end product of our modeling pipeline, we could free up a lot of memory if we execute all the possible axe functions at once. To do so, we simply run `butcher()`: +In our case here with `lm()`, if we are only interested in prediction as the end product of our modeling pipeline, we could free up a lot of memory if we execute all the possible axe functions at once. 
To do so, we simply run `butcher()`: -```{r, warning = F, message = F} -butchered_lm <- butcher::butcher(big_lm) +```{r} +butchered_lm <- butcher(big_lm) predict(butchered_lm, mtcars[, 2:11]) ``` Alternatively, we can pick and choose specific axe functions, removing only those parts of the model object that we are no longer interested in characterizing. -```{r, warning = F, message = F} +```{r} butchered_lm <- big_lm %>% - butcher::axe_env() %>% - butcher::axe_fitted() + axe_env() %>% + axe_fitted() predict(butchered_lm, mtcars[, 2:11]) ``` -`butcher` makes it easy to axe parts of the fitted output that are no longer needed, without sacrificing much functionality from the original model object. +The butcher package provides tooling to axe parts of the fitted output that are no longer needed, without sacrificing much functionality from the original model object.
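As a quick sanity check on the vignette's closing claim, here is a minimal sketch that is not part of this patch. It refits the vignette's `in_house_model()` example, looks at the environment captured by the `terms` component, and compares predictions and size before and after `butcher()`. It assumes the butcher package is installed. Using `length(serialize(x, NULL))` as a rough proxy for size on disk is an assumption made here, not something the vignette does.

```r
library(butcher)

# Same wrapper as in the vignette: the junk vector lives in the function
# environment that the model formula captures.
in_house_model <- function() {
  some_junk_in_the_environment <- runif(1e6)
  lm(mpg ~ ., data = mtcars)
}

big_lm <- in_house_model()

# Base R view of the problem: list what the captured environment holds.
captured_env <- attr(big_lm$terms, ".Environment")
junk_names <- ls(captured_env)

# Run all available axe methods at once.
butchered_lm <- butcher(big_lm)

# Predictions on new data should be unchanged by butchering.
preds_match <- isTRUE(all.equal(
  predict(big_lm, mtcars[, 2:11]),
  predict(butchered_lm, mtcars[, 2:11])
))

# Assumption: serialized length is a rough stand-in for size on disk.
size_before <- length(serialize(big_lm, NULL))
size_after <- length(serialize(butchered_lm, NULL))

preds_match
size_after < size_before
```

If butchering behaves as the vignette describes, both final expressions should evaluate to `TRUE`, and `junk_names` should contain the junk vector created inside the wrapper.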