Commit

Remove extraneous namespacing
juliasilge committed Sep 27, 2023
1 parent 1fa33b0 commit 67b5ff7
Showing 4 changed files with 21 additions and 21 deletions.
10 changes: 5 additions & 5 deletions README.Rmd
@@ -76,25 +76,25 @@ To understand which part of our original model object is taking up the most memo

```{r}
big_lm <- our_model()
-butcher::weigh(big_lm)
+weigh(big_lm)
```

The problem here is in the `terms` component of our `big_lm`. Because of how `lm()` is implemented in the `stats` package, the environment in which our model was made is carried along in the fitted output. To remove the (mostly) extraneous component, we can use `butcher()`:

```{r}
-cleaned_lm <- butcher::butcher(big_lm, verbose = TRUE)
+cleaned_lm <- butcher(big_lm, verbose = TRUE)
```

Comparing it against our `small_lm`, we find:

```{r}
-butcher::weigh(cleaned_lm)
+weigh(cleaned_lm)
```

And now it will take up about the same memory on disk as `small_lm`:

```{r}
-butcher::weigh(small_lm)
+weigh(small_lm)
```
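The size difference described in this hunk can be reproduced without butcher. The following is a minimal base-R sketch and not code from this commit: `fit_in_function()`, `junk`, and `serialized_bytes()` are hypothetical names, and the length of the serialized object is used only as a rough stand-in for size on disk.

```r
# Hypothetical helper: fit lm() inside a function that also holds a large,
# unrelated object, in the spirit of the README's our_model().
fit_in_function <- function() {
  junk <- runif(1e6)                 # about 8 MB, never used by the model
  lm(mpg ~ ., data = mtcars)
}

big_lm   <- fit_in_function()
small_lm <- lm(mpg ~ ., data = mtcars)

# Rough proxy for on-disk size: length of the serialized object in bytes.
serialized_bytes <- function(x) length(serialize(x, NULL))

# The terms component keeps a reference to the environment in which the
# formula was created, so `junk` travels with the fitted model.
big_env <- attr(big_lm$terms, ".Environment")
exists("junk", envir = big_env, inherits = FALSE)

# Compare the two fits; the gap is roughly the size of `junk`.
c(big = serialized_bytes(big_lm), small = serialized_bytes(small_lm))
```

The two fits use the same formula and data, so any difference in size comes from what the captured environment happens to contain.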

To make the most of your memory available, this package provides five S3 generics for you to remove parts of a model object:
@@ -112,7 +112,7 @@ When you run `butcher()`, you execute all of these axing functions at once. Any
Check out the `vignette("available-axe-methods")` to see butcher's current coverage. If you are working with a new model object that could benefit from any kind of axing, we would love for you to make a pull request! You can visit the `vignette("adding-models-to-butcher")` for more guidelines, but in short, to contribute a set of axe methods:

1. Run `new_model_butcher(model_class = "your_object", package_name = "your_package")`
-2. Use butcher helper functions `butcher::weigh()` and `butcher::locate()` to decide what to axe
+2. Use butcher helper functions `weigh()` and `locate()` to decide what to axe
3. Finalize edits to `R/your_object.R` and `tests/testthat/test-your_object.R`
4. Make a pull request!

12 changes: 6 additions & 6 deletions README.md
@@ -78,7 +78,7 @@ most memory, we leverage the `weigh()` function:

``` r
big_lm <- our_model()
-butcher::weigh(big_lm)
+weigh(big_lm)
#> # A tibble: 25 × 2
#> object size
#> <chr> <dbl>
@@ -101,15 +101,15 @@ which our model was made is carried along in the fitted output. To
remove the (mostly) extraneous component, we can use `butcher()`:

``` r
-cleaned_lm <- butcher::butcher(big_lm, verbose = TRUE)
+cleaned_lm <- butcher(big_lm, verbose = TRUE)
#> ✔ Memory released: 8.03 MB
#> ✖ Disabled: `print()`, `summary()`, and `fitted()`
```

Comparing it against our `small_lm`, we find:

``` r
-butcher::weigh(cleaned_lm)
+weigh(cleaned_lm)
#> # A tibble: 25 × 2
#> object size
#> <chr> <dbl>
@@ -129,7 +129,7 @@ butcher::weigh(cleaned_lm)
And now it will take up about the same memory on disk as `small_lm`:

``` r
-butcher::weigh(small_lm)
+weigh(small_lm)
#> # A tibble: 25 × 2
#> object size
#> <chr> <dbl>
@@ -171,8 +171,8 @@ more guidelines, but in short, to contribute a set of axe methods:

1. Run
`new_model_butcher(model_class = "your_object", package_name = "your_package")`
-2. Use butcher helper functions `butcher::weigh()` and
-   `butcher::locate()` to decide what to axe
+2. Use butcher helper functions `weigh()` and `locate()` to decide what
+   to axe
3. Finalize edits to `R/your_object.R` and
`tests/testthat/test-your_object.R`
4. Make a pull request!
6 changes: 3 additions & 3 deletions vignettes/adding-models-to-butcher.Rmd
@@ -47,7 +47,7 @@ You'll get the following console messages:
2. Generate a skeleton file under the `/R` directory with all possible axe methods for `blob`.
3. Generate an associated test file under `/tests/testthat` to test new `blob` axe methods.

-As you can see in the R scripts for other model objects that exist in this package, *not all* axe generics are always used. In fact, if you take a look at the `elnet.R` script, the only component of the model object fit from the package `glmnet` that is worth axing is the `call`. To help target what is worth removing from `blob`, we recommend first beginning with `butcher::weigh()` to identify which parts of the model object take up the most memory.
+As you can see in the R scripts for other model objects in this package, *not all* axe generics are always used. In fact, if you take a look at the `elnet.R` script, the only component of the model object fit from the package `glmnet` that is worth axing is the `call`. To help target what is worth removing from `blob`, we recommend first beginning with `weigh()` to identify which parts of the model object take up the most memory.

```{r, eval = FALSE}
> weigh(fitted_blob_object)
@@ -67,7 +67,7 @@ As you can see in the R scripts for other model objects that exist in this packa
# … with 15 more rows
```

-In this example, the fitted model objected generated from blobber has a `terms` component that is taking 4.01 Mb. From here, you can examine the structure of this terms component by leveraging `lobstr::sxp(fitted_blob_object$terms)` or simply running `utils::str(fitted_blob_object$terms)`. If you are looking to hunt for a specific component like the environment, fitted values, training data, controls, or the call object, take a look at `butcher::locate()`.
+In this example, the fitted model objected generated from blobber has a `terms` component that is taking 4.01 Mb. From here, you can examine the structure of this terms component by leveraging `lobstr::sxp(fitted_blob_object$terms)` or simply running `utils::str(fitted_blob_object$terms)`. If you are looking to hunt for a specific component like the environment, fitted values, training data, controls, or the call object, take a look at `locate()`.
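The inspection step described in this paragraph can be approximated in base R when butcher and lobstr are not at hand. This is a sketch under assumptions: an `lm()` fit stands in for the fitted blob object, `make_fitted_object()` and `serialized_mb()` are hypothetical helpers, and serialized length is only a rough proxy for what `weigh()` reports.

```r
# Hypothetical stand-in for a fitted blob object: an lm() fit created inside
# a function whose environment also holds a large unused vector.
make_fitted_object <- function() {
  junk <- runif(1e6)
  lm(mpg ~ ., data = mtcars)
}
fitted_object <- make_fitted_object()

# Serialized length in MB as a rough per-component size. Unlike
# object.size(), serialization follows environment references.
serialized_mb <- function(x) length(serialize(x, NULL)) / 1e6
component_mb <- sort(vapply(fitted_object, serialized_mb, numeric(1)),
                     decreasing = TRUE)
head(component_mb, 3)

# Look inside the environment attached to the terms component to see
# what is being carried along.
terms_env <- attr(fitted_object$terms, ".Environment")
ls(terms_env)
```

Listing the contents of the captured environment is often the quickest way to decide whether a component is worth axing.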

Perhaps for our `blob` model object, we find that the `call` is the only piece worth axing (replacing/removing). The `R/blob.R` skeleton would be completed by adding a placeholder for the original call.
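A minimal sketch of replacing the call with a placeholder, written in base R so it runs on its own. Everything here is hypothetical: `axe_call_sketch()` stands in for butcher's `axe_call()` generic, the toy `blob` object is invented for illustration, and `dummy_call` is an arbitrary placeholder name. The generated `R/blob.R` skeleton may look different.

```r
# Hypothetical generic and method names, used only for illustration.
axe_call_sketch <- function(x, ...) UseMethod("axe_call_sketch")

axe_call_sketch.blob <- function(x, ...) {
  # Swap the stored call for a small placeholder so downstream code that
  # expects a `call` element still finds one.
  x$call <- call("dummy_call")
  x
}

# A toy object standing in for a fitted blob model. The call is quoted,
# so neither blob() nor some_large_data needs to exist.
fitted_blob_object <- structure(
  list(
    call = quote(blob(y ~ x, data = some_large_data)),
    coefficients = c(1, 2)
  ),
  class = "blob"
)

axed <- axe_call_sketch(fitted_blob_object)
axed$call
```

Keeping a placeholder rather than deleting the element means methods that read `x$call` continue to find a call object, even though it no longer describes the original fit.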

@@ -114,7 +114,7 @@ Here we assign the current blob object `x` to the variable `old` as a means to e
Adding a new model object to butcher:

1. Run `new_model_butcher(model_class = "blob", package_name = "blobber")`
-2. Use butcher helper functions `butcher::weigh()` and `butcher::locate()` to decide what to axe
+2. Use butcher helper functions `weigh()` and `locate()` to decide what to axe
3. Finalize edits to `R/blob.R` and `tests/testthat/test-blob.R`
4. Make a pull request!

14 changes: 7 additions & 7 deletions vignettes/butcher.Rmd
@@ -68,7 +68,7 @@ Ideally, we want to avoid saving this new `in_house_model()` on disk, when we co

```{r}
big_lm <- in_house_model()
-butcher::weigh(big_lm, threshold = 0, units = "MB")
+weigh(big_lm, threshold = 0, units = "MB")
```

The problem here is in the `terms` component of `big_lm`. Because of how `lm()` is implemented in the base `stats` package (relying on intermediate forms of the data from `model.frame` and `model.matrix`) the **environment** in which the linear fit was created is carried along in the model output.
@@ -83,19 +83,19 @@ env_print(big_lm$terms)
To avoid carrying possible junk around in our production pipeline, whether it be associated with an `lm()` model (or something more complex), we can leverage `axe_env()` from the butcher package:

```{r}
-cleaned_lm <- butcher::axe_env(big_lm, verbose = TRUE)
+cleaned_lm <- axe_env(big_lm, verbose = TRUE)
```

Comparing it against our `old_lm`, we find:

```{r}
-butcher::weigh(cleaned_lm, threshold = 0, units = "MB")
+weigh(cleaned_lm, threshold = 0, units = "MB")
```

And now it takes the same memory on disk:

```{r}
-butcher::weigh(old_lm, threshold = 0, units = "MB")
+weigh(old_lm, threshold = 0, units = "MB")
```
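To make the effect of axing the environment concrete, here is a base-R sketch of the general idea. It is not butcher's implementation: `make_big_lm()`, `strip_env_sketch()`, and `serialized_bytes()` are hypothetical helpers, and pointing the terms environment at `baseenv()` is an assumption about one reasonable way to drop the reference.

```r
# Hypothetical helper: an lm() fit whose calling environment holds junk.
make_big_lm <- function() {
  junk <- runif(1e6)
  lm(mpg ~ ., data = mtcars)
}
big_lm <- make_big_lm()

# Hypothetical helper, not butcher's code: point the terms environment
# (stored on both `terms` and the model frame) at the base environment so
# the calling environment is no longer referenced.
strip_env_sketch <- function(fit) {
  attr(fit$terms, ".Environment") <- baseenv()
  attr(attr(fit$model, "terms"), ".Environment") <- baseenv()
  fit
}
stripped_lm <- strip_env_sketch(big_lm)

# Rough proxy for on-disk size: length of the serialized object in bytes.
serialized_bytes <- function(x) length(serialize(x, NULL))
c(before = serialized_bytes(big_lm), after = serialized_bytes(stripped_lm))

# Prediction on new data gives the same values after the swap.
all.equal(predict(big_lm, mtcars[, 2:11]),
          predict(stripped_lm, mtcars[, 2:11]))
```

The sketch touches only the environment reference, which is why prediction still works while the serialized size drops.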

Axing the environment, however, is not the only functionality of butcher. This package provides five S3 generics that include:
@@ -109,16 +109,16 @@ Axing the environment, however, is not the only functionality of butcher. This p
In our case here with `lm()`, if we are only interested in prediction as the end product of our modeling pipeline, we could free up a lot of memory if we execute all the possible axe functions at once. To do so, we simply run `butcher()`:

```{r}
-butchered_lm <- butcher::butcher(big_lm)
+butchered_lm <- butcher(big_lm)
predict(butchered_lm, mtcars[, 2:11])
```

Alternatively, we can pick and choose specific axe functions, removing only those parts of the model object that we are no longer interested in characterizing.

```{r}
butchered_lm <- big_lm %>%
-  butcher::axe_env() %>%
-  butcher::axe_fitted()
+  axe_env() %>%
+  axe_fitted()
predict(butchered_lm, mtcars[, 2:11])
```

