vfold_cv crashing/freezing the console #454

Closed
godscloset opened this issue Aug 28, 2023 · 12 comments
Labels
bug an unexpected problem or unintended behavior

Comments

@godscloset

Hello,

I'm not sure if this is specific to my machine, but I've been trying to use functions from the 'rsample' package to split up my data. When I try to view or do anything with the object they create, the console (and all of R) freezes without any error message. Initially, the column names for the vfold object are shown, but the object is empty. From what I've tested, this happens with both bootstraps() and vfold_cv(). Let me know if this is just me!

Best,
Jacob

library(tidymodels)
library(modeldata)
data(meats)

norm_rec <- 
  recipe(water + fat + protein ~ ., data = meats) %>%
  step_normalize(everything()) 

set.seed(57343)
folds <- vfold_cv(meats, repeats = 10)

folds <- 
  folds %>%
  mutate(recipes = map(splits, prepper, recipe = norm_rec))

## everything freezes up, requires R to be shut down
@EmilHvitfeldt added the bug label Aug 28, 2023
@EmilHvitfeldt
Member

Hello @godscloset! That is unfortunate, it should definitely work! Can you run the following code for me and show us the results? I want to make sure you are using up-to-date versions of the packages.

library(tidymodels)

sessionInfo()

@godscloset
Author

godscloset commented Aug 28, 2023 via email

@EmilHvitfeldt
Member

Hey! I think you forgot to attach the screenshot :)

@godscloset
Author

godscloset commented Aug 28, 2023 via email

@godscloset
Author

[Screenshot: tidymodels_sessioninfo]

Oh, I see what happened. I tried to reply via email and not through GitHub.

-J

@godscloset
Author

Hi!
Another revelation: it seems that everything breaks down when I try to 'view' the created vfold object. Weirdly, this was working before but now causes the crash. I don't know if this helps.
-Jacob

@EmilHvitfeldt
Member

When you say "view", do you mean using the View() function, or having the object printed to the console?

View() doesn't handle non-standard data.frames very well and is not recommended to be used on tidymodels objects.

@cportner

cportner commented Aug 29, 2023 via email

@EmilHvitfeldt
Member

To be clear, this is not a tidymodels problem but an RStudio IDE issue: RStudio is slow or crashes when trying to View() a data.frame with list columns (see rstudio/rstudio#2039).

We understand the frustration, which is why we generally discourage working with list-columns directly. The {tune} package has collect_*() functions that let you extract the information you want into a data.frame without list columns.
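As a rough sketch of what the collect_*() route can look like: the model (linear_reg()), the outcome (water), the ten-predictor subset, and the names meats_small, small_folds, wf, and res are all illustrative assumptions, not something from the original post.

```r
library(tidymodels)
data(meats, package = "modeldata")

# Illustrative subset: one outcome and ten predictors to keep the fit quick
meats_small <- meats %>% select(water, x_001:x_010)

set.seed(57343)
small_folds <- vfold_cv(meats_small, v = 5)

rec <- recipe(water ~ ., data = meats_small) %>%
  step_normalize(all_predictors())

wf <- workflow() %>%
  add_recipe(rec) %>%
  add_model(linear_reg())

# The result still carries list columns (splits, .metrics, .notes)
res <- fit_resamples(wf, resamples = small_folds)

# collect_metrics() flattens them into an ordinary tibble
metrics <- collect_metrics(res)
metrics

# One row per fold and metric instead of averages across folds
per_fold <- collect_metrics(res, summarize = FALSE)
per_fold
```

Both `metrics` and `per_fold` are plain tibbles without list columns, so they should be safe to pass to View().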

If you still want to see what happens with your data while working with list columns, you can deselect them before using View():

folds %>%
  select(!where(is.list)) %>%
  View()

Another thing you could do is poke around with str(folds, max.level = 1) while slowly increasing max.level to avoid massive printing. I personally also enjoy using glimpse() for data.frames.
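A minimal sketch of that kind of poking around, assuming the same `folds` object as in the original post; the particular max.level values and the name first_split are only illustrative.

```r
library(rsample)
library(dplyr)
data(meats, package = "modeldata")

set.seed(57343)
folds <- vfold_cv(meats, repeats = 10)

# Printing to the console is fine: list columns are shown as short summaries
folds

# Top level only: column names and types, without expanding each split
str(folds, max.level = 1)

# One level deeper; the output gets long quickly, so increase slowly
str(folds, max.level = 2)

# Compact overview with one line per column
glimpse(folds)

# Inspect a single element of the list column rather than all of them
first_split <- folds$splits[[1]]
first_split
dim(analysis(first_split))    # rows used for fitting in this fold
dim(assessment(first_split))  # rows held out in this fold
```

Looking at one split at a time is usually enough to see what sits at each "level" without asking the IDE to render the whole list column.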

{recipes} objects themselves are not that great to look at, as they contain quite a bit of information. If you want to investigate them, I would encourage using tidy() to extract the information and turn it into a viewable format:

library(tidymodels)
library(modeldata)
data(meats)

norm_rec <- 
  recipe(water + fat + protein ~ ., data = meats) %>%
  step_normalize(all_predictors()) 

set.seed(57343)
folds <- vfold_cv(meats, repeats = 10)

# Prep the recipe on each resample's analysis set
folds <- 
  folds %>%
  mutate(recipes = map(splits, prepper, recipe = norm_rec))

# Extract the normalization statistics (step 1) from each prepped recipe
folds %>%
  mutate(tidy = map(recipes, tidy, 1)) %>%
  select(-splits, -recipes) %>%
  rename(repeat_id = id) %>%
  unnest(tidy)
#> # A tibble: 20,000 × 6
#>    repeat_id id2    terms statistic value id             
#>    <chr>     <chr>  <chr> <chr>     <dbl> <chr>          
#>  1 Repeat01  Fold01 x_001 mean       2.82 normalize_CiSI9
#>  2 Repeat01  Fold01 x_002 mean       2.82 normalize_CiSI9
#>  3 Repeat01  Fold01 x_003 mean       2.83 normalize_CiSI9
#>  4 Repeat01  Fold01 x_004 mean       2.83 normalize_CiSI9
#>  5 Repeat01  Fold01 x_005 mean       2.83 normalize_CiSI9
#>  6 Repeat01  Fold01 x_006 mean       2.84 normalize_CiSI9
#>  7 Repeat01  Fold01 x_007 mean       2.84 normalize_CiSI9
#>  8 Repeat01  Fold01 x_008 mean       2.84 normalize_CiSI9
#>  9 Repeat01  Fold01 x_009 mean       2.85 normalize_CiSI9
#> 10 Repeat01  Fold01 x_010 mean       2.85 normalize_CiSI9
#> # ℹ 19,990 more rows

May I ask what information you were hoping to see when calling View() on folds in the example above?

@cportner

Thank you, this is very useful.

I would mostly like to be able to view lists in RStudio, such as those generated by rsample for bootstrapping and after nesting. I guess my problem is that I still have a hard time wrapping my head around exactly what information is at what "level" and what the different subparts contain.

@hfrick
Member

hfrick commented Nov 1, 2023

Thanks for the discussion! I'm going to close this, as it is not an rsample issue.

@hfrick hfrick closed this as completed Nov 1, 2023

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

@github-actions github-actions bot locked and limited conversation to collaborators Nov 16, 2023