Harmonize batch distribution ++ #359
Conversation
…g several approaches, the old version only used the first approach. I verified this by adding a print statement in each `prepare_data.approach()` function and saw that only the first approach in `internal$parameters$approach` was used. We can maybe remove the code comments before the pull request is accepted. Is there a better method to get the approach? I also updated the roxygen2 documentation for the function, as it seemed to reflect the old version of shapr due to arguments that are no longer present. However, one then gets a warning when creating the roxygen2 documentation. I discuss some solutions in the comments below. Discuss with Martin.
…ion `check_n_batches` threw an error for the vignette with the gaussian approach with `n_combinations = 8` and `n_batches = NULL`, as this function then set `n_batches = 10`, which was too large. We subtract 1 because the `check_n_batches` function specifies that `n_batches` must be strictly less than `n_combinations`.
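The clamping described above can be sketched as follows. The function and argument names (`suggestion`, `this_min`, `this_max`) are assumptions for illustration, not the exact shapr internals:

```r
# Sketch: choose a default n_batches, but never let it reach n_combinations.
default_n_batches <- function(suggestion, this_min, this_max, n_combinations) {
  min_checked <- max(c(this_min, suggestion))
  # Subtract 1 since check_n_batches() requires n_batches < n_combinations
  min(c(this_max, min_checked, n_combinations - 1))
}

default_n_batches(suggestion = 10, this_min = 2, this_max = 10, n_combinations = 8)
# Returns 7 instead of the old value 10, which triggered the error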
The tests failed because I changed the printout. I added the word "of" as I thought that was more grammatically correct, but one can just remove it again; the intention of the sentence is clear. The new printout is:
…to 2^m", but all the tests only checked for "larger than". I.e., if the user specified `n_combinations = 2^m` in the call to `shapr::explain`, the function would not treat it as exact.
…t mode when `n_combinations = 2^m`, before the bugfix.
…`n_combinations >= 2^m`. Remove the large comment after discussing it with Martin.
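The fixed condition can be illustrated with a minimal sketch (variable names assumed, not shapr's actual internals):

```r
# Sketch: with m features there are 2^m coalitions, so asking for
# n_combinations >= 2^m should switch to exact mode.
m <- 4
n_combinations <- 2^m

# Old (buggy) check: strictly greater than, so 2^m itself was not treated as exact
exact_old <- n_combinations > 2^m   # FALSE
# Fixed check: greater than or equal
exact_new <- n_combinations >= 2^m  # TRUE
```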
```diff
@@ -217,7 +217,7 @@ check_n_batches <- function(internal) {
   if (n_batches >= actual_n_combinations) {
     stop(paste0(
-      "`n_batches` (", n_batches, ") must be smaller than the number feature combinations/`n_combinations` (",
+      "`n_batches` (", n_batches, ") must be smaller than the number of feature combinations/`n_combinations` (",
```
This makes the previous tests fail, as the error messages are now different. Let Martin decide whether we keep this alteration.
…est checking that we do not get an error when running the code after the bugfix has been applied.
Should not be merged yet, as another bug occurred when I was working on combined approaches.
There are also some other additional bugs with the combined approaches, e.g., that setting the seed does not work.
… the number of approaches and the number of unique approaches. This is, for example, useful to check that the provided `n_batches` is a valid value (see next commits).
…the number of unique approaches. Before, the user could, e.g., set `n_batches = 2` but use 4 approaches; shapr would then use 4 batches without updating `n_batches` and without giving a warning to the user.
… of unique approaches that are used. This was not done before and gave inconsistency in the number shapr would recommend and use when `n_batches` was set to `NULL` by the user.
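A minimal sketch of the lower bound (variable names assumed): since each approach needs at least one batch, `n_batches` can never fall below the number of unique approaches.

```r
# Sketch: raise a too-small user-supplied n_batches and warn about it.
approach <- c("empirical", "empirical", "gaussian", "copula")
n_unique_approaches <- length(unique(approach))  # 3 unique approaches

n_batches <- 2  # hypothetical user-supplied value, too small
if (n_batches < n_unique_approaches) {
  warning(sprintf(
    "n_batches increased from %d to %d (at least one batch per unique approach).",
    n_batches, n_unique_approaches
  ))
  n_batches <- n_unique_approaches
}
```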
…ombined approaches. Furthermore, I added an if test, because the previous version resulted in non-reproducible code, as setting the seed to `NULL` discards the seed we set in `explain()`. Just consider this small example:

```r
# Set seed to get the same value twice
set.seed(123)
rnorm(1)

# Setting the same seed gives the same value
set.seed(123)
rnorm(1)

# If we also include NULL, the seed is removed and we do not get the same value
set.seed(123)
set.seed(NULL)
rnorm(1)

# Setting the seed to NULL actually gives a new "random" number each time
set.seed(123)
set.seed(NULL)
rnorm(1)
```
…e not the most elegant solution.
Looks like some tests fail. Will look at that tomorrow.
I have merged #356 into this PR now and will try to look more into it. The file test-setup.R gave only two failures, caused by my adding the word "of" to the printout of an error message. There are a lot of errors in test-output.R:
…us value (`n_batches = 1`) is not allowed anymore as it is lower than the number of unique used approaches.
…hapr into Lars/bugfix_get_default_n_batches
Added some comments.
Added comments in the review.
```r
}
min_checked <- max(c(this_min, suggestion))
ret <- min(c(this_max, min_checked, n_combinations - 1))
```
We subtract 1 as the `check_n_batches()` function specifies that `n_batches` must be strictly less than `n_combinations`.
```r
# Error in check_n_batches(internal) :
#   `n_batches` (10) must be smaller than the number feature combinations/`n_combinations` (8)
# Bug only occurs for "ctree", "gaussian", and "copula" as they are treated differently in
# `get_default_n_batches()`, I am not certain why. Ask Martin about the logic behind that.
```
(Hidden) question here... I should be better at making them more visible.
R/setup_computation.R
```r
# Ensure that the number of batches corresponds to `n_batches`
if (sum(batch_count_dt$n_batches_per_approach) != n_batches) {
  # Ensure that the number of batches is not larger than `n_batches`:
  # remove one batch from the approach with the most batches.
  while (sum(batch_count_dt$n_batches_per_approach) > n_batches) {
    approach_to_subtract_batch <- which.max(batch_count_dt$n_batches_per_approach)
    batch_count_dt$n_batches_per_approach[approach_to_subtract_batch] <-
      batch_count_dt$n_batches_per_approach[approach_to_subtract_batch] - 1
  }

  # Ensure that the number of batches is not lower than `n_batches`:
  # add one batch to the approach with the most coalitions per batch.
  while (sum(batch_count_dt$n_batches_per_approach) < n_batches) {
    approach_to_add_batch <- which.max(batch_count_dt$n_S_per_approach /
      batch_count_dt$n_batches_per_approach)
    batch_count_dt$n_batches_per_approach[approach_to_add_batch] <-
      batch_count_dt$n_batches_per_approach[approach_to_add_batch] + 1
  }
}
```
@LHBO Not sure whether this extra complexity is really worth it to cover the edge cases where `n_batches` does not add up. An alternative is to provide a warning that the number of batches is changed (if specified by the user), or otherwise just change it. What do you think?
@martinju, I agree that it is not very elegant code, but it does not take long to run and is only run once.
The two main reasons I added this are:
- To ensure that the number of batches adds up to `n_batches`.
- More importantly, if the user has specified `future::plan(multisession, workers = n_batches)`, then it would be nice that `shapr` actually used `n_batches` batches.

If this is not that important, then I am okay with just giving a warning.
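To illustrate the second reason: a user can size the parallel worker pool to the number of batches, in which case each worker handles exactly one batch only if shapr honors `n_batches` exactly. A sketch, assuming the `future` package is installed:

```r
library(future)

n_batches <- 4
# One worker per batch: the pool is fully utilized only if shapr
# actually produces exactly n_batches batches.
plan(multisession, workers = n_batches)
```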
The failing tests on R 4.2 and R 4.1 (oldrel-1 and oldrel-2) seem to be due to a change in how errors are reported to the terminal. In the previous versions, a missing input argument would not throw an error before one tried to access the missing argument, while in R 4.3 an error is thrown immediately. This now causes a failure due to an update in testthat (remember to update to v3.2.0 if you run locally), which now shows which function throws the error, and that differs between the two versions. I will try to look into how to get around this. One possibility is to simply ignore it, as we know the reason for the error and it does not cause practical problems.
Propose some code simplification in `get_extra_parameters()`.
We do not need the if-else anymore, so a simpler version of the code could be:
```r
# Get the number of approaches, which is always either one or
# one less than the number of features if a combination of approaches is used.
internal$parameters$n_approaches <- length(internal$parameters$approach)

# Get the number of unique approaches, as the same
# approach can be used for several feature combination sizes.
internal$parameters$n_unique_approaches <- length(unique(internal$parameters$approach))
```
I think we are ready to merge this now (provided the checks pass)
Done in this PR:
- Breaking change: the input argument `approach` should now be of length one less than the number of features (if multiple approaches are combined). The last element was previously ignored in any case. Docs and vignette updated accordingly.
- Introduce internal helper arguments `n_approaches` and `n_unique_approaches`
- Bugfix for how the default number of batches is set
- Robustify the procedure to distribute batches among multiple approaches
- Update all snapshot tests due to the changed output format of testthat 3.2.0
- Temporarily disable oldrel-1 and oldrel-2 tests due to changes in how R outputs error messages from oldrel-1 (R 4.2) to release (R 4.3), causing the output to look different (with the new testthat package).
OLD text:
…er of batches (when not provided in the `explain` function call) to a number which then throws an error in the `check_n_batches` function. Logical error, as `get_default_n_batches` could previously set `n_batches` to a larger value than `n_combinations - 1`. Subtract one, as the `check_n_batches` function specifies that `n_batches` must be strictly less than `n_combinations`.
The bug occurred for example for: