
Harmonize batch distribution ++ #359

Merged (65 commits) on Nov 20, 2023

Conversation

LHBO
Collaborator

@LHBO LHBO commented Aug 24, 2023

Done in this PR:

  • Breaking change: the input argument `approach` must now be of length one less than the number of features (if multiple approaches are combined). The last element was previously ignored in any case. Docs and vignette updated accordingly.

  • Introduce the internal helper arguments `n_approaches` and `n_unique_approaches`.

  • Fix a bug in how the default number of batches is set.

  • Robustify the procedure for distributing batches among multiple approaches.

  • Update all snapshot tests due to the changed output format of testthat 3.2.0.

  • Temporarily disable the oldrel-1 and oldrel-2 tests due to changes in how R outputs error messages from oldrel-1 (R 4.2) to release (R 4.3), which makes the output look different (with the new testthat package).


OLD text:

…er of batches (when not provided in the explain function call) to a number which then throws an error in the check_n_batches function. Logical error as get_default_n_batches could previously set n_batches to a larger value than n_combinations - 1. Subtract one as the check_n_batches function specifies that n_batches must be strictly less than n_combinations.

The bug occurred for example for:

# The other variables here come from the first code block in the vignette.
> explanation <- explain(
+   model = model,
+   x_explain = x_explain,
+   x_train = x_train,
+   approach = "gaussian",
+   prediction_zero = p0,
+   n_combinations = 8
+ )
Note: Feature classes extracted from the model contains NA.
Assuming feature classes from the data are correct.

Setting parameter 'n_batches' to 10 as a fair trade-off between memory consumption and computation time.
Reducing 'n_batches' typically reduces the computation time at the cost of increased memory consumption.

Error in check_n_batches(internal) : 
  `n_batches` (10) must be smaller than the number of feature combinations/`n_combinations` (8)
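The fix described above can be sketched as follows. This is a minimal illustration, not the actual shapr source; `suggestion` stands in for whatever default the heuristic proposes:

```r
# Sketch of the bugfix: cap the default number of batches at
# n_combinations - 1, since check_n_batches() requires that
# n_batches is strictly less than n_combinations.
get_default_n_batches_sketch <- function(suggestion, n_combinations) {
  min(suggestion, n_combinations - 1)
}

# With the vignette example above: a suggested 10 batches and
# n_combinations = 8 now yields 7, which passes the check.
get_default_n_batches_sketch(10, 8)
```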

LHBO added 3 commits August 24, 2023 10:47
…g several approaches the old version only used the first approach. Verified this by adding a print statement in each prepare_data.approach() function and seeing that only the first approach in internal$parameters$approach was used.

We can maybe remove the code comments before the pull request is accepted. Maybe there is a better method to get the approach?

Also updated the roxygen2 documentation for the function, as it seemed to reflect the old version of shapr(?) due to arguments which are no longer present.

However, one then gets a warning when creating the roxygen2 documentation. I discuss some solutions in comments below. Discuss with Martin.
…ion `check_n_batches` threw an error for the vignette with gaussian approach with `n_combinations` = 8 and `n_batches = NULL`, as this function here then set `n_batches` = 10, which was too large. We subtract 1 as `check_n_batches` function specifies that `n_batches` must be strictly less than `n_combinations`.
@LHBO
Collaborator Author

LHBO commented Aug 24, 2023

The tests failed because I changed the printout. I added the word "of" as I thought that was more grammatically correct, but one can just remove it again. The intention of the sentence is clear either way.

The new printout is:
n_batches (32) must be smaller than the number of feature combinations/n_combinations (32)

LHBO added 3 commits August 24, 2023 12:44
…to 2^m", but all the tests only checked for "larger than". I.e., if the user specified n_combinations = 2^m in the call to shapr::explain, the function would not treat it as exact.
…t mode when `n_combinations = 2^m`, before the bugfix.
…`n_combinations >= 2^m`. Removed the large comment after discussing it with Martin.
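The exact-mode condition described in these commits can be sketched like this. The function and variable names are assumptions for illustration, not the actual shapr internals:

```r
# Sketch: exact mode should trigger when the user-specified n_combinations
# reaches 2^m (m = number of features), not only when it strictly exceeds
# it. The old tests only covered the "larger than" case.
is_exact <- function(n_combinations, m) {
  is.null(n_combinations) || n_combinations >= 2^m
}

is_exact(8, 3)  # TRUE: 2^3 = 8 was previously (wrongly) treated as non-exact
is_exact(7, 3)  # FALSE: fewer combinations than 2^m, so sampling is needed
```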
@@ -217,7 +217,7 @@ check_n_batches <- function(internal) {

if (n_batches >= actual_n_combinations) {
stop(paste0(
-    "`n_batches` (", n_batches, ") must be smaller than the number feature combinations/`n_combinations` (",
+    "`n_batches` (", n_batches, ") must be smaller than the number of feature combinations/`n_combinations` (",
Collaborator Author


This makes the previous tests fail, as now the error messages are different. Let Martin decide if we keep this alteration or not.

…est checking that we do not get an error when running the code after the bugfix has been applied.
@LHBO LHBO requested a review from martinju August 31, 2023 12:27
@LHBO
Collaborator Author

LHBO commented Sep 4, 2023

This should not be merged yet, as another bug occurred while I was working on combined approaches.

explain_numeric = explain(
  model = model_lm_numeric,
  x_explain = x_explain_numeric,
  x_train = x_train_numeric,
  approach = c("independence", "empirical", "gaussian", "copula", "empirical"),
  prediction_zero = p0,
  n_batches = NULL,
  timing = FALSE)
Setting parameter 'n_batches' to 2 as a fair trade-off between memory consumption and computation time.
Reducing 'n_batches' typically reduces the computation time at the cost of increased memory consumption.


explain_numeric$internal$objects$S_batch
$`1`
[1]  2  3  4  5  6 32

$`2`
 [1]  7  8  9 10 11 12 13 14 15 16

$`3`
 [1] 17 18 19 20 21 22 23 24 25 26

$`4`
[1] 27 28 29 30 31

> explain_numeric$internal$parameters$n_batches
[1] 2
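The inconsistency shown above (four batches in `S_batch` while `n_batches` reports 2) boils down to a simple invariant that the bugfix restores. A hedged sketch, reusing the values from the output above:

```r
# Sketch of the invariant: the realized batch list must agree with the
# stored n_batches parameter. Values taken from the buggy output above.
S_batch <- list(
  `1` = c(2:6, 32),
  `2` = 7:16,
  `3` = 17:26,
  `4` = 27:31
)
n_batches <- 2

# FALSE before the fix: 4 realized batches versus n_batches = 2.
length(S_batch) == n_batches
```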

@LHBO
Collaborator Author

LHBO commented Sep 4, 2023

There are also some additional bugs with the combined approaches, e.g., setting the seed does not work.

LHBO added 13 commits September 4, 2023 12:25
… the number of approaches and the number of unique approaches.

This is for example useful to check that the provided `n_batches` is a valid value. (see next commits)
…the number of unique approaches. Before, the user could, e.g., set `n_batches = 2` but use 4 approaches, and shapr would then use 4 batches without updating `n_batches` and without giving a warning to the user.
… of unique approaches that are used. This was not done before and gave an inconsistency between the number shapr would recommend and the number it would use when `n_batches` was set to `NULL` by the user.
…ombined approaches.

Furthermore, added an if-test, because the previous version resulted in non-reproducible code: setting the seed to `NULL` undoes the seed we set in `explain()`.

Just consider this small example:
# Set seed to get same values twice
set.seed(123)
rnorm(1)

# Setting the same seed gives the same value
set.seed(123)
rnorm(1)

# If we also set the seed to NULL, the seed is removed and we do not get the same value
set.seed(123)
set.seed(NULL)
rnorm(1)

# Setting the seed to NULL actually gives a new "random" number each time
set.seed(123)
set.seed(NULL)
rnorm(1)
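The if-test mentioned above can be sketched as follows. This is an assumed form for illustration, not the exact shapr code; `maybe_set_seed()` is a hypothetical helper name:

```r
# Sketch: only call set.seed() when a seed is actually provided, so that a
# NULL seed does not wipe out the seed already set earlier in explain().
maybe_set_seed <- function(seed) {
  if (!is.null(seed)) set.seed(seed)
}

set.seed(123)
maybe_set_seed(NULL)  # no-op, so reproducibility is preserved
rnorm(1)              # same value as calling rnorm(1) right after set.seed(123)
```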
@LHBO
Collaborator Author

LHBO commented Sep 4, 2023

Looks like some tests fail. Will look at that tomorrow.

@LHBO
Collaborator Author

LHBO commented Sep 11, 2023

I looked at the tests here and almost all of them are OK (they just add two new objects), except for the test involving the independence approach, where something is slightly off, causing a small numerical change (around the 10th decimal place). It is probably fine, but it would be good to know what causes it. That will be easier to see once #356 is merged, as this PR contains all those changes as well, making it harder to spot what has actually changed here.

I have merged #356 into this PR now. Will try to look more into it.

The file test-setup.R gave only two failures, which were due to my adding the word "of" to the printout of an error message.

There are a lot of errors in test-output.R:

  1. Some of the failures are related to the combined-approach explainers where n_batches is set to 1, which is now invalid input, as n_batches must be equal to or larger than the number of unique approaches used. This was not the case before; the user could set n_batches = 1, but shapr would silently overwrite it. I recommend using n_batches = 10; then we test both that the random distribution of the batches between the approaches is the same each time and the generated MC samples.
  2. I have tried to look more into it, but I am not sure about the pipeline/workflow. E.g., when I call testthat::snapshot_review('output/') to review changes, I just get errors:
Error: embedded nul in string: '\037\x8b\b\0\0\0\0\0\0\003…' [binary snapshot content omitted]
  3. Talk with Martin and see if it works on his computer.
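The new requirement that n_batches be at least the number of unique approaches could be enforced roughly as follows. A sketch under assumed names, not the actual shapr check:

```r
# Sketch: n_batches must be at least the number of unique approaches,
# since each unique approach needs at least one batch of its own.
check_n_batches_vs_approaches <- function(n_batches, approach) {
  n_unique_approaches <- length(unique(approach))
  if (n_batches < n_unique_approaches) {
    stop(
      "`n_batches` (", n_batches, ") must be at least the number of ",
      "unique approaches (", n_unique_approaches, ")."
    )
  }
  invisible(TRUE)
}

# OK: 4 unique approaches, 10 batches
check_n_batches_vs_approaches(
  10, c("independence", "empirical", "gaussian", "copula", "empirical")
)
# check_n_batches_vs_approaches(1, c("independence", "empirical"))  # would error
```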

LHBO and others added 4 commits September 12, 2023 11:00
…us value (`n_batches = 1`) is not allowed anymore as it is lower than the number of unique used approaches.
@LHBO
Copy link
Collaborator Author

LHBO commented Nov 16, 2023

Added some comments

Collaborator Author

@LHBO LHBO left a comment


Added review comments.

}
min_checked <- max(c(this_min, suggestion))
ret <- min(c(this_max, min_checked, n_combinations - 1))
Collaborator Author


We subtract 1 as the `check_n_batches()` function specifies that `n_batches` must be strictly less than `n_combinations`.

# Error in check_n_batches(internal) :
# `n_batches` (10) must be smaller than the number feature combinations/`n_combinations` (8)
# The bug only occurs for "ctree", "gaussian", and "copula" as they are treated differently in
# `get_default_n_batches()`; I am not certain why. Ask Martin about the logic behind that.
Collaborator Author


(Hidden) question here... I should be better at making them more visible.

Comment on lines 667 to 686
# Ensures that the number of batches corresponds to `n_batches`
if (sum(batch_count_dt$n_batches_per_approach) != n_batches) {
  # Ensure that the number of batches is not larger than `n_batches`.
  # Remove one batch from the approach with the most batches.
  while (sum(batch_count_dt$n_batches_per_approach) > n_batches) {
    approach_to_subtract_batch <- which.max(batch_count_dt$n_batches_per_approach)
    batch_count_dt$n_batches_per_approach[approach_to_subtract_batch] <-
      batch_count_dt$n_batches_per_approach[approach_to_subtract_batch] - 1
  }

  # Ensure that the number of batches is not lower than `n_batches`.
  # Add one batch to the approach with the most coalitions per batch.
  while (sum(batch_count_dt$n_batches_per_approach) < n_batches) {
    approach_to_add_batch <- which.max(batch_count_dt$n_S_per_approach /
      batch_count_dt$n_batches_per_approach)
    batch_count_dt$n_batches_per_approach[approach_to_add_batch] <-
      batch_count_dt$n_batches_per_approach[approach_to_add_batch] + 1
  }
}

Member


@LHBO Not sure whether this extra complexity is really worth it to cover the edge cases where n_batches does not add up. An alternative is to provide a warning that the number of batches is changed (if specified by the user), or otherwise just change it. What do you think?

Collaborator Author


@martinju, I agree that it is not very elegant code, but it does not take long to run and is only run once.
The two main reasons I added this are:

  1. The first reason was to ensure that the number of batches adds up to n_batches.
  2. The second reason, and likely the most important, is that if the user has specified future::plan(multisession, workers = n_batches), then it would be nice if shapr actually used n_batches batches.

If this is not that important, then I am okay with just giving a warning.

@martinju
Member

The failing tests on R 4.2 and R 4.1 (oldrel-1 and oldrel-2) seem to be due to a change in how errors are reported to the terminal. In the previous versions, a missing input argument would not throw an error before one tried to access the missing argument, while in R 4.3 an error is thrown immediately. This now causes a failure due to an update in testthat (remember to update to v3.2.0 if you run locally), which shows which function is throwing the error, and this differs between the two versions. I will try to look into how to get around this. One possibility is to simply ignore it, as we know the reason for the error and it does not cause practical problems.

Collaborator Author

@LHBO LHBO left a comment


Propose some code simplification in get_extra_parameters().

Collaborator Author


We do not need the if-else anymore, so a simpler version of the code could be:

# Get the number of approaches, which is always either one or 
# one less than the number of features if a combination of approaches is used.
internal$parameters$n_approaches <- length(internal$parameters$approach)

# Get the number of unique approaches, as the same
# approach can be used for several feature combination sizes.
internal$parameters$n_unique_approaches <- length(unique(internal$parameters$approach))

@martinju martinju changed the title Fixed bug in get_default_n_batches function where shapr sets the numb… Harmonize batch distribution ++ Nov 20, 2023
Member

@martinju martinju left a comment


I think we are ready to merge this now (provided the checks pass)

@LHBO LHBO merged commit 90afa9a into NorskRegnesentral:master Nov 20, 2023
7 checks passed
@LHBO LHBO deleted the Lars/bugfix_get_default_n_batches branch November 20, 2023 15:50