-
Notifications
You must be signed in to change notification settings - Fork 35
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Want to contribute - Latest updated source? #318
Comments
Hi, and thanks for reaching out and your willingness to contribute. If you want to contribute now, you should do so to the branch named "devel". Currently, the changes are not well documented, but at least the basic readme example and most of the vignette is up to date in the new branch. For your particular case of adding new model classes, I now put each model class in a new file, see e.g. https://github.com/NorskRegnesentral/shapr/blob/devel/R/model_ranger.R |
Hi, thanks for the clarification of which branch to work against! My interest is working with time series models, to explain the predictions at each step in the future. For now I have managed to do so by running the explanation function one time for each future value. Do you have any suggestions on how to pass more complex x_explain values? What I would want to do is include a "steps ahead" parameter to the predict function, which would allow me to have say 3 explanations, one for each step in the future, using the same x_explain vector. The way that I solved it for now was to set this parameter on the model before passing it to the explain functions, but I think it is a big inefficient having to run the function one time for each step. What would be even more neat is to be able to have the predict function return a vector, which would then contain multiple values which are all explained separately. It more or less implies supporting models with multiple outputs, which I think is a quite general case which would add a lot of usability to the package. Do you see any general problems with this train of thought? |
Interesting application! I see two possible ways to by-pass the problem in your (and similar) cases:
The plan is to support point 2, although I have not decided exactly how to do it yet, and it will not be ready for the release this year. Which solution would you prefer? PS: In the devel version we have implemented a new approach for time series prediction explanation (see approach='timeseries' and an example in the new vignette). |
Yes, there are a lot of new things to take into consideration when explaining a forecast, like which value should be used as I will have to look into the code for the different approaches to see how it could be possible to do multiple outputs, it may very well be that you are correct that it is quite hard to get it working. If point 1 would be supported, solving the multiple steps would be as easy as giving the step to forecast as an additional argument to the observations. I am however not sure if this would aid computational performance. I will try to investigate the code a bit closer to be able to give some more informed answers! I will also look at the timeseries approach, is it ready to be used or under development? |
Regarding the multiple output, I am mostly worried about the bookkeeping. I am happy to discuss solutions if you give it a go. From my perspective point 1 and 2 would both be as computationally efficient as supporting multiple output directly, it would just be slightly less user friendly. The timeseries approach should be ready for use. It is the simple method introduced in section 4.3 in this paper: https://martinjullum.com/publication/jullum-2021-efficient/jullum-2021-efficient.pdf You may also be interested in a recent paper which discusses similar (and more advanced?) methods for the same use case: https://arxiv.org/ftp/arxiv/papers/2211/2211.06507.pdf (I have not read it properly myself.) |
Thanks for the links to the papers, will have a read when I get time. I have started some work on the code, made a small PR which I am not completely sure if it is useful, but I thought that I should start somewhere. Let me know if you have any comments. |
I did manage to implement multiple outputs, and also got to know the package and methods a bit. If you have a look here https://github.com/jonlachmann/shapr/tree/output_size, you can maybe see if I managed to do something stupid or if it seems reasonable. |
Great! I just had a quick glance and couldn't spot any obvious issues. Could you upload a basic example script as well? |
Great to hear, I am not really used to working with data.table, so I learned a bit while getting it to work. I usually always just let everything be a matrix, but can see the benefits in this case where more complicated selections and aggregations are needed. I formatted and commented my test example and created a gist here https://gist.github.com/jonlachmann/47bcfc4b1a072debd7278547992cd1a6, it should work given that you check out my branch and run it from inside that folder. Let me know if you get it to run or have any questions. |
Hi!
I am interested in adding support for some new model classes, and saw the text in the readme saying that there are a lot of ongoing refactoring. I was just wondering if there is a branch which it is preferred to develop against, to avoid having to refactor more than necessary once the other code re-factorization is finished. Not really sure if this fits as an issue, but I figured it was the least inappropriate place to put this question.
Anyway, thanks for a well written R package!
The text was updated successfully, but these errors were encountered: