Run shapr on HPC with a large size for x_explain #370
Hi! Hmm, how exactly did you set up the parallelization with future? What backend are you using, and what OS is this on? It seems some data gets lost on the way to some of the computers... Before doing anything else, however, I would recommend trying out the latest commit in branch LHBO:Lars/Improve_Gaussian in PR #366. This should speed up the copula method by orders of magnitude. We'll probably merge the PR into master tomorrow.
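(As a point of reference, a minimal sketch of a future-based setup on a Linux node follows; the multisession backend, the worker count, and the install call for the branch are assumptions, not confirmed details from this thread.)

```r
# Minimal sketch of a future-based parallel setup for shapr (assumptions:
# multisession backend, 4 workers; adapt to the cluster's scheduler).
library(future)
plan(multisession, workers = 4)

# Hypothetical install of the branch mentioned above, via the 'remotes'
# package and the LHBO fork of shapr:
remotes::install_github("LHBO/shapr", ref = "Lars/Improve_Gaussian")
```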
Dear Martin, many thanks for this quick feedback. I am new to R and shapr. The HPC runs in a Linux environment. Regarding PR #366, I do not understand the suggestions; would you please help me? Please find my code below:
I can also send you the dataset. Thank you in advance for your valuable time. Best,
Hi again. The mentioned PR is now merged, so simply reinstalling shapr from GitHub should give you the speedup.
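(The install step presumably looks something like the following; the exact command is an assumption, using the remotes package and the package's main GitHub repository.)

```r
# Hypothetical reinstall of shapr from the GitHub master branch now that
# PR #366 is merged (assumes the 'remotes' package is installed):
remotes::install_github("NorskRegnesentral/shapr")
```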
Hello Martin, I really appreciate your quick feedback. Best,
Yes, indeed. This is essentially what is done if you set n_batches = 2. Depending on how much preprocessing there is, calling explain() twice may take much longer or almost the same time.
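(A sketch of the equivalence being described; the model, approach, and prediction_zero arguments are placeholders.)

```r
# One explain() call with two batches ...
ex_all <- explain(model, x_explain = x_explain, x_train = x_train,
                  approach = "copula", prediction_zero = p0, n_batches = 2)

# ... does essentially the same computation as two calls on the two halves,
# except that the preprocessing is then done twice:
idx  <- seq_len(nrow(x_explain)) <= nrow(x_explain) / 2
ex_1 <- explain(model, x_explain = x_explain[idx, ], x_train = x_train,
                approach = "copula", prediction_zero = p0, n_batches = 1)
ex_2 <- explain(model, x_explain = x_explain[!idx, ], x_train = x_train,
                approach = "copula", prediction_zero = p0, n_batches = 1)
```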
Many thanks, Martin.
According to the vignette (https://norskregnesentral.github.io/shapr/articles/understanding_shapr.html), it should be possible to split the computation into batches using the n_batches argument.
I will be honored if I can benefit from your guidance. Would you please have a glance at my code and let me know your suggestions? (My HPC has enough computational capacity.) Kind regards,
If you set n_batches = 15, that essentially corresponds to splitting your x_explain into 15 parts and calling explain() 15 times (with n_batches = 1). Since you have very large data which may take hours or days to explain properly, I would still split your x_explain into 15 parts (and use n_batches = 10 or so). The reason is that if something crashes, you lose everything, as there is no temporary saving to disk implemented as of now. If you do it one part at a time, you can save the intermediate results yourself. I would also recommend using the progress bar option to follow the progress. See the vignette for how to set that up. Hope this helps.
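(A sketch of that workflow follows; the chunk count, file names, and approach are illustrative, and it assumes shapr's progress reporting goes through the progressr package as the vignette describes.)

```r
# Explain x_explain in 15 chunks, saving each result to disk so that a
# crash only costs the current chunk (names and approach are placeholders).
library(progressr)
handlers(global = TRUE)  # enable progress updates

n      <- nrow(x_explain)
chunks <- split(seq_len(n), cut(seq_len(n), breaks = 15, labels = FALSE))

for (i in seq_along(chunks)) {
  ex_i <- explain(model,
                  x_explain = x_explain[chunks[[i]], ],
                  x_train = x_train,
                  approach = "copula",
                  prediction_zero = p0,
                  n_batches = 10)
  saveRDS(ex_i, sprintf("shapr_chunk_%02d.rds", i))
}
```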
Dear Martin, I would like to extend my sincere gratitude to you for this fruitful discussion. I wish you outstanding achievements in the development of the shapr package. Best regards,
Hi,
I am trying to use shapr in my project, but I have a problem. I set these parameters in my code and ran it on an HPC with 100 GB of memory and 4 workers:
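(Roughly along these lines, as a hypothetical sketch; the model object, approach, and prediction_zero are placeholders.)

```r
# Hypothetical sketch of the described setup: 4 future workers on one node.
library(shapr)
library(future)
plan(multisession, workers = 4)

explanation <- explain(model,
                       x_explain = x_explain,  # 300 observations, 16 features
                       x_train   = x_train,    # 13920 observations, 16 features
                       approach  = "copula",
                       prediction_zero = p0)
```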
In my main dataset, I have x_train = 16 x 13920 and x_test = 16 x 3480. I decided to only consider a subset of 300 observations for my explanation, so I constructed x_explain with a sample size of 300 and 16 features, but I see this error:
Surprisingly, when I used an x_explain with a sample size of 200 and 16 features, it completed successfully. Could you explain why this problem happens? In fact, my main question is how I can explain all 3480 observations. Is that possible? Please note that I have an HPC, so the computational cost is not really a concern.
Thanks in advance for your help and for the outstanding shapr package!