The gradient of h_estimate is not cut down. #6

Open

ZhangXiao96 opened this issue Mar 8, 2020 · 12 comments

@ZhangXiao96

Nice repo!
However, I think that the gradient of h_estimate is not cut down (i.e. detached), which may lead to some problems.

@expectopatronum
Contributor

Hi @ZhangXiao96, what do you mean by that? Do you have an idea how this could be fixed?

@ryokamoi

ryokamoi commented Jun 12, 2020

Maybe @ZhangXiao96 is talking about what is mentioned here.
#5 (comment)

@zhongyy

zhongyy commented Jun 16, 2020

Hi! Thanks for the great code!
1. I am wondering how much GPU memory this code should take. I found that with recursion_depth = 5000 and r = 10, GPU memory keeps growing until it is "out of memory". Is this normal?
2. I tried to cut down the gradient of h_estimate, but found that the influence function values become strange.

How can I fix these problems? Thanks in advance.

Here are my experimental details:
1. I ran the CIFAR experiment with the provided simple network (conv-pool-conv-fc-fc-fc-softmax) and the default setting (recursion_depth = 1 and r = 1). --> The code seems to work.
2. I changed the parameters to "recursion_depth = 5000 and r = 10", still using batch_size = 4 and one GPU (12 GB). --> GPU memory grows from 500 MiB until it runs out of memory.
3. I found that the code recursively calculates h_estimate, and this is the source of the growing GPU memory.

h_estimate = hvp(loss, list(model.parameters()), h_estimate)

After reading the issues, I think "h_estimate" should perhaps not be used directly to compute itself at the next step, because at each step only the value of h_estimate should be used. If h_estimate is carried along as a differentiable variable, I am not sure whether the hvp() function ends up computing higher-order gradients.
Therefore I made several changes, such as

h_estimate = hvp(loss, list(model.parameters()), h_estimate)
h_estimate = [ _v.detach() + (1 - damp) * _h_e.detach() - _hv.detach() / scale
                    for _v, _h_e, _hv in zip(v, h_estimate, hv)]

or

h_estimate = hvp(loss, list(model.parameters()), h_estimate)
with torch.no_grad():
        h_estimate = [ _v + (1 - damp) * _h_e - _hv / scale
                  for _v, _h_e, _hv in zip(v, h_estimate, hv)]

But I found that both of these modifications make the influence values blow up to NaN as recursion_depth increases.
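A tiny self-contained example of the graph growth (my own illustration, assuming hvp() builds its result with create_graph=True, which matches the behaviour described above): once h_estimate carries a grad_fn, every new hvp() call backpropagates through all previous iterations, so memory grows with recursion_depth, and the result is no longer a plain Hessian-vector product.

import torch

w = torch.randn(3, requires_grad=True)
h_estimate = torch.ones(3)                       # detached at the start
for step in range(3):
    loss = torch.exp(w).sum()
    g, = torch.autograd.grad(loss, w, create_graph=True)
    hv, = torch.autograd.grad((g * h_estimate).sum(), w, create_graph=True)
    h_estimate = h_estimate + hv                 # keeps grad_fn -> graph keeps growing
    print(step, h_estimate.grad_fn is not None)  # True from the first iteration on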

@zhongyy

zhongyy commented Jun 16, 2020

Maybe @ZhangXiao96 is talking about what is mentioned here.
#5 (comment)

I am not sure it is right to use the initial h_estimate to compute hvp() at every step. I checked the TF code provided by the author (https://github.com/kohpangwei/influence-release/blob/578bc458b4d7cc39ed7343b9b271a04b60c782b1/influence/genericNeuralNet.py#L475).
It looks like h_estimate is updated at each step? (It is hard for me to follow the TF code, so I may have misunderstood.)

@ryokamoi

Hi @zhongyy,
I agree we should add with torch.no_grad().

However, why did you modify the hvp part?
I think hv = hvp(loss, list(model.parameters()), h_estimate) should not be changed.
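To make this concrete, here is a minimal sketch of the loop as I understand it (not the repo's exact code; lissa_ihvp, loss_fn and the damp/scale defaults are placeholders): keep hv = hvp(...) unchanged, and do only the h_estimate update under torch.no_grad() so the autograd graph is released after every iteration.

import torch

def hvp(loss, params, vec):
    # Standard double-backward Hessian-vector product
    # (the repo's own hvp() can be used here unchanged).
    grads = torch.autograd.grad(loss, params, create_graph=True)
    dot = sum((g * v_i).sum() for g, v_i in zip(grads, vec))
    return torch.autograd.grad(dot, params)

def lissa_ihvp(v, model, loss_fn, data_loader,
               damp=0.01, scale=25.0, recursion_depth=5000):
    # v: list of (detached) gradient tensors, one per model parameter.
    h_estimate = [v_i.clone() for v_i in v]
    for step, (x, y) in enumerate(data_loader):
        if step >= recursion_depth:   # cycle the loader if it is shorter than this
            break
        loss = loss_fn(model(x), y)
        hv = hvp(loss, list(model.parameters()), h_estimate)   # unchanged
        with torch.no_grad():                                   # value-only update
            h_estimate = [_v + (1 - damp) * _h_e - _hv / scale
                          for _v, _h_e, _hv in zip(v, h_estimate, hv)]
    return h_estimate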

I want to ask more about your NaN error. How can we reproduce it?
I have also received unreasonable results in some cases.

(note)
This fork may be helpful.
https://github.com/dedeswim/pytorch_influence_functions

@wangdi19941224

@zhongyy I have the same problem as you. h_estimate keeps growing during the iterations and eventually becomes NaN. Did you manage to fix this?

@iamgroot42

@zhongyy @wangdi19941224 @ryokamoi did any of you manage to fix the NaN blow-up issue? I see the same thing whenever I wrap the update in torch.no_grad().

@ryokamoi

ryokamoi commented Jul 6, 2020

@iamgroot42 What kind of model did you use?
It is possible to run into the NaN problem even if the code is correct.

One possible solution is to use a larger "scale".
The Taylor expansion in LiSSA assumes the Hessian's spectrum is bounded (roughly ‖H‖ ≤ 1).
Using a larger scale relaxes this condition, but more iterations are then required.
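For reference, a sketch of the standard reasoning behind this (my summary, not from this thread), in LaTeX:

H^{-1} v \;=\; \sum_{j=0}^{\infty} (I - H)^{j} v ,
\qquad \text{which converges only if } \lVert I - H \rVert < 1 .

\text{Replacing } H \text{ by } H/\mathrm{scale} \text{ gives}
\qquad
H^{-1} v \;=\; \frac{1}{\mathrm{scale}} \sum_{j=0}^{\infty}
\Bigl(I - \frac{H}{\mathrm{scale}}\Bigr)^{j} v ,

\text{which converges once } \mathrm{scale} \text{ exceeds the largest eigenvalue of the (damped) Hessian, at the cost of more recursion steps to approach the limit.}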

@iamgroot42

@ryokamoi it's VGG19.

I did try increasing "scale" to 500. I got rid of the NaNs (for now).
Is there a good heuristic for estimating the lowest (or a ballpark) value of 'scale' that would work well?

@ryokamoi

ryokamoi commented Jul 6, 2020

@iamgroot42 I think there is no computationally cheap way to get the lowest workable scale, since it depends on the largest eigenvalue of H.
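One rough ballpark, though (my suggestion, not something in the repo): estimate the largest Hessian eigenvalue with a few power-iteration steps on Hessian-vector products and pick scale somewhat above it. A self-contained sketch, with top_hessian_eigenvalue as a hypothetical helper name:

import torch

def top_hessian_eigenvalue(model, loss, n_iters=20):
    # Power iteration on Hessian-vector products; returns a rough estimate
    # of the largest eigenvalue, usable as a lower bound when choosing scale.
    params = [p for p in model.parameters() if p.requires_grad]
    grads = torch.autograd.grad(loss, params, create_graph=True)
    v = [torch.randn_like(p) for p in params]
    eig = 0.0
    for _ in range(n_iters):
        norm = torch.sqrt(sum((v_i ** 2).sum() for v_i in v))
        v = [v_i / norm for v_i in v]
        dot = sum((g * v_i).sum() for g, v_i in zip(grads, v))
        hv = torch.autograd.grad(dot, params, retain_graph=True)
        eig = sum((h * v_i).sum() for h, v_i in zip(hv, v)).item()  # Rayleigh quotient
        v = [h.detach() for h in hv]
    return eig

For example, scale = 2 * top_hessian_eigenvalue(model, loss) on a representative batch, then check that h_estimate stays bounded over the recursion.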

@iamgroot42

Right. Thanks a lot, @ryokamoi :D

@thongnt99

Hi everyone, have any of you managed to solve the NaN problem? I've increased the scale to a very large number, but I still get NaN after about 100 iterations.
