The gradient of h_estimate is not cut down. #6

Open

ZhangXiao96 opened this issue Mar 8, 2020 · 12 comments

@ZhangXiao96

Nice repo!
However, I think that the gradient of h_estimate is not cut down (i.e. detached), which may lead to some problems.

@expectopatronum
Contributor

Hi @ZhangXiao96, what do you mean by that? Do you have an idea how this could be fixed?

@ryokamoi

ryokamoi commented Jun 12, 2020

Maybe @ZhangXiao96 is talking about what is mentioned here.
#5 (comment)

@zhongyy

zhongyy commented Jun 16, 2020

Hi! Thanks for the great code!
1. I am wondering how much GPU memory this code should take. I found that with recursion_depth = 5000 and r = 10, GPU memory keeps growing until it is "out of memory". Is this normal?
2. I tried to cut down the gradient of h_estimate, but found that the influence function values become strange.

How can I fix these problems? Thanks in advance.

Here are my experimental details:
1. I ran the CIFAR experiment with the provided simple network (conv-pool-conv-fc-fc-fc-softmax) and the default setting (recursion_depth = 1 and r = 1). --> The code seems to work.
2. I changed the parameters to "recursion_depth = 5000 and r = 10", still using batch_size = 4 and one GPU (12 GB). --> GPU memory grows from 500 MiB until it runs out of memory.
3. I found that the code recursively calculates h_estimate, and this is the source of the growing GPU memory.

h_estimate = hvp(loss, list(model.parameters()), h_estimate)

After reading the issues, I think "h_estimate" should perhaps not be used directly to compute itself at the next step, because at each step only the value of h_estimate should be used. If h_estimate is carried along as a differentiable variable, I am not sure whether the hvp() function ends up computing higher-order gradients.
Therefore I made several changes, such as

h_estimate = hvp(loss, list(model.parameters()), h_estimate)
h_estimate = [ _v.detach() + (1 - damp) * _h_e.detach() - _hv.detach() / scale
                    for _v, _h_e, _hv in zip(v, h_estimate, hv)]

or

h_estimate = hvp(loss, list(model.parameters()), h_estimate)
with torch.no_grad():
        h_estimate = [ _v + (1 - damp) * _h_e - _hv / scale
                  for _v, _h_e, _hv in zip(v, h_estimate, hv)]

But I found that both of these modifications make the influence values blow up to NaN as recursion_depth increases.
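A tiny self-contained example of the graph growth (my own illustration, assuming hvp() builds its result with create_graph=True, which matches the behaviour described above): once h_estimate carries a grad_fn, every new hvp() call backpropagates through all previous iterations, so memory grows with recursion_depth, and the result is no longer a plain Hessian-vector product.

import torch

w = torch.randn(3, requires_grad=True)
h_estimate = torch.ones(3)                       # detached at the start
for step in range(3):
    loss = torch.exp(w).sum()
    g, = torch.autograd.grad(loss, w, create_graph=True)
    hv, = torch.autograd.grad((g * h_estimate).sum(), w, create_graph=True)
    h_estimate = h_estimate + hv                 # keeps grad_fn -> graph keeps growing
    print(step, h_estimate.grad_fn is not None)  # True from the first iteration on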

@zhongyy

zhongyy commented Jun 16, 2020

Maybe @ZhangXiao96 is talking about what is mentioned here.
#5 (comment)

I am not sure it is right to use the initial h_estimate to compute hvp() at every step. I checked the TF code provided by the author (https://github.com/kohpangwei/influence-release/blob/578bc458b4d7cc39ed7343b9b271a04b60c782b1/influence/genericNeuralNet.py#L475).
It looks like h_estimate is updated at each step? (It is hard for me to follow the TF code, so I may have misunderstood.)

@ryokamoi

Hi @zhongyy,
I agree we should add with torch.no_grad().

However, why did you modify the hvp part?
I think hv = hvp(loss, list(model.parameters()), h_estimate) should not be changed.
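To make this concrete, here is a minimal sketch of the loop as I understand it (not the repo's exact code; lissa_ihvp, loss_fn and the damp/scale defaults are placeholders): keep hv = hvp(...) unchanged, and do only the h_estimate update under torch.no_grad() so the autograd graph is released after every iteration.

import torch

def hvp(loss, params, vec):
    # Standard double-backward Hessian-vector product
    # (the repo's own hvp() can be used here unchanged).
    grads = torch.autograd.grad(loss, params, create_graph=True)
    dot = sum((g * v_i).sum() for g, v_i in zip(grads, vec))
    return torch.autograd.grad(dot, params)

def lissa_ihvp(v, model, loss_fn, data_loader,
               damp=0.01, scale=25.0, recursion_depth=5000):
    # v: list of (detached) gradient tensors, one per model parameter.
    h_estimate = [v_i.clone() for v_i in v]
    for step, (x, y) in enumerate(data_loader):
        if step >= recursion_depth:   # cycle the loader if it is shorter than this
            break
        loss = loss_fn(model(x), y)
        hv = hvp(loss, list(model.parameters()), h_estimate)   # unchanged
        with torch.no_grad():                                   # value-only update
            h_estimate = [_v + (1 - damp) * _h_e - _hv / scale
                          for _v, _h_e, _hv in zip(v, h_estimate, hv)]
    return h_estimate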

I want to ask more about your NaN error. How can we reproduce it?
I have also received unreasonable results in some cases.

(note)
This fork may be helpful.
https://github.com/dedeswim/pytorch_influence_functions

@wangdi19941224

@zhongyy I have the same problem as you. h_estimate keeps growing during the iterations and eventually becomes NaN. Did you manage to fix this?

@iamgroot42

@zhongyy @wangdi19941224 @ryokamoi did any of you manage to fix the NaN blow-up issue? I see the same thing whenever I wrap the update in torch.no_grad().

@ryokamoi

ryokamoi commented Jul 6, 2020

@iamgroot42 What kind of model did you use?
It is possible to run into the NaN problem even if the code is correct.

One possible solution is to use a larger "scale".
The Taylor expansion in LiSSA assumes the Hessian's spectrum is bounded (roughly ‖H‖ ≤ 1).
Using a larger scale relaxes this condition, but more iterations are then required.
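For reference, a sketch of the standard reasoning behind this (my summary, not from this thread), in LaTeX:

H^{-1} v \;=\; \sum_{j=0}^{\infty} (I - H)^{j} v ,
\qquad \text{which converges only if } \lVert I - H \rVert < 1 .

\text{Replacing } H \text{ by } H/\mathrm{scale} \text{ gives}
\qquad
H^{-1} v \;=\; \frac{1}{\mathrm{scale}} \sum_{j=0}^{\infty}
\Bigl(I - \frac{H}{\mathrm{scale}}\Bigr)^{j} v ,

\text{which converges once } \mathrm{scale} \text{ exceeds the largest eigenvalue of the (damped) Hessian, at the cost of more recursion steps to approach the limit.}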

@iamgroot42

@ryokamoi it's VGG19.

I did try increasing "scale" to 500. I got rid of the NaNs (for now).
Is there a good heuristic for estimating the lowest (or a ballpark) value of 'scale' that would work well?

@ryokamoi

ryokamoi commented Jul 6, 2020

@iamgroot42 I think there is no computationally cheap way to get the lowest workable scale, since it depends on the largest eigenvalue of H.
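One rough ballpark, though (my suggestion, not something in the repo): estimate the largest Hessian eigenvalue with a few power-iteration steps on Hessian-vector products and pick scale somewhat above it. A self-contained sketch, with top_hessian_eigenvalue as a hypothetical helper name:

import torch

def top_hessian_eigenvalue(model, loss, n_iters=20):
    # Power iteration on Hessian-vector products; returns a rough estimate
    # of the largest eigenvalue, usable as a lower bound when choosing scale.
    params = [p for p in model.parameters() if p.requires_grad]
    grads = torch.autograd.grad(loss, params, create_graph=True)
    v = [torch.randn_like(p) for p in params]
    eig = 0.0
    for _ in range(n_iters):
        norm = torch.sqrt(sum((v_i ** 2).sum() for v_i in v))
        v = [v_i / norm for v_i in v]
        dot = sum((g * v_i).sum() for g, v_i in zip(grads, v))
        hv = torch.autograd.grad(dot, params, retain_graph=True)
        eig = sum((h * v_i).sum() for h, v_i in zip(hv, v)).item()  # Rayleigh quotient
        v = [h.detach() for h in hv]
    return eig

For example, scale = 2 * top_hessian_eigenvalue(model, loss) on a representative batch, then check that h_estimate stays bounded over the recursion.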

@iamgroot42

Right. Thanks a lot, @ryokamoi :D

@thongnt99

Hi everyone, have any of you managed to solve the NaN problem? I've increased the scale to a very large number, but I still get NaN after about 100 iterations.
