per_sample gradient is None but grad is populated #578

Open · anirban-nath opened this issue Mar 29, 2023 · 6 comments

anirban-nath commented Mar 29, 2023

I have a particular LayerNorm in my code because of which I am not able to run Opacus successfully. This LayerNorm is defined just like 3-4 others in my code and is used in 2 places. When I execute loss.backward(), the grad of the layer is populated but the per_sample grad isn't, which leads Opacus to throw the error "Per sample gradient is not initialized. Not updated in backward pass?"

Under what circumstances is this possible?

PS: This is how the norm is defined

```python
decoder_norm = nn.LayerNorm(d_model)
self.decoder = TransformerDecoder(decoder_layer, num_decoder_layers, decoder_norm,
                                  return_intermediate=return_intermediate_dec)
```

This is how it is used. The usages are marked with `# HERE` comments beside them.

```python
class TransformerDecoder(nn.Module):

    def __init__(self, decoder_layer, num_layers, norm=None, return_intermediate=False):
        super().__init__()
        self.layers = _get_clones(decoder_layer, num_layers)
        self.num_layers = num_layers
        self.norm = norm  # HERE
        self.return_intermediate = return_intermediate

    def forward(self, tgt, memory,
                tgt_mask: Optional[Tensor] = None,
                memory_mask: Optional[Tensor] = None,
                tgt_key_padding_mask: Optional[Tensor] = None,
                memory_key_padding_mask: Optional[Tensor] = None,
                pos: Optional[Tensor] = None,
                query_pos: Optional[Tensor] = None):
        output = tgt

        intermediate = []

        for layer in self.layers:
            output = layer(output, memory, tgt_mask=tgt_mask,
                           memory_mask=memory_mask,
                           tgt_key_padding_mask=tgt_key_padding_mask,
                           memory_key_padding_mask=memory_key_padding_mask,
                           pos=pos, query_pos=query_pos)
            # print(output.shape)
            if self.return_intermediate:
                intermediate.append(self.norm(output))  # HERE

        if self.norm is not None:
            output = self.norm(output)  # HERE
            if self.return_intermediate:
                intermediate.pop()
                intermediate.append(output)
```
alexandresablayrolles (Contributor) commented:

Thanks for raising this issue. The reason is that Opacus computes grad_samples using "hooks", so it only works for standard layers. You can pass grad_sample_mode="functorch" to make_private(), which will make Opacus use functorch to automatically compute grad_samples for new layers (it is not guaranteed to work but most of the time it does the job).
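
For reference, a minimal sketch of that call (assuming `model`, `optimizer`, and `data_loader` are already defined; the `noise_multiplier` and `max_grad_norm` values below are placeholders):

```python
from opacus import PrivacyEngine

privacy_engine = PrivacyEngine()

# grad_sample_mode="functorch" asks Opacus to compute per-sample gradients
# with functorch instead of the backward hooks it uses for standard layers.
model, optimizer, data_loader = privacy_engine.make_private(
    module=model,
    optimizer=optimizer,
    data_loader=data_loader,
    noise_multiplier=1.0,   # placeholder
    max_grad_norm=1.0,      # placeholder
    grad_sample_mode="functorch",
)
```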

alexandresablayrolles self-assigned this Mar 29, 2023
anirban-nath (Author) commented:

> Thanks for raising this issue. The reason is that Opacus computes grad_samples using "hooks", so it only works for standard layers. You can pass grad_sample_mode="functorch" to make_private(), which will make Opacus use functorch to automatically compute grad_samples for new layers (it is not guaranteed to work but most of the time it does the job).

Hi. I was using the make_private_with_epsilon function and I tried "functorch" but it did not work.

alexandresablayrolles (Contributor) commented:

It should also work with make_private_with_epsilon. Do you still have the same error message?
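
A sketch of the same switch with make_private_with_epsilon, assuming it accepts the grad_sample_mode keyword as described above (the epsilon, delta, and epochs values are placeholders):

```python
from opacus import PrivacyEngine

privacy_engine = PrivacyEngine()

# Same idea, but the noise level is derived from a target (epsilon, delta)
# budget over the planned number of epochs.
model, optimizer, data_loader = privacy_engine.make_private_with_epsilon(
    module=model,
    optimizer=optimizer,
    data_loader=data_loader,
    target_epsilon=8.0,     # placeholder
    target_delta=1e-5,      # placeholder
    epochs=10,              # placeholder
    max_grad_norm=1.0,      # placeholder
    grad_sample_mode="functorch",
)
```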

anirban-nath (Author) commented:

> It should also work with make_private_with_epsilon. Do you still have the same error message?

Exact same error message. No difference. I tried with both make_private and make_private_with_epsilon. I even tried replacing that LayerNorm with a GroupNorm, but none of these made any difference.

RobRomijnders commented:

Hi, I have a similar error. Was this issue resolved, @anirban-nath?

HuanyuZhang (Contributor) commented Sep 29, 2024

@RobRomijnders, feel free to share your code here so we can better help you.
