
How to get only the last few layers' gradient? #1101

Open
pmzzs opened this issue Jan 13, 2023 · 2 comments

Comments


pmzzs commented Jan 13, 2023

import torch
from functorch import make_functional_with_buffers, vmap, grad

fmodel, params, buffers = make_functional_with_buffers(net, disable_autograd_tracking=True)

def compute_loss_stateless_model(params, buffers, sample, target):
    batch = sample.unsqueeze(0)
    targets = target.unsqueeze(0)

    predictions = fmodel(params, buffers, batch)
    loss = criterion(predictions, targets)
    return loss

ft_compute_grad = grad(compute_loss_stateless_model)
gradient = ft_compute_grad(params, buffers, train_poi_set[0][0].cuda(), torch.tensor(train_poi_set[0][1]).cuda())

This returns the gradient of the whole model. However, I only want the second-to-last layer's gradient, like:

gradient = ft_compute_grad(params, buffers, train_poi_set[0][0].cuda(), torch.tensor(train_poi_set[0][1]).cuda())[-2]

Although this approach does give me the gradient I need, it incurs a lot of unnecessary overhead. Is there any way to turn off 'requires_grad' for all of the earlier layers? Thanks for your answer!


zou3519 commented Jan 17, 2023

functorch.grad computes gradients w.r.t. the first argument you pass it. Currently that is params (all of the model's parameters), so the solution is to pass only the parameters you want gradients of as the first argument.

Some pseudocode.

from functorch import make_functional_with_buffers, vmap, grad

fmodel, params, buffers = make_functional_with_buffers(net, disable_autograd_tracking=True)

def compute_loss_stateless_model(last_layers_params, first_layers_params, buffers, sample, target):
    batch = sample.unsqueeze(0)
    targets = target.unsqueeze(0)

    # pseudocode: we need to put the params back together into a single params tuple
    # that fmodel can understand
    params = (*first_layers_params, *last_layers_params)

    predictions = fmodel(params, buffers, batch)
    loss = criterion(predictions, targets)
    return loss

ft_compute_grad = grad(compute_loss_stateless_model)

# pseudocode: we need to split the params we want to compute gradients of from
# the params we don't want to compute gradients of.
first_layers_params, last_layers_params = partition(params)

gradient = ft_compute_grad(last_layers_params, first_layers_params, buffers, train_poi_set[0][0].cuda(), torch.tensor(train_poi_set[0][1]).cuda())
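
Concretely, the partition step can just slice the params tuple. A rough, untested sketch, assuming the last two tensors in params are the final layer's weight and bias (adjust the split point for your own architecture):

import torch
from functorch import make_functional_with_buffers, grad

fmodel, params, buffers = make_functional_with_buffers(net, disable_autograd_tracking=True)

# Assumption: the final layer contributes the last two tensors in params
# (e.g. a Linear layer's weight and bias). Adjust `split` as needed.
split = len(params) - 2
first_layers_params = params[:split]
last_layers_params = params[split:]

def compute_loss_stateless_model(last_layers_params, first_layers_params, buffers, sample, target):
    batch = sample.unsqueeze(0)
    targets = target.unsqueeze(0)

    # Reassemble the full parameter tuple in the original order for fmodel.
    all_params = (*first_layers_params, *last_layers_params)

    predictions = fmodel(all_params, buffers, batch)
    loss = criterion(predictions, targets)
    return loss

# grad differentiates only w.r.t. the first argument, so only
# last_layers_params gets gradients; the earlier layers are treated as constants.
ft_compute_grad = grad(compute_loss_stateless_model)

sample = train_poi_set[0][0].cuda()
target = torch.tensor(train_poi_set[0][1]).cuda()
gradient = ft_compute_grad(last_layers_params, first_layers_params, buffers, sample, target)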


skxgogo commented Apr 5, 2024

@zou3519 I have a similar question, but about jacrev. For example, I only want to compute the Jacobian with respect to the last layers. Can the same approach work?
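
Something like this is what I have in mind (a minimal sketch, assuming jacrev, like grad, differentiates with respect to its first argument by default, and that the last two tensors in params belong to the final layer):

from functorch import make_functional_with_buffers, jacrev

fmodel, params, buffers = make_functional_with_buffers(net, disable_autograd_tracking=True)

# Assumption: only the final layer's tensors go into the first argument.
first_layers_params, last_layers_params = params[:-2], params[-2:]

def model_output(last_layers_params, first_layers_params, buffers, sample):
    # Rebuild the full parameter tuple in the original order for fmodel.
    all_params = (*first_layers_params, *last_layers_params)
    return fmodel(all_params, buffers, sample.unsqueeze(0))

# jacrev differentiates w.r.t. its first argument by default (argnums=0),
# so the Jacobian is computed only for last_layers_params.
compute_jac = jacrev(model_output)
jacobian = compute_jac(last_layers_params, first_layers_params, buffers, train_poi_set[0][0].cuda())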
