How to visualise the attention weights of the inputs #106
The general way to do this is to log the intermediates of the network computation, and then recompute the attention mask. Here's a brief sketch of how you would do this:

```python
import jax
import flax.linen as nn


@jax.jit
def get_attention_mask(model, observation, task):
    _, intermediates = model.module.apply(
        {'params': model.params},
        observation,
        task,
        observation['timestep_pad_mask'],
        train=False,
        method="octo_transformer",
        mutable=['intermediates'],
        capture_intermediates=True,
    )
    # `intermediates` holds literally the output of every submodule run in the
    # network. As an example, pull out the last Transformer MHA layer.
    outs = intermediates['intermediates']['octo_transformer']['BlockTransformer_0'][
        'Transformer_0']['encoderblock_11']['MultiHeadDotProductAttention_0']
    # Flax captures intermediates as tuples, hence the trailing [0].
    key = outs['key']['__call__'][0]
    query = outs['query']['__call__'][0]
    attention_weights = nn.dot_product_attention_weights(query, key)
    # Get the attention weights corresponding to the readout token (last token).
    return attention_weights[..., -1, :]  # shape: (batch_size, n_heads, n_tokens)
```
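For reference, here is a minimal usage sketch. It assumes you load a pretrained checkpoint with `OctoModel.load_pretrained` and build inputs the way the Octo README does; the checkpoint name, image key, and shapes below are illustrative and should be matched to your own model config.

```python
# Hypothetical usage example; checkpoint name and input shapes are assumptions.
import numpy as np
from octo.model.octo_model import OctoModel

model = OctoModel.load_pretrained("hf://rail-berkeley/octo-small")

# Dummy single-timestep observation; real code would use actual camera frames.
observation = {
    'image_primary': np.zeros((1, 1, 256, 256, 3), dtype=np.uint8),
    'timestep_pad_mask': np.ones((1, 1), dtype=bool),
}
task = model.create_tasks(texts=["pick up the spoon"])

weights = get_attention_mask(model, observation, task)
print(weights.shape)  # (batch_size, n_heads, n_tokens)
```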
Got it! Thanks for your kind response! I will try this right now.
I am trying to do the same thing. @oym1994, did you manage to solve this? One thing I am struggling with is figuring out which tokens correspond to images, language, etc. If you solved it, could you share your code?
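One way to attribute tokens to modalities, as a hedged sketch: by default the Octo transformer lays out its input sequence as task (language) tokens, then per-timestep observation (image) tokens, then readout tokens, so the attention vector returned above can be sliced by token counts. The counts below (16 language tokens from the T5 tokenizer, 256 image tokens from a 256x256 image split into 16x16 patches) are assumptions that match the default configs; verify them against your own `model.config` before trusting the slices.

```python
import numpy as np

# Assumed token layout for a single-timestep forward pass; check these counts
# against your model config before relying on them.
N_LANG = 16   # task language tokens (default T5 tokenizer max length)
N_IMG = 256   # primary-camera image tokens (256x256 image, 16x16 patches)


def split_by_modality(readout_attention):
    """Slice a (batch, n_heads, n_tokens) attention vector into modality chunks."""
    lang = readout_attention[..., :N_LANG]
    img = readout_attention[..., N_LANG:N_LANG + N_IMG]
    rest = readout_attention[..., N_LANG + N_IMG:]  # readout / remaining tokens
    return lang, img, rest


def image_attention_map(img_attention, patches_per_side=16):
    """Average image-token attention over heads and reshape it into a 2D grid
    that can be upsampled and overlaid on the input image as a heatmap."""
    per_patch = np.asarray(img_attention).mean(axis=1)  # (batch, N_IMG)
    return per_patch.reshape(-1, patches_per_side, patches_per_side)
```

The resulting (16, 16) grid can then be resized to the image resolution (e.g. with `jax.image.resize`) and overlaid on the frame, while the language slice gives one weight per prompt token.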
Hello,
Thanks for your great work. We would like a more detailed explanation of the outputs: how can we visualise the attention weights over the inputs (including image and language)?
Thanks for your attention; we look forward to your kind response!