Understanding SimPLE head #32

Open
ElisonSherton opened this issue Mar 21, 2024 · 2 comments

Comments

ElisonSherton commented Mar 21, 2024

Hi guys,

Kindly note that I have a single GPU, so the questions below are asked in that context, i.e. my training runs on a single GPU and is not distributed.

I was trying to go through the implementation of the SimPLE head in opensphere/module/head/simple.py.


In the forward implementation we can see the following:

mask_p[:, pt:pt+x.size(0)].fill_diagonal_(False)

I think the intuition behind this filling is that we don't want to consider the pair (i, i) as a positive pair, hence for the current batch we zero out those positions in the positive mask. But the current batch is not necessarily placed at 0:batch_size in the bank; it is placed at self.ptr:self.ptr + batch_size (since I am using a single GPU, self.rank is always 0). Shouldn't the operation given above zero out the diagonal starting from self.ptr instead of starting from 0?


For iresnet100, the wrap mode is polarlike, which means that softplus_wrapping is applied as the f_wrapping function. On investigating this function, I observed that the output dimensionality of the vectors is altered.

import torch.nn.functional as F

def softplus_wrapping(raw_feats):
    # magnitude from the first coordinate, direction from the remaining ones
    mags = F.softplus(raw_feats[..., :1], beta=1)
    feats = mags * F.normalize(raw_feats[..., 1:], dim=-1)
    return feats

The softplus activation is applied to the 0th index of the embedding, and the remainder of the embedding (i.e. indices 1:embed_dim) is normalized and scaled by the softplus activation of the 0th index. Why is this operation performed? Also, why is the embedding dimension changed?

On doing a forward pass with embed_dimension = 64 and batch_size of 64, I obtain the following shapes during these particular steps:

Before F_Wrapping X shape: torch.Size([64, 64])
After F_Wrapping  X shape: torch.Size([64, 63])
After F_Wrapping  X_bank shape: torch.Size([8192, 63])
After F_Fusing X shape: torch.Size([64, 63])
After F_Fusing X_bank shape: torch.Size([8192, 63])
After F_Scoring Logits shape: torch.Size([64, 8192])
Positive Logits: torch.Size([256])
Negative Logits: torch.Size([524032])
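These counts are consistent with each other: the batch of 64 is scored against a bank of 8192 entries, giving 64 × 8192 = 524,288 batch-vs-bank scores in total, of which 256 are treated as positives and the remaining 524,288 − 256 = 524,032 as negatives.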

Can you please explain what is going on in this F_wrapping and what its purpose is? I could not find it in the paper either...

Thanks,
Vinayak.

ydwen (Owner) commented Mar 21, 2024

Hi Vinayak,

For the question about mask_p, you can think of the code mask_p[:, pt:pt+x.size(0)].fill_diagonal_(False) as two operations.

  1. temp = mask_p[:, pt:pt+x.size(0)] # take a sub-mask covering the pt-th column through the (pt+x.size(0)-1)-th column.
  2. temp.fill_diagonal_(False) # zero out the diagonal of the sub-mask, i.e. the zeroing starts from column 0 in temp, which is column pt in the full mask (see the quick check below).
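To see this concretely, here is a small standalone check with toy sizes (batch_size, bank_size, and pt below are made up for illustration, not taken from the repo):

import torch

batch_size, bank_size, pt = 4, 16, 6
mask_p = torch.ones(batch_size, bank_size, dtype=torch.bool)

# slicing returns a view, so filling its diagonal modifies mask_p in place
mask_p[:, pt:pt + batch_size].fill_diagonal_(False)

# the cleared entries are (i, pt + i): each batch sample paired with its own bank slot
print((~mask_p).nonzero())  # [[0, 6], [1, 7], [2, 8], [3, 9]]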

For the second question, F_wrapping is a parameterization method for the learned features (vectors). There are different ways to do the parameterization, and the released code shows two of them.

In the SimPLE paper, we use softplus(x[:, 0]) as the magnitude and normalize(x[:, 1:]) as the direction, simply because of more stable convergence and better performance. This is just our empirical observation. We hypothesize that this parameterization may facilitate the training process, but we are not quite sure.
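As a quick standalone sanity check of this parameterization (a sketch with random inputs, not code from the repo): the norm of the wrapped feature equals the softplus of the first raw coordinate, and one dimension is spent on the magnitude.

import torch
import torch.nn.functional as F

x = torch.randn(64, 64)  # raw 64-d features
feats = F.softplus(x[..., :1], beta=1) * F.normalize(x[..., 1:], dim=-1)

print(feats.shape)  # torch.Size([64, 63])
print(torch.allclose(feats.norm(dim=-1, keepdim=True),
                     F.softplus(x[..., :1], beta=1)))  # True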

chuong98 commented Apr 23, 2024

Hi @ydwen
I am trying to understand your implementation of SimPLE loss.

  1. In the paper, Eq. 6 [image of the equation], but in your implementation you put b_theta=0.3 into the score function [images of the relevant code], and you let self.bias be a learnable parameter.

Is this a bug?
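For reference, here is a schematic of the setup being asked about (hypothetical names and default values, for illustration only; this is not the actual code in opensphere): a fixed offset b_theta folded into the score function, alongside a separately learned bias.

import torch
import torch.nn as nn

class PairwiseScore(nn.Module):
    # hypothetical sketch of the pattern described above, not the repo's implementation
    def __init__(self, scale=32.0, b_theta=0.3):
        super().__init__()
        self.scale = scale
        self.b_theta = b_theta                    # fixed constant inside the score
        self.bias = nn.Parameter(torch.zeros(1))  # separately learned bias

    def forward(self, cos_theta):
        return self.scale * (cos_theta - self.b_theta) + self.bias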
