Why is there such a big difference in cosine similarity between embeddings of the same pair when using padding=max_length versus padding=true? #19

qianyue76 · 2024-01-21T14:01:03Z

When I embedded a relevant text pair using the m2-bert-80M-32k-retrieval model, the cosine similarity obtained with padding=max_length was 0.7, while with padding=true (to save memory) it was close to 0. This makes semantic retrieval completely impossible with padding=true. The same thing happens with the 2k and 8k models as well. Why is this the case? And is padding=true completely unusable?

DanFu09 · 2024-01-21T14:48:18Z

The bidirectional convolutions in these models use the padding tokens to pass information from layer to layer (like scratch tokens). padding=true pads to the length of the longest element in the batch, while padding=max_length pads to the max length of the tokenizer. We're working on a version that gracefully interpolates between the 32k/8k/2k versions to save compute, but it's still active research, so it may not be live for a while.
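To make the difference between the two padding modes concrete, here is a minimal sketch in plain Python. It does not call the real tokenizer or model: `pad_batch` is a hypothetical helper, the token ids are made up, and a pad id of 0 and a `max_length` of 16 are assumptions chosen for illustration. It only shows how many padding positions each mode leaves for the model to work with.

```python
from typing import List

PAD_ID = 0  # assumption: the pad token id is 0


def pad_batch(batch: List[List[int]], padding: str, max_length: int,
              pad_id: int = PAD_ID) -> List[List[int]]:
    """Hypothetical helper that mimics the two padding strategies.

    padding="longest"    -> pad to the longest sequence in the batch
                            (what padding=true does)
    padding="max_length" -> pad every sequence to max_length
    """
    if padding == "longest":
        target = max(len(seq) for seq in batch)
    elif padding == "max_length":
        target = max_length
    else:
        raise ValueError(f"unknown padding strategy: {padding!r}")

    padded = []
    for seq in batch:
        seq = seq[:target]  # truncate anything longer than the target
        padded.append(seq + [pad_id] * (target - len(seq)))
    return padded


# Made-up token ids standing in for a short text pair.
batch = [[101, 7, 8, 102], [101, 9, 102]]

longest = pad_batch(batch, padding="longest", max_length=16)
full = pad_batch(batch, padding="max_length", max_length=16)

# Number of padding positions available per sequence in each mode.
pads_longest = [seq.count(PAD_ID) for seq in longest]
pads_full = [seq.count(PAD_ID) for seq in full]

print("longest:   ", [len(s) for s in longest], "pad counts:", pads_longest)
print("max_length:", [len(s) for s in full], "pad counts:", pads_full)
```

With short inputs, the "longest" mode leaves almost no padding positions, which lines up with the explanation above that the model relies on them. Until the interpolating version is available, padding to max_length appears to be the configuration that gives usable embeddings.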


qianyue76 · 2024-01-23T11:17:10Z

Why does padding to max_length, which just adds 0s (token_id), make such a big difference to the embedding performance?
