
Can I use Monarch Mixer to replace a cross-attention layer? #9

Open
autumn-2-net opened this issue Nov 4, 2023 · 4 comments

Comments

@autumn-2-net

The Sequence Mixer in the paper doesn't seem to be able to mix sequences of unequal length the way cross attention does, because it uses elementwise multiplication. Is this a misunderstanding on my part, or is Monarch Mixer not a replacement for cross attention?
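For intuition, here is a minimal PyTorch sketch of the shape issue (the shapes are invented for illustration, not taken from the M2 code): cross attention ties two sequences of different lengths together through a score matrix, while an elementwise (Hadamard) mixer needs both operands on the same sequence axis.

```python
import torch

B, d = 1, 8
x = torch.randn(B, 16, d)   # "query" sequence, length 16
y = torch.randn(B, 32, d)   # "context" sequence, length 32

# Cross attention: the score matrix is (16 x 32), so unequal lengths are fine.
scores = x @ y.transpose(-1, -2) / d ** 0.5      # (B, 16, 32)
out = torch.softmax(scores, dim=-1) @ y          # (B, 16, d)

# Elementwise gating: both operands must share the sequence dimension.
# x * y                     # RuntimeError: sizes 16 and 32 don't match
out_gated = x * y[:, :16]   # only works after forcing the lengths to agree
```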

@DanFu09
Collaborator

DanFu09 commented Nov 4, 2023

This is something we're very interested in and still working on! We don't have a formula for it quite yet.

@autumn-2-net
Author

> This is something we're very interested in and still working on! We don't have a formula for it quite yet.

That doesn't sound like good news. It looks like I'll just have to combine Monarch Mixer with cross attention. Is there a performance loss compared to plain attention?
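One straightforward arrangement is to keep M2-style mixing within the decoder stream and use ordinary attention only for the cross connection. The sketch below is a guess at such a hybrid block, not something this repo provides; `mixer` is a placeholder for whatever length-preserving sequence mixer (e.g. an M2 block) you plug in.

```python
import torch
import torch.nn as nn

class HybridBlock(nn.Module):
    """Self-mix the decoder stream with any length-preserving mixer,
    then attend over the encoder stream with standard cross attention."""
    def __init__(self, d_model: int, mixer: nn.Module, n_heads: int = 4):
        super().__init__()
        self.mixer = mixer
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x, context):
        x = x + self.mixer(self.norm1(x))                        # self mixing (M2-style)
        attn_out, _ = self.cross_attn(self.norm2(x), context, context)
        return x + attn_out                                       # cross mixing (attention)
```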

@DanFu09
Collaborator

DanFu09 commented Nov 4, 2023

We've seen that we can match self-attention in quality with some gated convolutions (see the paper for details). Cross attention is still an open problem - which we'll be working on!
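For reference, the gated-convolution style of sequence mixer looks roughly like the following. This is a simplified sketch of the idea (an FFT-based long convolution followed by a multiplicative gate), not the actual M2 implementation; `gate_proj` is a hypothetical linear projection.

```python
import torch

def gated_long_conv(x, k, gate_proj):
    # x: (B, L, d) sequence; k: (d, L) per-channel long convolution filter
    B, L, d = x.shape
    # FFT-based linear convolution over the sequence dimension, per channel
    X = torch.fft.rfft(x.transpose(1, 2), n=2 * L)   # (B, d, L + 1)
    K = torch.fft.rfft(k, n=2 * L)                   # (d, L + 1)
    y = torch.fft.irfft(X * K, n=2 * L)[..., :L]     # (B, d, L)
    y = y.transpose(1, 2)                            # (B, L, d)
    # Multiplicative gate, as in gated-convolution sequence mixers
    return y * torch.sigmoid(gate_proj(x))

# Example usage with made-up sizes
B, L, d = 2, 64, 32
x = torch.randn(B, L, d)
k = torch.randn(d, L) * 0.02
y = gated_long_conv(x, k, torch.nn.Linear(d, d))     # (B, L, d)
```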

@autumn-2-net
Author

> We've seen that we can match self-attention in quality with some gated convolutions (see the paper for details). Cross attention is still an open problem - which we'll be working on!

If I use M2, can I drop positional encodings? M2 looks a bit like a convolution to me, which would already give the model positional information.
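On the general point: self-attention without positional encodings is permutation-equivariant, while a convolution is not, so a convolutional mixer does pick up token order on its own. A small illustrative check (this says nothing about what the M2 models were actually trained with):

```python
import torch

torch.manual_seed(0)
L, d = 8, 4
x = torch.randn(1, L, d)
perm = torch.randperm(L)

# Self-attention with no positional encoding: permuting the tokens just
# permutes the outputs, so the layer cannot see order.
attn = torch.nn.MultiheadAttention(d, num_heads=1, batch_first=True).eval()
with torch.no_grad():
    a, _ = attn(x, x, x)
    a_perm, _ = attn(x[:, perm], x[:, perm], x[:, perm])
print(torch.allclose(a[:, perm], a_perm, atol=1e-5))   # True

# A convolution mixes neighbouring positions, so permuting the tokens
# changes the output values themselves, i.e. it is sensitive to order.
conv = torch.nn.Conv1d(d, d, kernel_size=3, padding=1).eval()
with torch.no_grad():
    c = conv(x.transpose(1, 2)).transpose(1, 2)
    c_perm = conv(x[:, perm].transpose(1, 2)).transpose(1, 2)
print(torch.allclose(c[:, perm], c_perm, atol=1e-5))   # False (in general)
```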
