
A question on square matrices #22

Open
GiftedNovaHD opened this issue Jan 29, 2024 · 1 comment

@GiftedNovaHD

Hello there, I want to clarify whether the requirement for square matrices is strictly enforced. From the paper, I note that:

"We turn to an expressive class of sub-quadratic structured matrices called Monarch matrices [12] (Figure 1 left) to propose Monarch Mixer (M2). Monarch matrices are a family of structured matrices that generalize the fast Fourier transform (FFT) and have been shown to capture a wide class of linear transforms including Hadamard transforms, Toeplitz matrices [30], AFDF matrices [55], and convolutions. They are parameterized as the products of block-diagonal matrices, called monarch factors, interleaved with permutation. Their compute scales sub-quadratically: setting the number of factors to p results in computational complexity of $O(pN^{(p+1)/p})$ in input length $N$ , allowing the complexity to interpolate between $O(N \log N )$ at $p = \log N$ and $O(N 3/2)$ at $p = 2$.", as well as:

"The convolution case with Monarch matrices fixed to DFT and inverse DFT matrices also admits implementations based on FFT algorithms [9]."

Furthermore, Proposition 3.2 in the original Monarch paper asserts that the class $\mathcal{MM}^*$ can represent a convolution.

I would thus like to find out whether the Monarch Mixer operation strictly enforces the block-diagonal structure of its factors, since a (residual gated) convolution does not intuitively correspond to a block-diagonal matrix.

Thank you!

@DanFu09
Collaborator

DanFu09 commented Feb 26, 2024

For the FFTs, we use square matrices. For the MLP layers, we actually use rectangular matrices to match the reverse bottleneck of the MLP.
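For illustration, a rectangular block-diagonal factor matching a reverse bottleneck could look like the sketch below; the helper function and shapes are hypothetical, not the actual M2 implementation:

```python
import numpy as np

def rect_block_diag_matvec(blocks, x):
    """Apply a block-diagonal factor whose b blocks are rectangular
    (out_dim x in_dim), so the overall map is b*in_dim -> b*out_dim."""
    b, out_dim, in_dim = blocks.shape
    X = x.reshape(b, in_dim)
    return np.einsum('bij,bj->bi', blocks, X).reshape(-1)

# Hypothetical 4x reverse bottleneck: 256 -> 1024 -> 256
b, d, e = 16, 16, 4
rng = np.random.default_rng(0)
W_up = rng.standard_normal((b, e * d, d))    # expand: 256 -> 1024
W_down = rng.standard_normal((b, d, e * d))  # contract: 1024 -> 256
x = rng.standard_normal(b * d)
y = rect_block_diag_matvec(W_down, rect_block_diag_matvec(W_up, x))
assert y.shape == (b * d,)
```

Each block expands its own chunk by the expansion factor, so the factor as a whole maps $N \to eN$ while staying block-diagonal.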
