
A question on square matrices #22

Open
GiftedNovaHD opened this issue Jan 29, 2024 · 1 comment

@GiftedNovaHD

Hello there, I want to clarify whether the requirement for square matrices is strictly enforced. From the paper, I note that:

"We turn to an expressive class of sub-quadratic structured matrices called Monarch matrices [12] (Figure 1 left) to propose Monarch Mixer (M2). Monarch matrices are a family of structured matrices that generalize the fast Fourier transform (FFT) and have been shown to capture a wide class of linear transforms including Hadamard transforms, Toeplitz matrices [30], AFDF matrices [55], and convolutions. They are parameterized as the products of block-diagonal matrices, called monarch factors, interleaved with permutation. Their compute scales sub-quadratically: setting the number of factors to p results in computational complexity of $O(pN^{(p+1)/p})$ in input length $N$ , allowing the complexity to interpolate between $O(N \log N )$ at $p = \log N$ and $O(N 3/2)$ at $p = 2$.", as well as:

"The convolution case with Monarch matrices fixed to DFT and inverse DFT matrices also admits implementations based on FFT algorithms [9]."

Furthermore, Proposition 3.2 in the original Monarch paper asserts that the class $\mathcal{MM}^*$ can represent a convolution.

I would thus like to find out whether the Monarch Mixer operation strictly enforces the block-diagonal structure of its factors, since a (residual gated) convolution does not intuitively correspond to a block-diagonal matrix.

Thank you!

@DanFu09
Collaborator

DanFu09 commented Feb 26, 2024

For the FFTs, we use square matrices. For the MLP layers, we actually use rectangular matrices to match the reverse bottleneck of the MLP.
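For illustration, a rectangular block-diagonal factor matching a reverse bottleneck could look like the sketch below; the helper function and shapes are hypothetical, not the actual M2 implementation:

```python
import numpy as np

def rect_block_diag_matvec(blocks, x):
    """Apply a block-diagonal factor whose b blocks are rectangular
    (out_dim x in_dim), so the overall map is b*in_dim -> b*out_dim."""
    b, out_dim, in_dim = blocks.shape
    X = x.reshape(b, in_dim)
    return np.einsum('bij,bj->bi', blocks, X).reshape(-1)

# Hypothetical 4x reverse bottleneck: 256 -> 1024 -> 256
b, d, e = 16, 16, 4
rng = np.random.default_rng(0)
W_up = rng.standard_normal((b, e * d, d))    # expand: 256 -> 1024
W_down = rng.standard_normal((b, d, e * d))  # contract: 1024 -> 256
x = rng.standard_normal(b * d)
y = rect_block_diag_matvec(W_down, rect_block_diag_matvec(W_up, x))
assert y.shape == (b * d,)
```

Each block expands its own chunk by the expansion factor, so the factor as a whole maps $N \to eN$ while staying block-diagonal.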
