causal-linear does not use attn_mask? #105

Open
davidliujiafeng opened this issue Oct 14, 2021 · 1 comment

Comments

@davidliujiafeng

I checked the source code in causal_linear_attention.py.
I do not understand why 'attn_mask' is not actually used. Any hints?
Thanks very much.

def forward(self, queries, keys, values, attn_mask, query_lengths,
                key_lengths):
        # Apply the feature map to the queries and keys
        self.feature_map.new_feature_map(queries.device)
        Q = self.feature_map.forward_queries(queries)
        K = self.feature_map.forward_keys(keys)

        # Apply the key padding mask and make sure the attn_mask is a
        # lower triangular causal mask
        if not attn_mask.lower_triangular:
            raise RuntimeError(("CausalLinearAttention only supports full "
                                "lower triangular masks"))
        K = K * key_lengths.float_matrix[:, :, None, None]

        # Ensure that Q and K have compatible sizes for the following
        # computations, namely L == S
        Q, K = self._make_sizes_compatible(Q, K)

        # TODO: Shall we divide the Q and K with a relatively large number to
        #       avoid numerical instabilities in computing the denominator?
        #       We used to divide each with the max norm of all q and k but
        #       that seems relatively costly for a simple normalization.

        # Compute the normalizers
        Z = 1/(torch.einsum("nlhi,nlhi->nlh", Q, K.cumsum(1)) + self.eps)

        # Compute the unnormalized result
        V = causal_linear(
            Q,
            K,
            values
        )

        return V * Z[:, :, :, None]
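
(A hedged illustration of why the lower-triangular check is all the mask is used for: the causal_linear call above can be understood as a prefix sum over the sequence dimension. The function name causal_linear_reference and the shape conventions below are assumptions for illustration, not the library's exact API.)

import torch

def causal_linear_reference(Q, K, V):
    # Q, K: (N, L, H, D) feature-mapped queries and keys
    # V:    (N, L, H, M) values
    # Position i only ever sees the running sum of K_j (x) V_j for
    # j <= i, so causality is built into the cumulative sum and no
    # explicit mask tensor needs to be applied -- the forward pass
    # only has to verify that attn_mask is a full lower-triangular mask.
    KV = torch.einsum("nlhd,nlhm->nlhdm", K, V).cumsum(dim=1)
    return torch.einsum("nlhd,nlhdm->nlhm", Q, KV)

Note that this reference form materializes an (N, L, H, D, M) tensor, which is exactly the cost the library's optimized kernel avoids.
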
@Even-ok commented Jan 14, 2024

Maybe it's because the CUDA version of the attention computation in causal_product_cuda.cu enforces causality just by controlling the loop access bounds, without ever setting an explicit mask. :)
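
(To illustrate that point, here is a pure-PyTorch sketch of the same computation written as an explicit loop; the function name causal_linear_loop and the shapes are illustrative assumptions, not the kernel's actual code. Causality comes from the loop order alone.)

import torch

def causal_linear_loop(Q, K, V):
    # Q, K: (N, L, H, D); V: (N, L, H, M) -- same shapes as above.
    N, L, H, D = Q.shape
    M = V.shape[-1]
    out = torch.zeros(N, L, H, M, dtype=V.dtype, device=V.device)
    # S accumulates the sum over j <= i of the outer products K_j (x) V_j.
    S = torch.zeros(N, H, D, M, dtype=V.dtype, device=V.device)
    for i in range(L):
        # The loop order *is* the causal mask: only positions j <= i
        # have entered S by the time query i reads it.
        S = S + torch.einsum("nhd,nhm->nhdm", K[:, i], V[:, i])
        out[:, i] = torch.einsum("nhd,nhdm->nhm", Q[:, i], S)
    return out
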
