
Commit

[DC-AE, SANA] fix SanaMultiscaleLinearAttention apply_quadratic_attention bf16 (#10595)

* autoencoder_dc tiling

* add tiling and slicing support in SANA pipelines

* create variables for padding length because the line becomes too long

* add tiling and slicing support in pag SANA pipelines

* revert changes to tile size

* make style

* add vae tiling test

* fix SanaMultiscaleLinearAttention apply_quadratic_attention bf16

---------

Co-authored-by: Aryan <[email protected]>
chenjy2003 and a-r-r-o-w authored Jan 16, 2025
1 parent e8114bd commit b785ddb
Showing 1 changed file with 1 addition and 1 deletion.
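
The commit description above mentions tiling and slicing support for the SANA pipelines and the DC-AE autoencoder. Below is a minimal usage sketch, not part of this diff: the enable_tiling/enable_slicing method names on the VAE and the checkpoint id follow the usual diffusers conventions and are assumptions here.

import torch
from diffusers import SanaPipeline

# Example checkpoint id, used for illustration only.
pipe = SanaPipeline.from_pretrained(
    "Efficient-Large-Model/Sana_1600M_1024px_diffusers",
    torch_dtype=torch.bfloat16,
)
pipe.to("cuda")

# Decode latents tile-by-tile and one image at a time to reduce peak memory;
# these AutoencoderDC methods are assumed from the commit description.
pipe.vae.enable_tiling()
pipe.vae.enable_slicing()

image = pipe(prompt="a cyberpunk cat").images[0]
image.save("sana.png")
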
src/diffusers/models/attention_processor.py (2 changes: 1 addition & 1 deletion)

@@ -899,7 +899,7 @@ def apply_quadratic_attention(self, query: torch.Tensor, key: torch.Tensor, valu
         scores = torch.matmul(key.transpose(-1, -2), query)
         scores = scores.to(dtype=torch.float32)
         scores = scores / (torch.sum(scores, dim=2, keepdim=True) + self.eps)
-        hidden_states = torch.matmul(value, scores)
+        hidden_states = torch.matmul(value, scores.to(value.dtype))
         return hidden_states

     def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
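
The one-line change casts the float32-normalized scores back to the dtype of value before the final matmul. A self-contained sketch of why this matters under bfloat16 follows; the tensor shapes and eps value are illustrative, not the library code itself.

import torch

# Illustrative shapes and eps; the real module derives these from its config.
query = torch.randn(1, 8, 64, 32, dtype=torch.bfloat16)
key = torch.randn(1, 8, 64, 32, dtype=torch.bfloat16)
value = torch.randn(1, 8, 64, 32, dtype=torch.bfloat16)
eps = 1e-15

scores = torch.matmul(key.transpose(-1, -2), query)
# The normalization is done in float32 for numerical stability.
scores = scores.to(dtype=torch.float32)
scores = scores / (torch.sum(scores, dim=2, keepdim=True) + eps)

# Before the fix, torch.matmul(value, scores) mixed bfloat16 and float32
# operands and raised a RuntimeError. Casting the scores back resolves it:
hidden_states = torch.matmul(value, scores.to(value.dtype))
print(hidden_states.dtype)  # torch.bfloat16
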
