
Implement DeepSeek V2 #2744

Merged 15 commits into huggingface:main on Feb 19, 2025
Conversation

EricLBuehler
Member

This PR implements the DeepSeek V2 architecture.

@EricLBuehler EricLBuehler marked this pull request as ready for review January 27, 2025 17:42
// (n, topk_group)
let group_idx = scores.topk_unsorted(self.cfg.topk_group)?.indices;
// (n, n_group)
let mut group_mask = group_scores.zeros_like()?;
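The grouped routing in the excerpt above can be sketched as follows. This is a minimal, hypothetical Rust illustration with plain `Vec`s standing in for candle tensors; the function name `route`, the per-group max reduction, and the zero-masking are assumptions for illustration, not the PR's exact code:

```rust
// Hypothetical sketch of grouped top-k expert routing: reduce scores
// per group, keep the best `topk_group` groups, and zero out the
// experts belonging to the other groups.
fn route(scores: &[f32], n_group: usize, topk_group: usize) -> Vec<f32> {
    let group_size = scores.len() / n_group;
    // max score per group
    let group_scores: Vec<f32> = scores
        .chunks(group_size)
        .map(|g| g.iter().cloned().fold(f32::MIN, f32::max))
        .collect();
    // indices of the topk_group best-scoring groups
    let mut idx: Vec<usize> = (0..n_group).collect();
    idx.sort_by(|&a, &b| group_scores[b].partial_cmp(&group_scores[a]).unwrap());
    let kept: Vec<usize> = idx.into_iter().take(topk_group).collect();
    // mask out experts in the groups that were not kept
    scores
        .iter()
        .enumerate()
        .map(|(i, &s)| if kept.contains(&(i / group_size)) { s } else { 0.0 })
        .collect()
}
```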
Collaborator


Can't you avoid this mut by chaining calls or using a local scope? It seems fairly easy to do. Please also review the other remaining muts.

Member Author


I've removed this mut, and I also removed all other mut occurrences in the model execution path (except where unavoidable).
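For illustration, a minimal sketch of the kind of rewrite being asked for, with a plain `Vec` standing in for a candle tensor (the helper name `group_mask` is hypothetical):

```rust
// Hypothetical sketch: instead of allocating a zeros buffer and then
// mutating it (`let mut group_mask = group_scores.zeros_like()?`),
// build the mask in a single chained expression with no mut binding.
fn group_mask(n_group: usize, group_idx: &[usize]) -> Vec<f32> {
    (0..n_group)
        .map(|g| if group_idx.contains(&g) { 1.0 } else { 0.0 })
        .collect()
}
```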

(q_pe, k_pe) = self.rotary_emb.forward(&q_pe, &k_pe, seqlen_offset)?;

let q = Tensor::cat(&[q_nope, q_pe], D::Minus1)?;
let mut k = Tensor::cat(&[k_nope, k_pe], D::Minus1)?;
Contributor


Hi @EricLBuehler I got the following error for running DeepSeek-V2-Lite-Chat model:

called `Result::unwrap()` on an `Err` value: APIError { data: "shape mismatch in cat for dim 1, shape for arg 1: [1, 16, 5, 128] shape for arg 2: [1, 1, 5, 64]" }
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

I found that you repeated the second dim of Tensor k_pe to match Tensor k_nope and padded Tensor v with zeros in mistral.rs, but not over here. Is there something special about this case? I also tested the mistral.rs implementation, but the lite model gives random outputs.
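For context, a minimal sketch of the shape issue in that error, using nested `Vec`s as (heads, seq, dim) tensors with the batch dim omitted (the helper names `repeat_heads` and `cat_last_dim` are hypothetical): `k_nope` has 16 heads of dim 128 while the shared `k_pe` has 1 head of dim 64, so the non-cat dims must agree, e.g. by repeating `k_pe` across the head dimension, before concatenating along the last dim:

```rust
// Nested Vecs standing in for (heads, seq, dim) tensors; batch omitted.
type T3 = Vec<Vec<Vec<f32>>>;

// Broadcast a single-head tensor to n heads (like repeating dim 0).
fn repeat_heads(x: &T3, n: usize) -> T3 {
    (0..n).map(|_| x[0].clone()).collect()
}

// Concatenate along the last dim; all other dims must already match,
// which is exactly what the "shape mismatch in cat" error enforces.
fn cat_last_dim(a: &T3, b: &T3) -> T3 {
    a.iter()
        .zip(b)
        .map(|(ah, bh)| {
            ah.iter()
                .zip(bh)
                .map(|(ar, br)| ar.iter().chain(br).copied().collect())
                .collect()
        })
        .collect()
}
```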

Member Author


Hi @guoqingbao!

I found that you repeated the second dim of Tensor k_pe to match Tensor k_nope and padded Tensor v with zeros in mistral.rs, but not over here. Is there something special about this case?

That case is only there so the v head dim matches q/k, as PagedAttention requires (be sure to unpad too). We don't have PagedAttention in Candle (yet?), so it isn't included here.
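For illustration, a minimal sketch of the pad/unpad step described above, using flat `Vec` rows and hypothetical helper names, assuming a v head dim smaller than the q/k head dim:

```rust
// Zero-pad each row of v from `v_dim` to `qk_dim` (a PagedAttention-style
// kernel may require q, k, and v to share a head dim)...
fn pad_v(v: &[f32], v_dim: usize, qk_dim: usize) -> Vec<f32> {
    v.chunks(v_dim)
        .flat_map(|row| {
            row.iter()
                .copied()
                .chain(std::iter::repeat(0.0).take(qk_dim - v_dim))
        })
        .collect()
}

// ...and slice the padding back off the attention output afterwards.
fn unpad_v(o: &[f32], v_dim: usize, qk_dim: usize) -> Vec<f32> {
    o.chunks(qk_dim)
        .flat_map(|row| row[..v_dim].iter().copied())
        .collect()
}
```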

I also tested the mistral.rs implementation, but the lite model gives random outputs.

Just checked DS V2 Lite on Metal in mistral.rs, and it works. Was that failing for you?

@EricLBuehler
Member Author

@LaurentMazare I updated the model with some fixes and removed all the mut occurrences, except in the MoE block where they are necessary. If you could review again, that would be great!

I tested the model, and it is working:

cargo run --example deepseekv2 --release --features metal -- --prompt "Recursive fibonacci code in Rust:" --which lite --sample-len 150   

@LaurentMazare LaurentMazare merged commit e6cc76f into huggingface:main Feb 19, 2025
10 checks passed
@EricLBuehler EricLBuehler deleted the add_deepseekv2 branch February 19, 2025 16:02
@EricLBuehler
Member Author

EricLBuehler commented Feb 19, 2025

Thank you! I'll update the DeepSeek V3 PR #2745.

I also have some MoE-specific optimizations which I'll be adding a PR for shortly!
