Implement DeepSeek V2 #2744
Conversation
```rust
// (n, topk_group)
let group_idx = scores.topk_unsorted(self.cfg.topk_group)?.indices;
// (n, n_group)
let mut group_mask = group_scores.zeros_like()?;
```
Can't you just avoid this `mut` by chaining calls or using a local scope? It seems fairly easy to do. Please also review the other remaining `mut`s.
I've removed this `mut`. I also removed all other `mut` occurrences in the model execution path, except where they were unavoidable.
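For reference, a minimal sketch of the chaining pattern suggested above, with illustrative names and shapes (`group_scores`, `group_idx`, and the `scatter_add` fill are assumptions, not the PR's exact code):

```rust
use candle_core::{Result, Tensor};

/// Build a (n, n_group) mask over the selected top-k groups without a `mut`
/// binding, by chaining the calls into a single expression.
/// `group_scores`: (n, n_group) float scores;
/// `group_idx`: (n, topk_group) integer indices of the selected groups.
fn group_mask(group_scores: &Tensor, group_idx: &Tensor) -> Result<Tensor> {
    // Float ones with the same shape as the indices, used as scatter sources.
    let ones = group_idx.ones_like()?.to_dtype(group_scores.dtype())?;
    group_scores
        .zeros_like()?
        .scatter_add(group_idx, &ones, 1)
}
```

The same trick works anywhere a tensor is created and then written exactly once: fold the write into the expression that produces the binding, and the binding stays immutable.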
```rust
(q_pe, k_pe) = self.rotary_emb.forward(&q_pe, &k_pe, seqlen_offset)?;

let q = Tensor::cat(&[q_nope, q_pe], D::Minus1)?;
let mut k = Tensor::cat(&[k_nope, k_pe], D::Minus1)?;
```
Hi @EricLBuehler, I got the following error when running the DeepSeek-V2-Lite-Chat model:

```
called `Result::unwrap()` on an `Err` value: APIError { data: "shape mismatch in cat for dim 1, shape for arg 1: [1, 16, 5, 128] shape for arg 2: [1, 1, 5, 64]" }
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
```

I found that you repeat the second dim of Tensor `k_pe` to match Tensor `k_nope` and pad Tensor `v` with zeros in mistral.rs, but not here. Is there something special about this case? I also tested the mistral.rs implementation, but the lite model gives random outputs.
Hi @guoqingbao!

> I found that you repeat the second dim of Tensor k_pe to match Tensor k_nope and pad Tensor v with zeros in mistral.rs, but not here. Is there something special about this case?

That is only so the v head dim matches q/k, as PagedAttention requires it (be sure to unpad, too). We don't have PagedAttention in Candle (yet?), so it isn't included here.
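For anyone hitting the same mismatch, here is a minimal sketch of the two pieces discussed above, under assumed shapes and with illustrative function names (not the PR's exact code): repeating the single shared rope head of `k_pe` across all attention heads before the cat, and the zero-pad/unpad pair that a PagedAttention path needs:

```rust
use candle_core::{Result, Tensor, D};

/// `k_pe` carries one shared rope head, (b, 1, seq, rope_dim); repeat it
/// across all heads so it can be concatenated with `k_nope`,
/// (b, n_heads, seq, nope_dim), along the last dim.
fn assemble_k(k_nope: &Tensor, k_pe: &Tensor, n_heads: usize) -> Result<Tensor> {
    let k_pe = k_pe.repeat((1, n_heads, 1, 1))?; // (b, n_heads, seq, rope_dim)
    Tensor::cat(&[k_nope, &k_pe], D::Minus1) // (b, n_heads, seq, nope+rope)
}

/// PagedAttention needs q, k, and v to share one head dim: zero-pad the last
/// dim of v from v_dim up to qk_dim before attention...
fn pad_v(v: &Tensor, v_dim: usize, qk_dim: usize) -> Result<Tensor> {
    v.pad_with_zeros(D::Minus1, 0, qk_dim - v_dim)
}

/// ...and narrow the attention output back afterwards so downstream
/// projections see the true v head dim.
fn unpad(attn_out: &Tensor, v_dim: usize) -> Result<Tensor> {
    attn_out.narrow(D::Minus1, 0, v_dim)
}
```

This matches the error above: `k_nope` is (1, 16, 5, 128) while `k_pe` is (1, 1, 5, 64), so the repeat over the head dim has to happen before the cat.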
> I also tested the mistral.rs implementation, but the lite model gives random outputs.

Just checked DS V2 Lite on Metal in mistral.rs, and it works there. Was it failing for you?
@LaurentMazare I updated the model with some fixes and removed all the `mut`s. I tested the model, and it is working.
Thank you! I'll update the DeepSeek V3 PR #2745. I also have some MoE-specific optimizations that I'll be opening a PR for shortly!
This PR implements the DeepSeek V2 architecture.