Draft & Verify #2
Comments
@Ryu1845 hey! thanks for sharing that paper! that looks quite close to, if not better than, the naive early exit strategy (they predict which layers to skip through some heuristic) - but using the same model for speculating / drafting is definitely what i was going for. i think my prophet transformer idea should be the best though (although i'm biased and still haven't run any head-to-head comparisons 😆)
@Ryu1845 really think we are going to see a resurgence in adaptive computation research over the next year, like actually made practical
I think so too, thanks again for your work.
@Ryu1845 sounds good! yea i think the main point of the prophet idea is to take advantage of the cached last layer embedding from the large model, which should be superior to any early exit stuff. if you find me another paper that did that, would definitely read and implement. i'm also using a transformer on top, borrowing working ideas from the hierarchical transformer line of research
I don't know of any paper that does this, but I think the medusa project aims to do just that.
@Ryu1845 ohh yes, they totally did. so the only difference is i use a small transformer as the medusa / prophet heads. ok, let me cite them as well
@Ryu1845 oh haha, they don't have a paper, just a github repo. may be the new trend
@Ryu1845 ohh, so it isn't functional yet? maybe i'll send their group a message. solving batched spec decoding is a bit tricky with kv cache, but i found a solution (not sure if optimal) |
so it works or doesn't work? |
it looks like it works, I'm sorry for the misunderstanding on my side |
nice! that's amazing, i believe in that approach |
@lucidrains |
Does this repository implement Draft & Verify?
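For readers new to the thread, here is a minimal sketch of the greedy verify step that speculative decoding schemes share, and of why batching it is awkward with a kv cache. It is not the method of the paper discussed above and not this repository's code: the function names `accept_greedy` and `batched_verify`, and the convention that the verifier returns one more token than was drafted, are assumptions made for illustration.

```python
from typing import List, Tuple


def accept_greedy(draft: List[int], target: List[int]) -> Tuple[int, List[int]]:
    """Greedy verification for one sequence (hypothetical helper).

    draft:  k tokens proposed by the drafter.
    target: k + 1 greedy tokens from the large model at the same positions
            (the extra one is its prediction after the last draft token).
    Returns (number of accepted draft tokens, tokens to append).
    """
    assert len(target) == len(draft) + 1
    n = 0
    while n < len(draft) and draft[n] == target[n]:
        n += 1
    # Keep the matching prefix, then the large model's own token at the
    # first mismatch (or its bonus token if every draft token matched).
    return n, draft[:n] + [target[n]]


def batched_verify(
    drafts: List[List[int]], targets: List[List[int]], seq_lens: List[int]
) -> Tuple[List[List[int]], List[int]]:
    """Apply accept_greedy per row and track the new sequence lengths."""
    appended, new_lens = [], []
    for d, t, length in zip(drafts, targets, seq_lens):
        _, toks = accept_greedy(d, t)
        appended.append(toks)
        # Rows accept different numbers of tokens, so lengths diverge.
        # Cached keys / values for rejected draft positions would have to
        # be dropped or masked per row (assumption: one cache slot per token).
        new_lens.append(length + len(toks))
    return appended, new_lens


appended, new_lens = batched_verify(
    drafts=[[5, 6, 7], [5, 9, 7]],
    targets=[[5, 6, 7, 8], [5, 6, 1, 2]],
    seq_lens=[10, 10],
)
print(appended, new_lens)
```

In this toy batch the first row accepts all three draft tokens while the second accepts only one, so two rows that started at the same length end up ragged. That raggedness is the part that makes batched speculative decoding tricky with a kv cache.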