
Draft & Verify #2

Closed
Ryu1845 opened this issue Oct 8, 2023 · 13 comments

Comments

@Ryu1845

Ryu1845 commented Oct 8, 2023

Does this repository implement Draft & Verify?

@lucidrains
Owner

lucidrains commented Oct 8, 2023

@Ryu1845 hey! thanks for sharing that paper!

that looks quite close to, if not better than, the naive early exit strategy (they predict which layers to skip through some heuristic) - but using the same model for speculating / drafting is definitely what i was going for.

i think my prophet transformer idea should be the best though (although i'm biased and still haven't run any head to head 😆)
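For readers landing here: the self-drafting scheme being discussed can be sketched as a toy loop, assuming a cheap approximate draft (e.g. the same model with layers skipped) and the full model as verifier. `draft_next` / `verify_next` are stand-in functions, not the real networks, and a real verify step scores all drafted tokens in one batched forward pass rather than sequentially.

```python
# Toy sketch of a draft-and-verify loop (greedy acceptance).
# draft_next: cheap approximate model (stands in for an early-exit pass)
# verify_next: full model (stands in for the complete forward pass)

def draft_next(ctx):
    # cheap approximate model: next token is last token + 1, mod 10
    return (ctx[-1] + 1) % 10

def verify_next(ctx):
    # full model: same rule, except it "disagrees" after token 7
    last = ctx[-1]
    return 0 if last == 7 else (last + 1) % 10

def draft_and_verify(ctx, n_draft=4):
    """Draft n_draft tokens cheaply, then accept the longest prefix the
    full model agrees with, plus one corrected token at the mismatch."""
    drafted = []
    for _ in range(n_draft):
        drafted.append(draft_next(ctx + drafted))
    accepted = []
    for tok in drafted:
        target = verify_next(ctx + accepted)  # full-model prediction
        if tok == target:
            accepted.append(tok)
        else:
            accepted.append(target)  # replace first mismatch, then stop
            break
    return accepted

print(draft_and_verify([5]))  # drafts [6, 7, 8, 9], accepts [6, 7, 0]
```

The payoff is that every accepted token cost only a cheap draft pass plus a share of one full verification pass, instead of a full pass each.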

@lucidrains
Owner

@Ryu1845 really think we are going to see a resurgence in adaptive computation research over the next year, like actually made practical

@Ryu1845
Author

Ryu1845 commented Oct 8, 2023

I think so too, thanks again for your work.
it looks like the official code for the paper will be uploaded here, but I'll keep an eye on this repo too 😉

@Ryu1845 Ryu1845 closed this as completed Oct 8, 2023
@lucidrains
Owner

lucidrains commented Oct 8, 2023

@Ryu1845 sounds good!

yea i think the main point of the prophet idea is to take advantage of the cached last layer embedding from the large model, which should be superior to any early exit stuff. if you find me another paper that did that, i would definitely read and implement it

i'm also using a transformer on top, borrowing working ideas from hierarchical transformer line of research
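A minimal sketch of that setup, assuming the small draft network conditions on the large model's cached final-layer embedding rather than early-exiting. Plain integers stand in for hidden states, and `ProphetHead` is a toy, not the repo's actual module.

```python
# Stand-in for the "prophet" arrangement: the large model exposes its
# cached final-layer embedding, and a small network drafts from it.

class LargeModel:
    def forward(self, tokens):
        # returns (next_token, final_layer_embedding); the embedding is
        # cached so the prophet head can reuse it for free
        emb = sum(tokens) % 97          # stand-in for the hidden state
        return (emb % 10, emb)

class ProphetHead:
    """Small-transformer stand-in: drafts future tokens from the large
    model's cached embedding instead of rerunning the full stack."""
    def draft(self, cached_emb, n):
        toks = []
        e = cached_emb
        for _ in range(n):
            e = (e * 3 + 1) % 97        # cheap recurrent update
            toks.append(e % 10)
        return toks

large = LargeModel()
prophet = ProphetHead()
next_tok, emb = large.forward([1, 2, 3])
drafted = prophet.draft(emb, 2)        # cheap drafts from the cache
```

The design point is that the draft sees the large model's full-depth representation of the context, which an early-exit draft (truncated at layer k) never does.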

@Ryu1845
Author

Ryu1845 commented Oct 8, 2023

> yea i think the main idea from the prophet idea is to take advantage of the cached last layer embedding from the large model, which should be superior to any early exit stuff.

I don't know of any paper that does this, but the Medusa project aims to do just that, I think.
https://together.ai/blog/medusa
https://github.com/FasterDecoding/Medusa

@lucidrains
Owner

@Ryu1845 ohh yes, they totally did. so the only difference is i use a small transformer as the medusa / prophet heads

ok let me cite them as well
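The difference being drawn here can be made concrete with a toy (my reading of the Medusa blog post, not their actual code): Medusa attaches k independent heads to the last hidden state, with head i predicting the token i+1 steps ahead, whereas the prophet variant above runs a small transformer in their place.

```python
# Toy Medusa-style drafting: each head maps the same cached hidden
# state straight to a token, with no sequential dependency between the
# drafted positions (unlike the recurrent prophet sketch).

def medusa_draft(hidden, heads):
    """head i drafts the token at offset i+1 from the cached state."""
    return [head(hidden) for head in heads]

# stand-in heads: tiny affine maps instead of learned linear layers
heads = [lambda h, i=i: (h * (i + 2)) % 10 for i in range(3)]

drafted = medusa_draft(7, heads)   # three tokens from one hidden state
```

Independent heads are cheaper per drafted token; a small transformer on top can model dependencies between the drafted positions, at slightly higher cost.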

@lucidrains
Owner

lucidrains commented Oct 8, 2023

@Ryu1845 oh haha, they don't have a paper, just a github repo. may be the new trend

@Ryu1845
Author

Ryu1845 commented Oct 8, 2023

I'm guessing they'll release a paper once they've got a working prototype 😄
It looks like it's still a WIP FasterDecoding/Medusa#3
I actually don't know if it's running yet :/

@lucidrains
Owner

lucidrains commented Oct 8, 2023

@Ryu1845 ohh, so it isn't functional yet? maybe i'll send their group a message. solving batched spec decoding is a bit tricky with kv cache, but i found a solution (not sure if optimal)
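For context on why the batched kv-cache case is tricky: after verification, each batch row accepts a different number of drafted tokens, so each row's cache must be truncated to a different length while the batch stays rectangular. One possible approach (a sketch under that assumption, not necessarily the repo's solution) is to drop each row's rejected tail and left-pad so valid entries are right-aligned:

```python
# Lists of ints stand in for per-position key/value entries.

PAD = None

def realign_kv_cache(cache, accepted_lens, drafted):
    """cache: one row per batch element, covering the context plus
    `drafted` speculated positions. accepted_lens: how many of the
    drafted entries each row keeps after verification."""
    new_rows = []
    for row, n_ok in zip(cache, accepted_lens):
        keep = row[: len(row) - (drafted - n_ok)]  # drop rejected tail
        pad = [PAD] * (drafted - n_ok)
        new_rows.append(pad + keep)                # right-align the row
    return new_rows

# row 0 accepts all 3 drafted tokens, row 1 accepts only 1
rows = realign_kv_cache([[1, 2, 3, 4, 5], [1, 2, 3, 4, 5]], [3, 1], 3)
```

With real tensors this becomes a per-row roll/gather plus an attention mask over the padding, so decoding can continue without re-prefilling the shorter rows.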

@lucidrains
Owner

> ~~I'm guessing they'll release a paper once they've got a working prototype 😄~~ It looks like it's still a WIP FasterDecoding/Medusa#3 I actually don't know if it's running yet :/

so does it work or not?

@Ryu1845
Author

Ryu1845 commented Oct 8, 2023

it looks like it works. sorry for the misunderstanding on my side

@lucidrains
Owner

nice! that's amazing, i believe in that approach

@jmamou

jmamou commented Nov 8, 2023

@lucidrains
Amazing work!
Do you plan to release your results with early exit?
Thanks
