speculative : refactor and add a simpler example #10362

Draft
wants to merge 2 commits into base: master

ggerganov
Owner

@ggerganov ggerganov commented Nov 17, 2024

Continuation of #10290.

  • Refactor the speculative decoding logic into common/speculative. For now, this covers only basic greedy speculation with a single sequence.
  • Add a simpler speculative-simple example that uses the new API.
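
For readers unfamiliar with the technique, here is a rough, self-contained sketch of what "basic greedy speculation with a single sequence" means. It is not the common/speculative API: `speculative_step`, `generate`, `toy_target`, and `toy_draft` are hypothetical names made up for illustration, and each "model" is just a function that returns the greedy next token for a prefix. A real implementation would verify all drafted tokens with a single batched evaluation of the target model; the sketch calls the target once per position to keep the logic visible.

```python
from typing import Callable, List

Token = int
# Hypothetical stand-in for a model: maps a token prefix to its greedy next token.
GreedyModel = Callable[[List[Token]], Token]


def speculative_step(target: GreedyModel, draft: GreedyModel,
                     prefix: List[Token], n_draft: int) -> List[Token]:
    """Run one draft-then-verify round and return the tokens to append."""
    # 1. The small draft model proposes n_draft tokens autoregressively.
    drafted: List[Token] = []
    ctx = list(prefix)
    for _ in range(n_draft):
        tok = draft(ctx)
        drafted.append(tok)
        ctx.append(tok)

    # 2. The target model checks the proposals in order. The first mismatch is
    #    replaced by the target's own token and the rest of the draft is dropped.
    accepted: List[Token] = []
    ctx = list(prefix)
    for tok in drafted:
        expected = target(ctx)
        if expected != tok:
            accepted.append(expected)
            return accepted
        accepted.append(tok)
        ctx.append(tok)

    # 3. Every drafted token matched, so the target contributes one extra token.
    accepted.append(target(ctx))
    return accepted


def generate(target: GreedyModel, draft: GreedyModel, prompt: List[Token],
             n_predict: int, n_draft: int) -> List[Token]:
    """Generate n_predict tokens after the prompt using greedy speculation."""
    out = list(prompt)
    while len(out) - len(prompt) < n_predict:
        out.extend(speculative_step(target, draft, out, n_draft))
    return out[:len(prompt) + n_predict]


# Toy models (assumptions for the demo): the target counts modulo 10,
# and the draft agrees with it except right after the token 4.
def toy_target(ctx: List[Token]) -> Token:
    return (ctx[-1] + 1) % 10


def toy_draft(ctx: List[Token]) -> Token:
    return 0 if ctx[-1] == 4 else (ctx[-1] + 1) % 10


if __name__ == "__main__":
    print(generate(toy_target, toy_draft, [0], n_predict=12, n_draft=5))
```

The point of the sketch is that the output is always identical to plain greedy decoding with the target model alone. The draft model only changes how many target evaluations can be grouped together per round.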

This could be used as a starting point for llama-server speculative decoding support. See the TODOs in the comments for the next steps.

Sample usage:

./bin/llama-speculative-simple \
    -m  ../models/qwen2.5-32b-coder-instruct/ggml-model-q8_0.gguf \
    -md ../models/qwen2.5-0.5b-coder-instruct/ggml-model-q8_0.gguf \
    -f ../../test.txt -c 0 -ngl 99 -ngld 99 --draft 16 --color \
    --sampling-seq k --top-k 1 -fa
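
A note on the last line of the command: as far as I understand the flags, `--sampling-seq k --top-k 1` reduces the sampler chain to top-k with k = 1, which makes sampling deterministic and equivalent to picking the highest-logit token. That matches the greedy-only scope of this refactor. The toy snippet below illustrates that equivalence. `sample_top_k` is a hypothetical helper written for this note, not llama.cpp code.

```python
import math
import random
from typing import List


def sample_top_k(logits: List[float], k: int, rng: random.Random) -> int:
    """Keep the k highest logits, softmax over them, and sample one index."""
    kept = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    peak = max(logits[i] for i in kept)
    # Subtract the peak before exponentiating for numerical stability.
    weights = [math.exp(logits[i] - peak) for i in kept]
    return rng.choices(kept, weights=weights, k=1)[0]


if __name__ == "__main__":
    logits = [0.1, 2.5, -1.0, 2.4]
    argmax = max(range(len(logits)), key=lambda i: logits[i])
    # With k = 1 only one candidate survives, so the seed cannot change the result.
    picks = {sample_top_k(logits, 1, random.Random(seed)) for seed in range(100)}
    print(picks == {argmax})
```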

@ggerganov ggerganov added the demo Demonstrate some concept or idea, not intended to be merged label Nov 17, 2024
@ggerganov ggerganov force-pushed the gg/speculative-refactor branch 3 times, most recently from 097dfd1 to 96b1c3b Compare November 17, 2024 17:19
Labels
demo Demonstrate some concept or idea, not intended to be merged examples