speculative : refactor and add a simpler example #10362

ggerganov · 2024-11-17T16:10:07Z

Refactor speculative decoding into common/speculative. For now, it supports only basic greedy speculation with a single sequence.
Add a simpler speculative-simple example that uses the new API.
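
To make "basic greedy speculation with a single sequence" concrete, here is a minimal, self-contained sketch of one speculation round. It does not use the common/speculative API, which is not shown in this description. The names `token`, `greedy_model`, and `speculate_step` are hypothetical, and each model is reduced to a function that returns its greedy next token for a given prefix. A real implementation would presumably verify all drafted tokens in a single target batch rather than one call per position, as this sketch does for clarity.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

using token = int;

// Hypothetical stand-in for a model: returns the greedy (argmax) next token
// for the given prefix. Not part of llama.cpp.
using greedy_model = std::function<token(const std::vector<token> &)>;

// One round of greedy speculation on a single sequence.
// Returns the tokens to append to `prefix` after verification.
std::vector<token> speculate_step(
        const greedy_model & target,
        const greedy_model & draft,
        const std::vector<token> & prefix,
        std::size_t n_draft) {
    assert(n_draft > 0);

    // 1. Let the small draft model propose n_draft tokens autoregressively.
    std::vector<token> ctx = prefix;
    std::vector<token> drafted;
    for (std::size_t i = 0; i < n_draft; ++i) {
        const token t = draft(ctx);
        drafted.push_back(t);
        ctx.push_back(t);
    }

    // 2. Verify the proposals against the target model's greedy choices.
    ctx = prefix;
    std::vector<token> accepted;
    for (const token t : drafted) {
        const token expected = target(ctx);
        accepted.push_back(expected);
        ctx.push_back(expected);
        if (expected != t) {
            // First mismatch: keep the target's token, discard the rest of the draft.
            return accepted;
        }
    }

    // 3. Every drafted token matched, so the target contributes one extra token.
    accepted.push_back(target(ctx));
    return accepted;
}
```

With greedy sampling on both sides, the result is identical to what the target model would produce on its own; the draft model only affects how many tokens are accepted per round.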

This could be used as a starting point for llama-server speculative decoding support. See the TODOs in the comments for the next steps.

Sample usage:

./bin/llama-speculative-simple \
    -m  ../models/qwen2.5-32b-coder-instruct/ggml-model-q8_0.gguf \
    -md ../models/qwen2.5-0.5b-coder-instruct/ggml-model-q8_0.gguf \
    -f ../../test.txt -c 0 -ngl 99 -ngld 99 --draft 16 --color \
    --sampling-seq k --top-k 1 -fa

ggml-ci

speculative : refactor and add a simpler example

69982ea

ggml-ci

ggerganov added the demo label (Demonstrate some concept or idea, not intended to be merged) Nov 17, 2024

github-actions bot added the examples label Nov 17, 2024

ggerganov force-pushed the gg/speculative-refactor branch 3 times, most recently from 097dfd1 to 96b1c3b November 17, 2024 17:19

speculative : clean-up and add comments and TODOs [no ci]

74221ef

ggerganov force-pushed the gg/speculative-refactor branch from 96b1c3b to 74221ef November 17, 2024 17:23
