EXL2 low bpw draft model #77
SinanAkkoyun
started this conversation in General · 0 replies
Hey! I was wondering whether one could skip training a draft model for speculative sampling altogether by quantizing the target model itself to an aggressively low bpw and using that quant as the drafter? (Rough sketch of what I mean below.)
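For clarity, here is roughly the loop I have in mind: the standard speculative-sampling accept/reject scheme (Leviathan et al., 2023), where the drafter is just the same model requantized at a very low bpw. This is a minimal sketch, not the exllamav2 API; the `logits()` interface and model handles are hypothetical.

```python
# Minimal sketch of speculative sampling with a low-bpw quant of the
# same model acting as the draft model. `target` and `draft` are assumed
# to expose a hypothetical logits(tokens) -> [seq, vocab] call.
import torch

def speculative_step(target, draft, tokens, k=4):
    """Draft k tokens cheaply, then verify them with one target pass."""
    drafted, q_probs = [], []
    ctx = tokens
    for _ in range(k):
        q = torch.softmax(draft.logits(ctx)[-1], dim=-1)  # draft distribution
        t = torch.multinomial(q, 1)
        drafted.append(t.item())
        q_probs.append(q)
        ctx = torch.cat([ctx, t])

    # One target forward pass verifies all k drafted tokens at once.
    p_all = torch.softmax(target.logits(ctx), dim=-1)
    accepted = []
    for i, t in enumerate(drafted):
        p = p_all[len(tokens) - 1 + i]  # target dist. before drafted token i
        q = q_probs[i]
        if torch.rand(()) < (p[t] / q[t]).clamp(max=1.0):
            accepted.append(t)          # accept the draft token
        else:
            # Rejected: resample from the residual distribution max(p - q, 0).
            resid = torch.clamp(p - q, min=0)
            accepted.append(torch.multinomial(resid / resid.sum(), 1).item())
            break
    # (A production loop would also sample one bonus token from the final
    # target distribution when all k drafts are accepted.)
    return accepted
```

The appeal is that the drafter needs no training at all, only a second quantization pass of weights you already have; the acceptance rate then depends on how much the low-bpw quant's output distribution drifts from the full-precision one.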
I was also wondering (though that might be difficult to do) whether one could, in theory, look at the forward-pass "through-network" activations over a given dataset and disable the inactive paths by setting them to zero, skipping those multiplications entirely, somewhat like having a lower parameter count (see the pruning sketch below). I don't fully understand your quantization method, so by "akin to a sparse network" you probably already mean what I am asking, but I still want to know whether it would be possible to "quantize a 34B model so hard that it has the latency of TinyLlama".
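Something like this is what I imagine by "disabling paths": a minimal, hypothetical PyTorch sketch of activation-based structured pruning. To be clear, this is not EXL2's actual quantization method, just an illustration of the idea.

```python
# Sketch: score each output channel of a linear layer by its mean
# activation magnitude over a calibration set, then zero the weights
# feeding the quietest channels. Structured (whole-row) zeros can
# actually be skipped at inference time, unlike scattered per-weight zeros.
import torch
import torch.nn as nn

@torch.no_grad()
def prune_linear_by_activation(layer: nn.Linear, calib_inputs, keep_frac=0.5):
    score = torch.zeros(layer.out_features)
    for x in calib_inputs:                       # x: [batch, in_features]
        score += layer(x).abs().mean(dim=0)      # per-channel activity
    k = int(layer.out_features * (1 - keep_frac))
    dead = score.topk(k, largest=False).indices  # least-active channels
    layer.weight[dead] = 0                       # zero whole output rows
    if layer.bias is not None:
        layer.bias[dead] = 0
    return dead
```

Note that the zeros only translate into latency if the rows are then physically removed (and the next layer's input dimension shrunk accordingly); low-bpw quantization by itself keeps every parameter and only shrinks its storage, which is why I'm unsure the two approaches compose into "34B at TinyLlama latency".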