Hi,
I'm trying to quantize a float32 model to float16 at inference time. It looks like PyTorch doesn't support this, or am I missing some environment variable that I need to set to enable it? I'm using sockeye==3.1.27.
I also tried quantizing a model to float16 (i.e. sockeye-quantize --model model/params.best --config model/args.yaml --dtype float16) and then running sockeye-translate ... --use-cpu --dtype int8, and got the same error message.
If I translate using a float32 model and --dtype int8, I do get translations.
My goal here is to save the model to a smaller file and use it to translate on CPUs.
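In shell form, the sequence was (paths as in the report; the elided translate arguments are left as-is):

```bash
# Cast the trained float32 parameters to float16 on disk
sockeye-quantize --model model/params.best --config model/args.yaml --dtype float16

# Then attempt CPU inference with the quantized model (the failing step)
sockeye-translate ... --use-cpu --dtype int8
```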
Sockeye doesn't currently support FP16 inference on CPUs since PyTorch doesn't have CPU FP16 implementations of all the operators we use. For 16-bit CPU inference, you could try BF16: sockeye-translate --dtype bfloat16 ...
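For illustration, here is a minimal PyTorch-only sketch of that operator-coverage gap on CPU (not Sockeye code; exact behavior depends on the PyTorch version):

```python
import torch

x = torch.randn(4, 4)

# bfloat16: CPU kernels exist for common ops such as matmul, so this succeeds
y = x.to(torch.bfloat16)
print((y @ y).dtype)  # torch.bfloat16

# float16: CPU kernel coverage is incomplete in older releases, so a full
# model can hit an unimplemented operator and raise at inference time
try:
    z = x.to(torch.float16)
    print((z @ z).dtype)
except RuntimeError as err:
    # e.g. on older PyTorch: "addmm_impl_cpu_" not implemented for 'Half'
    print(err)
```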
Thanks for the pointer. It turns out I was using sockeye-3.1.27, which didn't have that option. It initially failed with pytorch-1.11.0, but sockeye-translate --use-cpu --dtype bfloat16 worked once I upgraded to pytorch-1.13.1.
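As a follow-up check on the file-size goal, one could inspect the dtypes stored in the quantized parameter file. A sketch, assuming Sockeye 3.x saves params.best as a plain PyTorch state dict readable with torch.load:

```python
import torch

# Load the checkpoint on CPU and list parameter dtypes; float16/bfloat16
# tensors take half the bytes of float32 on disk
params = torch.load("model/params.best", map_location="cpu")
for name, tensor in params.items():
    print(name, tuple(tensor.shape), tensor.dtype)
```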