
Why is startup so slow? #3

Open
dizhenx opened this issue May 9, 2023 · 1 comment


dizhenx commented May 9, 2023

I ran python3 -u chatglm_service_fastapi.py --host 127.0.0.1 --port 7861 --quantize 4 --device 1,2,3
Several times it got stuck at the "starting server..." step. Even after 10 minutes there was still no result.


TylunasLi (Owner) commented May 12, 2023

I looked into the problem you reported:

  1. The current code does not appear to load THUDM/chatglm-6b-int4 directly; instead it loads the original chatglm-6b and then quantizes it (see the sketch after this list). I will commit a fix for this as soon as possible.
  2. Because the original half-precision model is quantized directly, the quantization step is indeed slow; starting up on an inference server with a Xeon Gold 6133 + Nvidia A40 also takes about 10 minutes.
  3. You specified 3 devices at once; deploying the model to all of them at the same time slows startup even further.
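
For reference, the difference between the two loading paths looks roughly like the sketch below. This is a minimal illustration based on the standard ChatGLM-6B usage from its README, not the actual code in chatglm_service_fastapi.py:

    # Minimal sketch of the two loading paths (assumes the standard
    # transformers + ChatGLM-6B custom modeling code).
    from transformers import AutoModel, AutoTokenizer

    # Current behaviour: load the full half-precision checkpoint, then
    # quantize it to 4 bits in memory. The quantize(4) call is the slow step.
    tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
    model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
    model = model.half().quantize(4).cuda().eval()

    # Planned fix: load the already-quantized INT4 checkpoint directly,
    # skipping the in-memory quantization step entirely.
    tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b-int4", trust_remote_code=True)
    model = AutoModel.from_pretrained("THUDM/chatglm-6b-int4", trust_remote_code=True)
    model = model.half().cuda().eval()

With the second path, startup time is dominated by reading the smaller INT4 weights from disk rather than by re-quantizing the fp16 model on every launch.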
