
Why is startup so slow? #3

Open
dizhenx opened this issue May 9, 2023 · 1 comment


dizhenx commented May 9, 2023

I ran python3 -u chatglm_service_fastapi.py --host 127.0.0.1 --port 7861 --quantize 4 --device 1,2,3
Several times it got stuck at the "starting server..." step. Even after 10 minutes there was still no result.


TylunasLi (Owner) commented May 12, 2023

I looked into the problem you reported:

  1. The current code does not appear to load THUDM/chatglm-6b-int4 directly; instead it loads the original chatglm-6b and then quantizes it (see the sketch after this list). I will commit a fix for this as soon as possible.
  2. Because the original half-precision model is quantized directly, the quantization step is indeed slow; starting up on an inference server with a Xeon Gold 6133 + Nvidia A40 also takes about 10 minutes.
  3. You specified 3 devices at once; deploying the model to all of them at the same time slows startup even further.
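
For reference, the difference between the two loading paths looks roughly like the sketch below. This is a minimal illustration based on the standard ChatGLM-6B usage from its README, not the actual code in chatglm_service_fastapi.py:

    # Minimal sketch of the two loading paths (assumes the standard
    # transformers + ChatGLM-6B custom modeling code).
    from transformers import AutoModel, AutoTokenizer

    # Current behaviour: load the full half-precision checkpoint, then
    # quantize it to 4 bits in memory. The quantize(4) call is the slow step.
    tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
    model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
    model = model.half().quantize(4).cuda().eval()

    # Planned fix: load the already-quantized INT4 checkpoint directly,
    # skipping the in-memory quantization step entirely.
    tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b-int4", trust_remote_code=True)
    model = AutoModel.from_pretrained("THUDM/chatglm-6b-int4", trust_remote_code=True)
    model = model.half().cuda().eval()

With the second path, startup time is dominated by reading the smaller INT4 weights from disk rather than by re-quantizing the fp16 model on every launch.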
