Multi-GPU performance is slower than single GPU #1726

Open
HexToString opened this issue Dec 9, 2021 · 5 comments

HexToString commented Dec 9, 2021

Welcome to report PaddleHub usage issues, and thank you for your contribution to PaddleHub!
When leaving your question, please also provide the following information:

  • Version and environment information
    1) PaddleHub 2.1, PaddlePaddle-GPU 2.2
    2) PaddleHub Docker environment

After enabling the GPU and configuring multiple cards, prediction on, say, 100 images takes longer in total than with a single card. But the documentation advertises a large performance improvement. Did I configure something incorrectly?
The parameters I set are --use_gpu --gpu 0,1,2,3

@KPatr1ck self-assigned this Dec 9, 2021
@HexToString (Author) commented:

hub serving start -c deploy/hubserving/ocr_system/config.json

{
    "modules_info": {
        "ocr_system": {
            "init_args": {
                "version": "1.0.0",
                "use_gpu": true
            },
            "predict_args": {
            }
        }
    },
    "port": 8868,
    "use_multiprocess": false,
    "workers": 10,
    "gpu": "0,1,2,3"
}

@HexToString (Author) commented:

Actually there is another question:
Why can't the GPU be enabled when multi_process is turned on?
Since enabling the GPU rules out concurrency, does specifying multiple cards mean the requests are not actually handled concurrently, but rather serially on different cards?
If so, that would explain why multi-GPU is even slower than single-GPU.

@HexToString (Author) commented:

Single GPU, 30 images, total time: 72.8076696395874
4 GPUs, 30 images, total time: 93.16342949867249

@KPatr1ck (Contributor) commented:

Currently the hub serving command supports three modes under the hood: Flask, Gunicorn+Flask, and ZMQ+Flask.
At startup, hub serving picks the mode from the combination of the use_multiprocess and use_gpu parameters (see the sketch after this list):

  1. use_gpu is True: the use_multiprocess parameter is ignored and ZMQ+Flask is used; the gpu setting determines single-card or multi-card prediction, and if gpu is not set, GPU 0 is used by default.
  2. use_gpu is False and use_multiprocess is True: Gunicorn+Flask is used, with multi-process CPU prediction.
  3. use_gpu is False and use_multiprocess is False: plain Flask is used, with single-GPU or single-process CPU prediction.
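
For clarity, here is a minimal Python sketch of the dispatch described above. It is an assumed illustration, not PaddleHub's actual implementation; the function name and return values are made up for this example.

# Minimal sketch of the mode selection described above (assumed, not real source).
def select_serving_mode(use_gpu: bool, use_multiprocess: bool, gpu: str = "0"):
    if use_gpu:
        # Mode 1: use_multiprocess is ignored; ZMQ+Flask with one worker
        # per GPU card listed in `gpu` (defaults to card 0).
        return "zmq+flask", gpu.split(",")
    if use_multiprocess:
        # Mode 2: Gunicorn+Flask, multi-process CPU prediction.
        return "gunicorn+flask", None
    # Mode 3: plain Flask, a single process; it may still use one GPU if the
    # module's init_args set use_gpu, but never more than one card.
    return "flask", None

# The config from the reply (top-level use_gpu missing, i.e. False) lands in mode 3:
print(select_serving_mode(use_gpu=False, use_multiprocess=False, gpu="0,1,2,3"))
# -> ('flask', None): single GPU / single CPU process, regardless of the gpu list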


The configuration given in your reply is shown below. Because use_gpu is not set at the top level (it defaults to False) and use_multiprocess is False, mode 3 is used: no matter how many cards are listed in gpu, prediction runs on a single GPU / a single CPU process.

{
    "modules_info": {
        "ocr_system": {
            "init_args": {
                "version": "1.0.0",
                "use_gpu": true
            },
            "predict_args": {
            }
        }
    },
    "port": 8868,
    "use_multiprocess": false,
    "workers": 10,
    "gpu": "0,1,2,3"
}

For multi-GPU prediction, it needs to be changed to:

{
    "modules_info": {
        "ocr_system": {
            "init_args": {
                "version": "1.0.0",
                "use_gpu": true
            },
            "predict_args": {
            }
        }
    },
    "port": 8868,
    "use_multiprocess": false,
    "use_gpu": true,
    "gpu": "0,1,2,3"
}

Using the prediction script from https://github.com/PaddlePaddle/PaddleOCR/tree/release/2.3/deploy/hubserving#%E5%8F%91%E9%80%81%E9%A2%84%E6%B5%8B%E8%AF%B7%E6%B1%82, testing multi-GPU requires sending multiple prediction requests concurrently from multiple processes. After changing the test to 10 concurrent requests and running it, the first configuration took 7.04 s and the second configuration took 1.329 s, so multi-GPU serving works as expected.
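
For reference, below is a hedged sketch of such a concurrent test client. It follows the request format from the PaddleOCR hubserving README linked above (a base64-encoded image POSTed to /predict/ocr_system on the port from config.json); the image path is a placeholder, and threads are used here instead of the README's multiprocess script since sending HTTP requests is I/O-bound.

# Sketch of a concurrent test client: send 10 prediction requests in parallel
# so that multiple GPU workers of the serving process are exercised.
import base64
import json
from concurrent.futures import ThreadPoolExecutor

import requests

URL = "http://127.0.0.1:8868/predict/ocr_system"  # port 8868 as in config.json

def send_request(image_path: str) -> dict:
    # Encode the image as base64 and POST it in the format expected by the module.
    with open(image_path, "rb") as f:
        img_b64 = base64.b64encode(f.read()).decode("utf8")
    payload = json.dumps({"images": [img_b64]})
    resp = requests.post(URL, headers={"Content-type": "application/json"}, data=payload)
    return resp.json()

if __name__ == "__main__":
    image_paths = ["./test.jpg"] * 10  # placeholder image, 10 concurrent requests
    with ThreadPoolExecutor(max_workers=10) as pool:
        results = list(pool.map(send_request, image_paths))
    print(f"received {len(results)} responses")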

@JeremyGe07 commented:

(quoted @KPatr1ck's explanation and configurations above)

Please take a look at the situation I ran into after following your suggested setup: #2293
