
DDP has slower training speed #124

Open
jyuntins opened this issue May 28, 2024 · 3 comments
Hi, I am trying to fine-tune HMR2.0 on three RTX 3090 GPUs. Training works fine on a single GPU, but when I set trainer.devices=3 I get this error:
ValueError: ctypes objects containing pointers cannot be pickled

A workaround is to use the ddp strategy, as in your ddp trainer config file. However, with the trainer set to ddp, the speed drops from 2.5 it/s to 7 s/it.

Which config file did you use when training HMR2.0? Is there any way to speed up training in this configuration?
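For anyone hitting the same pickling error: in PyTorch Lightning, the spawn-based launcher pickles the model to send it to worker processes, which fails for objects holding ctypes pointers, whereas the subprocess-based ddp launcher re-launches the training script per GPU instead. A minimal sketch of the kind of trainer config being discussed (the keys mirror Lightning's Trainer arguments; the layout here is an assumption, not the repo's actual ddp config file):

```yaml
# Hypothetical trainer config sketch (assumed layout, not the repo's file).
# Keys mirror pytorch_lightning.Trainer arguments.
accelerator: gpu
devices: 3
# "ddp" launches one subprocess per GPU and does not pickle the model,
# avoiding "ValueError: ctypes objects containing pointers cannot be pickled",
# which the spawn-based "ddp_spawn" launcher can trigger.
strategy: ddp
```

Note that Lightning's progress bar reports per-process iteration speed, so single-GPU and multi-GPU it/s numbers are not directly comparable on their own.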

wtx9527 commented Aug 2, 2024

Hi, I have the same problem. Have you solved it?

jyuntins (Author) commented Aug 2, 2024

Hi @wtx9527, no, I haven't solved it.

@wangsen1312

I ran into the same problem when using multi-GPU training.
