
[BUG?] Multi-card training is slower than single-card training #6

Open
Pevernow opened this issue Jan 11, 2025 · 2 comments
Comments

@Pevernow

Keeping all other parameters unchanged, I tested with 1 card and with 4 cards; the progress bar showed 1.32 s/it and 2.41 s/it respectively.
In other words, using multiple cards roughly halved the training speed?

bs 1, gradient_accumulation_steps 1

@frutiemax92
Owner

frutiemax92 commented Jan 11, 2025

There is a cost to using multiple GPU cards, since the batches need to be synced together to form a bigger batch. So there is a speed-up from using multiple cards, but using twice the cards doesn't mean twice the speed. However, using multiple cards increases the effective batch size, which in theory results in better generalization and more stable training.
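
For concreteness, here is a rough back-of-the-envelope check of what the reported numbers imply, under the assumption that each "it" on the progress bar is one synchronized optimizer step and that every GPU contributes its own local batch of 1 (bs 1, gradient_accumulation_steps 1 as above):

```python
# Assumption: each "it" is one synchronized optimizer step across all ranks,
# and every GPU contributes a local batch of 1 sample to that step.
single_card_s_per_it = 1.32   # reported with 1 card
four_card_s_per_it = 2.41     # reported with 4 cards
num_gpus = 4
per_gpu_batch = 1
grad_accum = 1

# Effective batch size grows with the number of cards.
effective_batch = num_gpus * per_gpu_batch * grad_accum        # 4 vs 1

# Samples processed per second (throughput), not seconds per step.
single_card_throughput = per_gpu_batch / single_card_s_per_it  # ~0.76 samples/s
four_card_throughput = effective_batch / four_card_s_per_it    # ~1.66 samples/s
```

So even though each step takes longer, each step also covers four samples instead of one, which is where the speed-up and the larger effective batch come from.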

There is also the possibility of splitting the dataset equally across GPUs and not doing a "batch sync" between them, which could speed up training at the cost of reducing the effective batch size. I could make that option available later on...
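
As an illustration only (this is not this repository's trainer), a minimal PyTorch DistributedDataParallel loop shows where the per-step "batch sync" happens; the model, data, and launch assumptions below are all placeholders for a single-node torchrun setup:

```python
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, DistributedSampler, TensorDataset

# Minimal sketch, assuming a single-node launch via torchrun so the global
# rank can double as the CUDA device index.
dist.init_process_group("nccl")
rank = dist.get_rank()
torch.cuda.set_device(rank)

model = DDP(torch.nn.Linear(16, 1).cuda(rank), device_ids=[rank])
dataset = TensorDataset(torch.randn(64, 16), torch.randn(64, 1))
sampler = DistributedSampler(dataset)          # each rank reads its own shard
loader = DataLoader(dataset, batch_size=1, sampler=sampler)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

for x, y in loader:
    x, y = x.cuda(rank), y.cuda(rank)
    loss = torch.nn.functional.mse_loss(model(x), y)
    # backward() all-reduces gradients across GPUs; this is the per-step
    # "batch sync". With batch_size=1 the communication can outweigh the
    # small amount of compute done per step.
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

dist.destroy_process_group()
```

DDP also exposes a no_sync() context manager that skips the gradient all-reduce for a given backward pass (normally used during gradient accumulation). Removing the sync altogether, as the option above suggests, would effectively let each GPU train on its own shard with its own local batch, which is where the smaller effective batch size comes from.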

@Pevernow
Author

@frutiemax92 I had considered that four cards might only achieve twice the speed, but I never expected that four cards would reach only half the speed of a single card. This is slower than single-card training.
