
[BUG?] Multi-card training is slower than single-card training #6

Open
Pevernow opened this issue Jan 11, 2025 · 2 comments
Comments

@Pevernow

Keeping all other parameters unchanged, I tested with 1 card and with 4 cards; the progress bar showed 1.32 s/it and 2.41 s/it respectively.
In other words, using multiple cards roughly halved the training speed?

bs 1, gradient_accumulation_steps 1

@frutiemax92
Owner

frutiemax92 commented Jan 11, 2025

There is a cost to using multiple GPU cards, since the batches need to be synced together to form a bigger batch. So there is a speed-up from using multiple cards, but using twice the cards doesn't mean twice the speed. However, using multiple cards increases the effective batch size, which in theory results in better generalization and more stable training.
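
For concreteness, here is a rough back-of-the-envelope check of what the reported numbers imply, under the assumption that each "it" on the progress bar is one synchronized optimizer step and that every GPU contributes its own local batch of 1 (bs 1, gradient_accumulation_steps 1 as above):

```python
# Assumption: each "it" is one synchronized optimizer step across all ranks,
# and every GPU contributes a local batch of 1 sample to that step.
single_card_s_per_it = 1.32   # reported with 1 card
four_card_s_per_it = 2.41     # reported with 4 cards
num_gpus = 4
per_gpu_batch = 1
grad_accum = 1

# Effective batch size grows with the number of cards.
effective_batch = num_gpus * per_gpu_batch * grad_accum        # 4 vs 1

# Samples processed per second (throughput), not seconds per step.
single_card_throughput = per_gpu_batch / single_card_s_per_it  # ~0.76 samples/s
four_card_throughput = effective_batch / four_card_s_per_it    # ~1.66 samples/s
```

So even though each step takes longer, each step also covers four samples instead of one, which is where the speed-up and the larger effective batch come from.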

There is also the possibility of splitting the dataset equally across GPUs and not doing a "batch sync" between them, which could speed up training at the cost of reducing the effective batch size. I could make that option available later on...
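
As an illustration only (this is not this repository's trainer), a minimal PyTorch DistributedDataParallel loop shows where the per-step "batch sync" happens; the model, data, and launch assumptions below are all placeholders for a single-node torchrun setup:

```python
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, DistributedSampler, TensorDataset

# Minimal sketch, assuming a single-node launch via torchrun so the global
# rank can double as the CUDA device index.
dist.init_process_group("nccl")
rank = dist.get_rank()
torch.cuda.set_device(rank)

model = DDP(torch.nn.Linear(16, 1).cuda(rank), device_ids=[rank])
dataset = TensorDataset(torch.randn(64, 16), torch.randn(64, 1))
sampler = DistributedSampler(dataset)          # each rank reads its own shard
loader = DataLoader(dataset, batch_size=1, sampler=sampler)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

for x, y in loader:
    x, y = x.cuda(rank), y.cuda(rank)
    loss = torch.nn.functional.mse_loss(model(x), y)
    # backward() all-reduces gradients across GPUs; this is the per-step
    # "batch sync". With batch_size=1 the communication can outweigh the
    # small amount of compute done per step.
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

dist.destroy_process_group()
```

DDP also exposes a no_sync() context manager that skips the gradient all-reduce for a given backward pass (normally used during gradient accumulation). Removing the sync altogether, as the option above suggests, would effectively let each GPU train on its own shard with its own local batch, which is where the smaller effective batch size comes from.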

@Pevernow
Author

@frutiemax92 I had considered that four cards might only achieve twice the speed, but I never expected that four cards would reach only half the speed of a single card. This is slower than single-card training.
