Does multi-GPU training with the weighted sampler work with VITS in this repo? #103

Marioando · 2024-10-12T10:16:26Z

It was broken for years in the original Coqui repo. Thanks.

eginhard · 2024-10-13T20:28:49Z

eginhard
Oct 13, 2024
Maintainer

Can you test it and let me know? I haven't made any specific changes in that area, but happy to merge any fixes.

3 replies

Marioando Oct 14, 2024
Author

OK, I'll test it later. Thank you

Marioando Oct 17, 2024
Author

Hi, I have tested multi-GPU training and it does work if you don't use the batch weighted sampler and run with accelerate set to true. I did not encounter any issues with LJSpeech. But when I tried to train on a larger dataset (LibriTTS), I got NCCL watchdog timeout issues. I tried setting os.environ["NCCL_BLOCKING_WAIT"] = "1", but without success. How can I disable the timeout? Precomputing the phonemes takes almost 45 minutes.
P.S. The formatter for LibriTTS doesn't allow training to continue if any audio is missing from LibriTTS. I have made some modifications to it; if you want, I can open a PR.
Thank you!

Answer selected by Marioando

eginhard Oct 18, 2024
Maintainer

Thanks for testing! You can try this to increase the timeout. Alternatively, you can run the phoneme precomputation step separately on a CPU. It only needs to be done once; the outputs are then saved and reused.
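As a rough illustration of raising the timeout: in plain PyTorch, the process-group timeout is set through the `timeout` argument of `torch.distributed.init_process_group`. The sketch below assumes a `torchrun`-style launch (rank and world size come from environment variables) and a two-hour limit. The helper names `collective_timeout` and `init_distributed` are hypothetical, and how this repo's trainer exposes the setting may differ.

```python
from datetime import timedelta


def collective_timeout(minutes: int = 120) -> timedelta:
    """Return the timeout to allow for distributed collectives.

    The 120-minute default is an assumption, chosen to comfortably cover
    a phoneme precomputation step of roughly 45 minutes.
    """
    return timedelta(minutes=minutes)


def init_distributed(minutes: int = 120) -> None:
    """Initialise the NCCL process group with a longer timeout.

    Assumes the script was launched with torchrun, so RANK, WORLD_SIZE,
    MASTER_ADDR and MASTER_PORT are already set in the environment.
    """
    # Imported lazily so the helper above can be used without PyTorch.
    import torch.distributed as dist

    dist.init_process_group(backend="nccl", timeout=collective_timeout(minutes))
```

If the process group is created by a wrapper library rather than by your own script, the timeout has to be passed through that library's configuration instead.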

Sure, PRs are always welcome!
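For anyone hitting the missing-audio problem before such a PR lands, one possible workaround is to filter the formatter's output before training. This is only a sketch, not the modification mentioned above: it assumes each sample is a dict whose `audio_file` key holds the path to the audio, and `drop_missing_audio` is a hypothetical helper name.

```python
from pathlib import Path


def drop_missing_audio(items, audio_key="audio_file"):
    """Split samples into those whose audio exists and the missing paths.

    `items` is assumed to be a list of dicts, each with the audio path
    stored under `audio_key` (an assumption about the formatter output).
    """
    kept, missing = [], []
    for item in items:
        if Path(item[audio_key]).is_file():
            kept.append(item)
        else:
            missing.append(item[audio_key])
    return kept, missing
```

Logging the returned `missing` list makes it easy to see how much of the dataset was skipped.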
