-
I was able to answer some of the questions myself: on my machine, any Max Samples value higher than 8 leads to horribly slow training, so I guess going higher than that is not advisable. I trained for 320 epochs, but unfortunately the resulting model doesn't perform any better than the base model in my opinion: no discernible difference in the generated audio files.
-
I've been getting great results with F5-TTS, but my first attempt at fine-tuning trained from scratch instead of using the pretrained model—the output started as noise and is only slowly becoming speech.
How do I correctly fine-tune instead of starting from scratch?
Do I need to set "Tokenizer File" and "Path to Pretrained Checkpoint" manually? If so, what should I put?
Does "Download corresponding dataset first, and fill in the path in scripts" (repo link) refer to this?
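To make the distinction concrete, here is a generic PyTorch sketch (not F5-TTS's actual API; the model and file names are placeholders): fine-tuning simply means the pretrained weights are loaded into the model before the training loop starts, whereas from scratch the weights keep their random initialization.

```python
# Generic PyTorch sketch -- NOT F5-TTS's actual API; the tiny model and
# the checkpoint layout are placeholders for illustration only.
import torch
import torch.nn as nn

model = nn.Linear(4, 4)            # stand-in for the TTS model

# From scratch: weights stay at their random initialization,
# which is why the output starts out as pure noise.
scratch_weights = model.weight.detach().clone()

# Fine-tuning: load the pretrained checkpoint into the model first.
ckpt = {"model_state_dict": {k: torch.ones_like(v)
                             for k, v in model.state_dict().items()}}
torch.save(ckpt, "pretrained.pt")  # stand-in for the downloaded checkpoint
state = torch.load("pretrained.pt")
model.load_state_dict(state["model_state_dict"])
```

If the checkpoint-loading step is skipped, the symptom is exactly what I'm seeing: output that starts as noise and only slowly becomes speech.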
Project Details:
I'm working on generating voices for characters from an old game. I have 10 to 60 minutes of clean audio samples per character. Language: English.
Hardware:
GPU: Nvidia 4080 Laptop (12GB VRAM)
I'm looking for advice on the best values to set for finetuning, given my hardware. Here’s what I’ve gathered so far, but I’d love some expert input:
Parameter Questions
Batch Size per GPU: I assume 6400 should work with 12GB VRAM, but would appreciate confirmation.
Max Samples: Not sure, but I read that 2 might be fine (reference).
Gradient Accumulation Steps & Max Gradient Norm: No idea—should I just leave them at 1?
Epochs: How many would be reasonable for my dataset size?
Warmup Updates: Not sure what value is appropriate.
Save per Updates: I assume a higher value is better, since frequent checkpoint saving would slow down training. Is that right?
Last per Updates: Not sure what value to use here either.
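To sanity-check how these numbers interact, here is some back-of-envelope arithmetic. It assumes the batch size is counted in mel-spectrogram frames, and that 24 kHz audio with hop length 256 gives about 93.75 frames per second (both are assumptions on my part, so check them against your config):

```python
# Back-of-envelope arithmetic, assuming Batch Size per GPU counts mel
# frames (~93.75 frames/sec at 24 kHz with hop 256 -- an assumption)
# and that one weight update happens every grad_accum batches.
def updates_per_epoch(dataset_seconds, frames_per_sec=93.75,
                      batch_frames=6400, grad_accum=1):
    total_frames = dataset_seconds * frames_per_sec
    batches = total_frames / batch_frames        # batches per epoch
    return batches / grad_accum                  # weight updates per epoch

# e.g. 30 minutes of audio per character:
print(round(updates_per_epoch(30 * 60)))         # ~26 updates per epoch
```

If that estimate is roughly right, a small per-character dataset goes through very few updates per epoch, which would explain why people train for hundreds of epochs.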
Other Options:
Use 8-bit Adam optimizer – Should I enable this?
Mixed Precision – Any recommendations based on my GPU?
Logger – Not sure what’s best here.
Finetuning Duration
How long should I expect finetuning to take per character? Just so I can compare against my actual training times and check if my machine is underperforming due to driver or config issues.
Any guidance would be highly appreciated! 🚀
Thanks in advance!