Some problems with coarse transformer training #237

dwangF0 · 2023-10-11T06:45:36Z


For the coarse generation model, how many training steps does it usually take before the model generates intelligible speech? In my case, I am training on LibriTTS with a batch size of 96. Even when given the oracle semantic tokens, the model still cannot generate intelligible speech at training step 20,000.

Some training config:

coarse_transformer = CoarseTransformer(
    num_semantic_tokens = wav2vec.codebook_size,
    codebook_size = 1024,
    num_coarse_quantizers = 3,
    dim = 512,
    depth = 6
).cuda()

trainer = CoarseTransformerTrainer(
    transformer = coarse_transformer,
    codec = EncodecWrapper(),
    wav2vec = wav2vec,
    folder = dataset_folder,
    batch_size = 96,
    data_max_length = 160000,
    num_train_steps = 20000
)
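For context, here is a rough back-of-the-envelope sketch of what this config implies in terms of data seen and sequence length. It is only an estimate under stated assumptions: the 24 kHz sample rate and the 75 frames-per-second codec rate are assumed (they match the 24 kHz EnCodec model but are not stated in the post), every clip is assumed to be cropped to the full `data_max_length`, and the helper name `training_budget` is hypothetical.

```python
def training_budget(
    batch_size,
    num_train_steps,
    data_max_length,
    num_coarse_quantizers,
    sample_rate_hz=24_000,       # ASSUMPTION: 24 kHz audio, not stated in the post
    codec_frames_per_second=75,  # ASSUMPTION: frame rate of the 24 kHz EnCodec model
):
    """Estimate how much data a run sees and how long each coarse sequence is."""
    # Total number of (possibly repeated) clips drawn over the whole run.
    clips_seen = batch_size * num_train_steps

    # Upper bound on audio processed, assuming every clip fills data_max_length.
    max_audio_hours = clips_seen * data_max_length / sample_rate_hz / 3600

    # Codec frames per clip (integer arithmetic avoids float rounding),
    # then one token per frame per coarse quantizer.
    frames_per_clip = data_max_length * codec_frames_per_second // sample_rate_hz
    coarse_tokens_per_clip = frames_per_clip * num_coarse_quantizers

    return {
        "clips_seen": clips_seen,
        "max_audio_hours": max_audio_hours,
        "coarse_tokens_per_clip": coarse_tokens_per_clip,
    }


budget = training_budget(
    batch_size=96,
    num_train_steps=20_000,
    data_max_length=160_000,
    num_coarse_quantizers=3,
)
print(budget)
```

Under these assumptions, each clip is about 6.7 seconds long and maps to 1,500 coarse tokens, so the transformer has to model fairly long sequences on top of the semantic-token conditioning.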

Could anyone who is also working on this share some insights?

Thanks a lot!
