The learning rate #4

Open
Selimonder opened this issue Jan 9, 2023 · 4 comments

@Selimonder

Hello,

Thank you for presenting awesome ideas with your work and addressing fundamental issues in previous works.

In the Training Setup section of your paper the learning rate is given as 2e-3, whereas your implementation uses 2e-4.

2e-4 sounds more reasonable (it matches the HiFi-GAN baseline). However, I couldn't achieve balanced training with this value; the output always ended up with a slight metallic artifact.

I am 1M steps in with 2e-3 and it looks better, but I still have doubts about it.

Can you explain the discrepancy?

Thank you

@WhiteFu

WhiteFu commented Jan 15, 2023

Hello, I have the same question.

Judging by the results of your experiment, how does the sound quality of Avocodo compare with HiFi-GAN? Is there a recommended hyperparameter setting, for example a learning_rate of 2e-3 or 2e-4?

I am looking forward to your reply.

@Selimonder
Author

Hello,

The learning rate of 2e-4 worked better in the end (the official implementation also uses this value). Perhaps there is a typo in the paper.

The default setup did not work well for my training setup. The following additions helped:

  • Randomly skip discriminator optimization steps (10% chance)
  • Lower learning rate for the discriminator
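
The two additions listed above could be sketched roughly as follows. This is a minimal, framework-free illustration, not the code used in the thread or in the official implementation. The names `should_update_discriminator` and `count_discriminator_updates` are hypothetical, and the discriminator learning-rate factor of 0.5 is a placeholder assumption, since no concrete value was given.

```python
import random

# 2e-4 is the generator learning rate discussed in this thread.
GENERATOR_LR = 2e-4
# Assumption: the thread only says "lower"; 0.5 is a placeholder factor.
DISCRIMINATOR_LR = GENERATOR_LR * 0.5
# "10% chance" of skipping a discriminator step, as listed above.
SKIP_PROB = 0.1


def should_update_discriminator(rng: random.Random,
                                skip_prob: float = SKIP_PROB) -> bool:
    """Return False with probability skip_prob, True otherwise."""
    return rng.random() >= skip_prob


def count_discriminator_updates(num_steps: int, seed: int = 0) -> int:
    """Simulate a training loop and count the discriminator updates."""
    rng = random.Random(seed)
    updates = 0
    for _ in range(num_steps):
        if should_update_discriminator(rng):
            # In a real loop: zero the gradients, backpropagate the
            # discriminator loss, and step the discriminator optimizer
            # (constructed with DISCRIMINATOR_LR).
            updates += 1
        # The generator update would run on every step, with GENERATOR_LR.
    return updates
```

In an actual GAN training loop, the same check would wrap only the discriminator's backward pass and optimizer step, so that the generator is still updated on every iteration.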

The sound quality of Avocodo is good overall, and there is a reduction in artefacts.

@DeepLatte
Collaborator

DeepLatte commented Jan 26, 2023

Thank you for your interest in our paper.
As you said, there is a typo in the paper. The learning rate should have been stated as 2e-4. Imaging artifacts may remain at early training steps, but they are suppressed as training goes on. I think the generator concentrates on lower-frequency components first. After that, it starts to learn how to suppress such artifacts.

Looking at your solutions, it seems there are cases where the discriminator fails to train. In our case, some of the discriminators failed to learn when the dataset was very small, which caused artifacts in the outputs. We tried to prevent the failure by adjusting the discriminator's parameters. The solutions you suggested also seem to be a good fix for the problem. Thanks for the suggestion.

@WhiteFu

WhiteFu commented Feb 1, 2023

Thanks for your reply, I will try it.
