While I'm really thankful that you've open-sourced this amazing repo, I'm a bit confused about how speech tokens are generated from text tokens. It seems that the generate function defined in t2s_up_wds_mlang_enclm.py feeds only one speech token to the forward method to generate the next speech token, while generate_batch, which is defined in the same file (and is not working), uses all speech frames up to a given time step to generate the next token. Based on this, I have two questions:
1- Why does the generate function use only one speech token as input to the forward method?
2- How can I make the forward method work with all speech and text tokens fed to it? (Currently it doesn't work this way and only accepts one speech token.)
The reason I need item 2 is that I want to calculate the log probabilities for all speech tokens given text tokens.
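To make item 2 concrete, here is roughly what I'd like to compute once the forward method accepts the full sequence. This is only a sketch under my own assumptions: log_softmax and sequence_log_probs are hypothetical helper names, not functions from the repo, and logits stands in for whatever forward would return for all positions in a single pass (I'm assuming one row of vocabulary scores per speech position, where row t is conditioned on the text tokens and the speech tokens before position t).

```python
import math

def log_softmax(row):
    """Numerically stable log-softmax over one row of logits."""
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]

def sequence_log_probs(logits, speech_tokens):
    """Per-token log probabilities under teacher forcing.

    logits: list of T rows, each a list of vocabulary scores (assumed
            to come from one forward pass over the whole sequence).
    speech_tokens: list of T target token ids, one per row.
    """
    if len(logits) != len(speech_tokens):
        raise ValueError("need exactly one row of logits per speech token")
    # Pick the log probability assigned to the token that actually occurred.
    return [log_softmax(row)[tok] for row, tok in zip(logits, speech_tokens)]

# Toy example: two positions, vocabulary of four tokens, made-up scores.
toy_logits = [[0.0, 0.0, 0.0, 0.0], [2.0, 0.5, -1.0, 0.0]]
toy_tokens = [2, 0]
per_token = sequence_log_probs(toy_logits, toy_tokens)
total = sum(per_token)  # log probability of the whole speech sequence
```

Summing the per-token values gives the log probability of the whole speech sequence given the text, which is the quantity I'm after.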
Thanks in advance.