Calculate Log Probabilities for a Sequence of Speech Tokens Given Text Tokens #153

Open
pythinker opened this issue Sep 20, 2024 · 0 comments

While I'm really thankful that you've open-sourced this amazing repo, I'm a bit confused about how speech tokens are generated from text tokens. The generate function defined in t2s_up_wds_mlang_enclm.py appears to feed only one speech token to the forward method to generate the next speech token, whereas generate_batch, defined in the same file (and currently not working), uses all speech frames up to a given time step to generate the next token. Based on this, I have two questions:

1- Why does the generate function use only one speech token as input to the forward method?
2- How can I make the forward method work with all speech and text tokens fed to it at once? (Currently it doesn't work this way and accepts only one speech token.)

The reason I need item 2 is that I want to calculate the log probabilities of all speech tokens given the text tokens.
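To make the goal concrete, here is a minimal sketch of the computation in question. It assumes the model can return one row of unnormalized scores (logits) per speech-token position in a single teacher-forced pass, conditioned on the text tokens and the preceding speech tokens. The function name sequence_log_probs and the list-of-lists logits layout are hypothetical and not taken from the repo; with real model outputs the same arithmetic would normally be done on tensors.

```python
import math

def sequence_log_probs(logits, targets):
    """Return log p(target_t | context) for every position t.

    logits[t]  : unnormalized scores over the speech-token vocabulary at
                 position t (hypothetical layout, not the repo's API).
    targets[t] : id of the speech token actually observed at position t.
    """
    if len(logits) != len(targets):
        raise ValueError("need one row of logits per target token")
    per_token = []
    for row, tok in zip(logits, targets):
        # Numerically stable log-sum-exp over the vocabulary.
        m = max(row)
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        # log-softmax evaluated at the observed token.
        per_token.append(row[tok] - log_z)
    return per_token

# Toy example: 2 positions, vocabulary of 3 speech tokens.
logits = [[2.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
targets = [0, 2]
per_token = sequence_log_probs(logits, targets)
total = sum(per_token)  # log-probability of the whole speech-token sequence
```

Summing the per-token values gives the log-probability of the full speech-token sequence given the text, which is why a forward pass that accepts the whole sequence at once is needed here.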

Thanks in advance.
