Calculate Log Probabilities for a Sequence of Speech Tokens Given Text Tokens #153

pythinker · 2024-09-20T01:19:08Z

Thank you for open-sourcing this amazing repo. I'm a bit confused about how you generate speech tokens from text tokens. The `generate` function defined in `t2s_up_wds_mlang_enclm.py` seems to feed only one speech token to the `forward` method to produce the next speech token. In contrast, `generate_batch`, defined in the same file (and currently not working), uses all speech frames up to a given time step to produce the next token. Based on this, I have two questions:

1- Why does the `generate` function pass only one speech token as input to the `forward` method?
2- How can I make the `forward` method work when all speech and text tokens are fed to it at once? (Currently it doesn't work this way and only accepts one speech token.)

I need item 2 because I want to calculate the log probabilities of all speech tokens given the text tokens.
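For reference, here is a minimal sketch of the computation item 2 is meant to enable, assuming the model could run a single teacher-forced forward pass over the whole speech sequence (conditioned on the text) with a causal mask. The SOS-prefixed decoder input, the shape of `logits`, and the function names are hypothetical and not taken from `t2s_up_wds_mlang_enclm.py`. Pure Python is used only to keep it self-contained; with real model outputs this would typically be `torch.log_softmax` followed by `torch.gather` on the logits tensor.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one vocabulary-sized row of logits."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_sum_exp for x in logits]


def speech_token_log_probs(
    logits: Sequence[Sequence[float]], speech_tokens: Sequence[int]
) -> List[float]:
    """Per-token log p(s_t | s_<t, text) from one teacher-forced forward pass.

    Hypothetical assumption: the decoder input was [SOS] + speech_tokens,
    run with a causal mask, so logits[i] is the prediction made after seeing
    SOS and speech_tokens[:i]. Row i therefore scores speech_tokens[i].
    """
    if len(logits) != len(speech_tokens) + 1:
        raise ValueError("expected one logits row per decoder input position")
    # Drop the last row: it predicts the token after the final speech token.
    return [log_softmax(row)[tok] for row, tok in zip(logits[:-1], speech_tokens)]


def sequence_log_prob(
    logits: Sequence[Sequence[float]], speech_tokens: Sequence[int]
) -> float:
    """log p(s_1..s_T | text) as the sum of the per-token log probabilities."""
    return sum(speech_token_log_probs(logits, speech_tokens))


# Usage: 4 rows of uniform logits (SOS + 3 speech tokens) over a vocabulary of 4.
uniform = [[0.0] * 4 for _ in range(4)]
print(sequence_log_prob(uniform, [1, 2, 3]))  # approximately 3 * log(1/4)
```

The one-row shift is the key detail: because of the causal mask, every row only sees tokens before the one it scores, so a single full-sequence pass gives the same per-token probabilities as feeding the tokens one at a time.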

Thanks in advance.
