mels_mode generation #36

Open
Biyani404198 opened this issue Mar 4, 2024 · 1 comment

Comments

@Biyani404198

Hi,
I have created TextGrid files in the textgrids subfolder using MFA.
I'm having trouble producing the average-voice mel-spectrograms for the mels_mode subfolder.
I'm using the get_avg_mels.ipynb Jupyter notebook to compute the average-voice mel-spectrograms.
It generates a mels_mode dictionary with phonemes as keys, but there are no further instructions on how to map it to the speakers and create the mels_mode subfolder from this dictionary.
@ivanvovk @ytyeung @wenyong-h @huawei-noah-admin @zhangjiajin2 Please help.

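    import numpy as np
    from scipy.stats import mode

    # phoneme_list, mels_mode_dict, lens_dict, mels_mode and lens come from earlier cells
    for p in phoneme_list:
        # keepdims=True (SciPy >= 1.9) keeps .mode[0] a full per-mel-bin vector
        mels_mode[p] = mode(np.asarray(mels_mode_dict[p]), axis=0, keepdims=True).mode[0]
        lens[p] = np.mean(np.asarray(lens_dict[p]))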

@li1jkdaw

Basically, for each .wav audio file you know which frame corresponds to which phoneme: you can extract this from the TextGrid file by calculating start_frame and end_frame as in get_avg_mels.ipynb. Then, for each frame, replace the mel feature in that utterance's _mel.npy file with the average feature of the corresponding phoneme. The mels_mode dictionary holds exactly this mapping, {phoneme: its average mel feature}.
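A minimal sketch of the per-frame replacement described above, using NumPy. It assumes each _mel.npy array has shape (n_mels, n_frames), that the phone-tier intervals have already been read from the TextGrid as (start_sec, end_sec, phoneme) tuples (parsing is left out), and that a time maps to frame round(time * sample_rate / hop_length). The helper names seconds_to_frame and build_avg_mel and the defaults sample_rate=22050 and hop_length=256 are hypothetical; check them against the frame computation in get_avg_mels.ipynb before relying on this.

```python
import numpy as np


def seconds_to_frame(t_sec, sample_rate, hop_length):
    """Hypothetical helper: map a time in seconds to a mel frame index.

    Assumes frame i starts at sample i * hop_length; the rounding used in
    get_avg_mels.ipynb may differ.
    """
    return int(round(t_sec * sample_rate / hop_length))


def build_avg_mel(mel, intervals, mels_mode, sample_rate=22050, hop_length=256):
    """Hypothetical helper: replace each frame with its phoneme's average mel.

    mel:       array of shape (n_mels, n_frames) (assumed _mel.npy layout)
    intervals: (start_sec, end_sec, phoneme) tuples from the TextGrid phone tier
    mels_mode: dict {phoneme: vector of shape (n_mels,)}
    Frames whose label is not in mels_mode (e.g. empty silence intervals)
    keep their original values; this is a choice made for the sketch.
    """
    avg_mel = mel.copy()  # leave the input array untouched
    n_frames = mel.shape[1]
    for start_sec, end_sec, phoneme in intervals:
        if phoneme not in mels_mode:
            continue
        start_frame = seconds_to_frame(start_sec, sample_rate, hop_length)
        end_frame = min(seconds_to_frame(end_sec, sample_rate, hop_length), n_frames)
        if start_frame >= end_frame:
            continue  # interval shorter than one frame or past the end
        # Broadcast the (n_mels,) vector over every frame of this phoneme.
        avg_mel[:, start_frame:end_frame] = np.asarray(mels_mode[phoneme])[:, None]
    return avg_mel


# Toy example: 3 mel bins, 10 frames; with sample_rate=100 and hop_length=10
# one frame lasts 0.1 s.
mel = np.zeros((3, 10))
mels_mode = {"AA": np.array([1.0, 2.0, 3.0]), "B": np.array([-1.0, -1.0, -1.0])}
intervals = [(0.0, 0.3, "AA"), (0.3, 0.5, ""), (0.5, 1.0, "B")]
out = build_avg_mel(mel, intervals, mels_mode, sample_rate=100, hop_length=10)
print(out[:, 0], out[:, 4], out[:, 9])  # prints [1. 2. 3.] [0. 0. 0.] [-1. -1. -1.]
```

Saving each result (for example with np.save) into the mels_mode subfolder, mirroring the layout of the _mel.npy files, would then produce the per-utterance average-voice mels; the exact file naming is not given in this thread, so take it from the repository's data loader.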
