phone boundary between continuous vowels #5
Comments
Well, it wouldn't be a surprise to me if NeuFA (or any other FA model) predicts some insane boundaries. As the paper says, the 50 ms tolerance accuracy of NeuFA is 95% at the word level. Also, NeuFA currently doesn't restrict the predicted boundaries to be non-overlapping (we are working on this in NeuFA 2), so my opinion is that NeuFA is not ready for production environments yet. Hope this answers your question. |
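Until a non-overlap constraint lands in NeuFA, a crude post-processing workaround is to snap each pair of overlapping adjacent intervals to their midpoint. This is my own sketch, not part of NeuFA; the function name and the `(start, end)` tuple format are assumptions:

```python
def resolve_overlaps(intervals):
    """Post-hoc fix for overlapping predicted intervals.

    intervals: list of (start, end) times in seconds, in phone order.
    Whenever phone i ends after phone i+1 starts, both boundaries are
    snapped to the midpoint of the overlap. Non-overlapping pairs are
    left untouched. This does not repair non-monotonic predictions
    where a phone's end precedes its own start.
    """
    fixed = [list(iv) for iv in intervals]
    for i in range(len(fixed) - 1):
        if fixed[i][1] > fixed[i + 1][0]:
            mid = 0.5 * (fixed[i][1] + fixed[i + 1][0])
            fixed[i][1] = mid
            fixed[i + 1][0] = mid
    return [tuple(iv) for iv in fixed]
```

After this pass, consecutive intervals share a single boundary, so durations can be read off directly.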
@petronny Thank you for your reply!
|
I agree with that. We are working on the overlapping-boundaries issue.
See https://github.com/thuhcsi/NeuFA/blob/master/inference.py#L112 ; I mainly use the attention weights from the ASR direction. |
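For context, turning an ASR-direction attention matrix into per-phone boundaries could look roughly like the sketch below. This is a hypothetical helper, not the code at the linked line; the matrix shape, the half-peak threshold, and the 10 ms frame shift are all assumptions:

```python
import numpy as np

def boundaries_from_attention(w_asr, frame_shift_ms=10.0):
    """Estimate phone boundaries from an ASR-direction attention matrix.

    w_asr: array of shape (num_phones, num_frames); each row holds one
    phone's attention weights over the acoustic frames.
    Returns a list of (start, end) times in seconds per phone, taken as
    the first and last frame whose weight reaches half the row's peak.
    """
    boundaries = []
    for row in w_asr:
        threshold = 0.5 * row.max()
        frames = np.nonzero(row >= threshold)[0]
        start = frames[0] * frame_shift_ms / 1000.0
        end = (frames[-1] + 1) * frame_shift_ms / 1000.0
        boundaries.append((start, end))
    return boundaries
```

Because each row is thresholded independently, nothing here prevents the resulting intervals from overlapping, which matches the behaviour discussed above.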
Got it! Thank you again! @petronny |
I tried w_tts and w_asr at the phone level, but both results are bad, since the predicted boundary of the first phone (silence) of each sentence differs greatly from the ground truth. I don't know why. Then I tried weight = boundary_left - boundary_right for each phone (the weight values are about 1 in the middle of the phone and about 0 at its borders) and used the functions in https://github.com/as-ideas/DeepForcedAligner/blob/main/dfa/duration_extraction.py to extract durations. Then I can get a continuous, non-overlapping alignment. |
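The idea above can be sketched as follows. This is a simplified re-implementation of monotonic duration extraction in the spirit of the linked duration_extraction.py, not that file's actual code; the weight matrix is assumed to already be shaped (num_frames, num_phones), with the boundary_left - boundary_right gating applied:

```python
import numpy as np

def extract_durations(weights):
    """Monotonic duration extraction from a weight matrix.

    weights: (num_frames, num_phones), T >= N. Dynamic programming finds
    the monotonic frame-to-phone assignment (every frame belongs to
    exactly one phone, phones appear in order) that maximises the summed
    weight, then counts the frames assigned to each phone.
    """
    T, N = weights.shape
    best = np.full((T, N), -np.inf)
    best[0, 0] = weights[0, 0]
    for t in range(1, T):
        for n in range(N):
            stay = best[t - 1, n]                      # same phone
            advance = best[t - 1, n - 1] if n > 0 else -np.inf  # next phone
            best[t, n] = weights[t, n] + max(stay, advance)
    # Backtrack from the last phone at the last frame.
    durations = np.zeros(N, dtype=int)
    n = N - 1
    for t in range(T - 1, -1, -1):
        durations[n] += 1
        if t > 0 and n > 0 and best[t - 1, n - 1] >= best[t - 1, n]:
            n -= 1
    return durations
```

By construction the durations are positive and sum to the number of frames, which is exactly the continuous, non-overlapping alignment described above.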
Well, in fact I met a similar problem. In my experiment, the alignment is not even monotonic, which means the end time of a word can be earlier than the start time of that word. I think this makes this great work hard to use in real scenarios. Your idea may work, I think. Thank you. |
The bad case looks like this:
|
@panxin801 Maybe you can use weight/attn = boundary_left - boundary_right for testing. |
@Liujingxiu23 Yeah, I have reached the same conclusion as you: the Chinese results are better than the English ones on average. And thank you for your advice. |
@petronny Hi, I have trained the model on a Chinese dataset successfully.
But I have met a problem: the boundaries of continuous vowels are not as accurate as those of other phones. For example, in "我安心的点点头" ("I nodded with ease"), the phone boundary between "我" and "安", i.e. between "o3" and "an1", is wrong. And this kind of problem happens frequently.
For a syllable like "yun1" (云) I can split it into "y vn1", where "y" has a certain duration value; for "wu2" (无) I can split it into "w u2", where "w" has a duration value. But for some vowels, for example "安/an" or "阿/a", there is really no consonant at all.
Have you encountered problems like this? How did you solve them?