
How to keep silent #52

Open
NewNoviceChen opened this issue Apr 15, 2024 · 3 comments
@NewNoviceChen

During video synthesis, I found that if the original video is not silent (the speaker's mouth is moving), there is a mouth-mismatch problem; when the original video is silent, the result is much better. How can I match the mouth shape when the speaker in the original video is talking, or directly generate a silent video first? (I found that Wav2Lip's attempt at silencing does not work well.) Thank you very much.

@anothermartz
Owner

I'm having some difficulty understanding exactly what you mean. If you don't natively speak English, could you use a large language model such as ChatGPT/Copilot/Gemini to translate from your native language?

I believe you're talking about how the mouth movements in the original video are hard to mask/suppress. Wav2Lip works better than Wav2Lip_GAN at this, but it is still pretty bad.

I have had an idea: take the mouth/chin region from the first frame, lock its pose while still tracking it onto the face, and slowly transition to another frame using optical flow blending. But I really doubt this could look convincing, and it would be a lot of work to get it working in the first place.
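To illustrate the idea (not tested, just a rough sketch): the snippet below uses OpenCV's Farneback optical flow to warp a locked mouth/chin patch from the first frame so it tracks head motion, then feathers it over each frame. The `mouth_box` rectangle is assumed to come from an external face detector, and the cross-fade to another locked keyframe is omitted.

```python
import cv2
import numpy as np

def lock_mouth_region(frames, mouth_box):
    """Paste the first frame's mouth/chin patch over every later frame,
    warping it with dense optical flow so it follows head motion.

    `frames`: list of same-sized BGR images.
    `mouth_box`: (x, y, w, h) rectangle from a face detector (placeholder).
    """
    x, y, w, h = mouth_box
    locked = frames[0][y:y+h, x:x+w].astype(np.float32)
    prev_gray = cv2.cvtColor(frames[0], cv2.COLOR_BGR2GRAY)

    # Feathered mask so the pasted patch blends into the surrounding face.
    bh, bw = max(1, h // 8), max(1, w // 8)
    mask = np.zeros((h, w), np.float32)
    mask[bh:-bh, bw:-bw] = 1.0
    mask = cv2.GaussianBlur(mask, (0, 0), sigmaX=w / 16)[..., None]

    grid_x, grid_y = np.meshgrid(np.arange(w, dtype=np.float32),
                                 np.arange(h, dtype=np.float32))
    out = [frames[0]]
    for frame in frames[1:]:
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        fx = flow[y:y+h, x:x+w, 0]
        fy = flow[y:y+h, x:x+w, 1]
        # Backward-sample with the negated flow: an approximation of
        # warping the locked patch forward along the motion field.
        locked = cv2.remap(locked, grid_x - fx, grid_y - fy,
                           cv2.INTER_LINEAR, borderMode=cv2.BORDER_REPLICATE)
        patch = frame[y:y+h, x:x+w].astype(np.float32)
        result = frame.copy()
        result[y:y+h, x:x+w] = (mask * locked +
                                (1.0 - mask) * patch).astype(np.uint8)
        out.append(result)
        prev_gray = gray
    return out
```

Even in this simplified form, I'd expect drift and blur to accumulate as the warped patch is resampled frame after frame, which is part of why I doubt it would look convincing.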

There's another lip-syncing project that seems to suppress the original face movements much better here:

https://github.com/natlamir/DINet

I may look into turning this into an Easy-DINet project, then perhaps combining the two into one GUI where you can choose between them or even layer them together: using DINet to suppress the original movements and then Wav2Lip to apply an accurate lipsync. But this will take a long time.
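For reference, chaining the two stages might look roughly like this. The Wav2Lip flags match its standard `inference.py`; the DINet script name and flags are placeholders, since they depend on the repo/fork you use.

```python
import subprocess

# Stage 1 (hypothetical DINet call): regenerate the face with neutral
# mouth motion to suppress the original movements.
subprocess.run([
    "python", "dinet_inference.py",      # placeholder script name/flags
    "--source_video", "input.mp4",
    "--output", "suppressed.mp4",
], check=True)

# Stage 2: run Wav2Lip on the suppressed video to apply the new lipsync.
subprocess.run([
    "python", "inference.py",
    "--checkpoint_path", "checkpoints/wav2lip.pth",
    "--face", "suppressed.mp4",
    "--audio", "speech.wav",
    "--outfile", "result.mp4",
], check=True)
```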

@NewNoviceChen
Author

Thank you for your response. My English isn't very good, but I feel like you've understood my meaning. May I ask another question? I'm considering training DINet and generating a video without mouth movement, then using that video for synthesis with Wav2Lip. Would this approach be putting the cart before the horse?

@skeletonNN

Were you able to solve this?
