melo/api.py: add a 'tts' iterator to greatly improve the response speed #88

Open
wants to merge 1 commit into
base: main

Conversation

@suzakuwcx suzakuwcx commented Mar 29, 2024

I am building a simple assistant that plays speech generated from ChatGPT responses. ChatGPT usually returns a long block of text, and waiting for the TTS response for the whole block takes a long time, which is annoying. For example:

在 Minecraft 中表现初唐时期的拱结构,你可以尝试以下几种方法:

    建筑风格:
        初唐时期的建筑通常是木质或砖石结构,采用了许多拱形结构,如门廊、走廊和门窗等。在 Minecraft 中,你可以使用这些材料来模拟这些建筑风格,并尽可能使用拱形的设计元素。

    拱门:
        在 Minecraft 中,你可以使用各种材料来建造拱门。尝试使用石砖、圆石或者砖块等材料,以模拟出初唐时期的拱门。你可以利用方块的不同摆放方式来创造不同形式的拱门,比如圆拱、方拱等。

    屋顶结构:
        初唐时期的建筑通常采用了拱形的屋顶结构,这些屋顶常常呈弧形或者圆形。在 Minecraft 中,你可以使用类似的方式建造拱形的屋顶,比如利用楼梯方块、台阶方块等来模拟出拱形的屋顶。

    内部结构:
        初唐时期的建筑内部也常常采用了拱形结构来支撑屋顶或者分隔空间。在 Minecraft 中,你可以在建筑的内部使用拱形的结构来模拟这一特点,比如建造拱形的天花板或者拱形的墙壁。

    装饰元素:
        初唐时期的建筑通常会使用各种装饰元素来装饰门窗、拱顶等地方。在 Minecraft 中,你可以使用各种材料来添加装饰元素,比如花纹砖块、石英块等,以模拟出初唐时期建筑的装饰风格。

通过结合这些方法,在 Minecraft 中你可以尽可能地模拟出初唐时期建筑的拱形结构和风格。

In the 'tts_to_file' function, preprocessing splits a long text into an array of sentences. The model then runs inference on each sentence and combines the results into the final audio array. If the text is very long, waiting for the entire process to finish takes a long time. Inference is usually much faster than playback, so this PR adds an iterator, 'tts_iter', that yields each audio piece as soon as it is ready.
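The difference between the two approaches can be sketched as follows. This is a minimal illustration, not the MeloTTS implementation: `split_sentences`, `tts_all`, `tts_stream`, and the `synthesize` callback are hypothetical names, and the splitter is a simple regex that stands in for the real preprocessing.

```python
import re
from typing import Callable, Iterator, List

Audio = List[float]  # stand-in for the real audio array type


def split_sentences(text: str) -> List[str]:
    """Hypothetical splitter: break text after Chinese/ASCII sentence terminators."""
    pieces = re.findall(r"[^。!?!?]+[。!?!?]?", text)
    return [p.strip() for p in pieces if p.strip()]


def tts_all(text: str, synthesize: Callable[[str], Audio]) -> Audio:
    """Batch style: nothing is returned until every sentence is synthesized."""
    audio: Audio = []
    for sentence in split_sentences(text):
        audio.extend(synthesize(sentence))
    return audio


def tts_stream(text: str, synthesize: Callable[[str], Audio]) -> Iterator[Audio]:
    """Iterator style: yield each sentence's audio as soon as it is ready."""
    for sentence in split_sentences(text):
        yield synthesize(sentence)
```

With the iterator style, the caller can start playback after the first sentence has been synthesized, so the time to first audio depends on one sentence rather than on the whole text.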

Code before (waits 1.3 s for a response):

audio = model.tts_to_file(text, speaker_ids['ZH'], speed=speed)

Code after (waits only 0.2 s for a response):

# The existing API can still be used
audio = model.tts_to_file(text, speaker_ids['ZH'], speed=speed)
# or iterate over the audio pieces as they are generated
for audio in model.tts_iter(text, speaker_ids['ZH'], speed=speed):
    play_audio(audio)
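If `play_audio` blocks until a piece has finished playing, the loop above does not start synthesizing the next piece until playback returns. One possible way to overlap the two, sketched here under that assumption, is to drain the iterator in a background thread and play from a bounded queue. `play_prefetched` and its parameters are hypothetical names and are not part of this PR.

```python
import queue
import threading
from typing import Any, Callable, Iterable

_DONE = object()  # sentinel marking the end of the stream


def play_prefetched(chunks: Iterable[Any],
                    play: Callable[[Any], None],
                    max_buffered: int = 4) -> None:
    """Consume `chunks` in a background thread while `play` runs in the caller."""
    buf: queue.Queue = queue.Queue(maxsize=max_buffered)

    def producer() -> None:
        try:
            for chunk in chunks:
                buf.put(chunk)  # blocks when the buffer is full
        finally:
            buf.put(_DONE)  # always signal completion, even on error

    worker = threading.Thread(target=producer, daemon=True)
    worker.start()
    while True:
        item = buf.get()
        if item is _DONE:
            break
        play(item)  # playback overlaps with synthesis of later chunks
    worker.join()
```

Usage would look like `play_prefetched(model.tts_iter(text, speaker_ids['ZH'], speed=speed), play_audio)`. The bounded queue keeps memory use small when inference runs far ahead of playback.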

jwc20 added a commit to jwc20/MeloTTS that referenced this pull request Jul 6, 2024