Does funasr have a feature like faster_whisper's WhisperModel().transcribe that returns segments? #2198

Open
liuchangzong opened this issue Nov 8, 2024 · 1 comment
Labels
question Further information is requested

Comments

liuchangzong commented Nov 8, 2024

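Here is my use case: I want to find where the speech for a particular sentence occurs in a video, then use an editing tool to cut out the video for that sentence. In other words, I need the start and end positions of the speech within the video.

faster_whisper has a feature for this: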

from faster_whisper import WhisperModel, BatchedInferencePipeline

model = WhisperModel("medium", device="cuda", compute_type="float16")
batched_model = BatchedInferencePipeline(model=model)
segments, info = batched_model.transcribe("audio.mp3", batch_size=16)

for segment in segments:
    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))

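It outputs the recognized speech together with its time position. That lets me find the video corresponding to a piece of speech and cut it out.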
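Once a start and end time are known, the cutting step described above can be sketched as follows. This is a minimal sketch, not part of faster_whisper or funasr: it assumes the `ffmpeg` command-line tool is installed, and `build_cut_command` and `cut_clip` are hypothetical helper names chosen for illustration.

```python
import subprocess


def build_cut_command(src, dst, start_s, end_s):
    """Build an ffmpeg command that copies the span [start_s, end_s] of src into dst."""
    if end_s <= start_s:
        raise ValueError("end_s must be greater than start_s")
    duration = end_s - start_s
    return [
        "ffmpeg", "-y",              # -y: overwrite dst if it already exists
        "-ss", f"{start_s:.3f}",     # seek to the start time in the input
        "-i", src,
        "-t", f"{duration:.3f}",     # keep this many seconds
        "-c", "copy",                # stream copy: fast, but cuts snap to keyframes
        dst,
    ]


def cut_clip(src, dst, start_s, end_s):
    """Run ffmpeg to write the clip; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(build_cut_command(src, dst, start_s, end_s), check=True)
```

For example, `cut_clip("video.mp4", "clip.mp4", segment.start, segment.end)` would write one clip per segment. Because stream copy can only cut at keyframes, the clip boundaries may be slightly off; re-encoding instead of using `-c copy` is slower but gives more precise cuts.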

Does funasr have a feature like this too?

funasr does provide a model of this kind, but when I tried it:

from funasr import AutoModel

model = AutoModel(model="fsmn-vad", model_revision="v2.0.4")

wav_file = f"{model.model_path}/example/asr_example.wav"
res = model.generate(input=wav_file)
print(res)

But I found that every time segment the VAD model outputs is too long: each one is not a single sentence but close to a whole paragraph.
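To inspect how long the VAD segments actually are, it can help to print them in the same format as the faster_whisper example. The sketch below assumes the result shape described in the FunASR README: a list of dicts whose `"value"` entry holds `[begin_ms, end_ms]` pairs in milliseconds. The helper name `vad_segments_in_seconds` and the sample data are illustrative, not real model output.

```python
def vad_segments_in_seconds(res):
    """Convert fsmn-vad style output (milliseconds) into (start_s, end_s) tuples."""
    segments = []
    for item in res:
        # Assumed shape: item["value"] == [[beg_ms, end_ms], ...]
        for beg_ms, end_ms in item["value"]:
            segments.append((beg_ms / 1000.0, end_ms / 1000.0))
    return segments


# Illustrative data in the assumed shape, not output from a real run.
sample = [{"key": "asr_example", "value": [[70, 2340], [2620, 6200]]}]
for start, end in vad_segments_in_seconds(sample):
    print("[%.2fs -> %.2fs] duration %.2fs" % (start, end, end - start))
```

A VAD splits on silence rather than on sentence boundaries, so paragraph-length segments are plausible for continuous speech. The FunASR README also shows a `vad_kwargs={"max_single_segment_time": ...}` option when the VAD is attached to an ASR model; that caps segment length but should not be expected to align cuts with sentences.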

And if I use this instead:

from funasr import AutoModel

model = AutoModel(model="paraformer-zh", model_revision="v2.0.4",
                  vad_model="fsmn-vad", vad_model_revision="v2.0.4",
                  punc_model="ct-punc-c", punc_model_revision="v2.0.4",
                  # spk_model="cam++", spk_model_revision="v2.0.2",
                  )
res = model.generate(input=f"{model.model_path}/example/asr_example.wav",
                     batch_size_s=300,
                     hotword='魔搭')
print(res)

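then it only outputs the recognized text with punctuation added, and I still cannot tell where each sentence is located.

How can I solve this? Any guidance would be greatly appreciated!

PS: I can't find any documentation for funasr, so I don't know which functions it actually provides; I only found some examples. Does that mean funasr only has the methods shown in those examples?
I couldn't find documentation for whisper either.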

liuchangzong added the question label Nov 8, 2024

dthcle commented Nov 12, 2024

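There is a model with timestamp support. How about running it to get the timestamps and then post-processing them manually?
Model name: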
speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch
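The "manual processing" step could look roughly like the sketch below, which groups per-token timestamps into sentences at sentence-ending punctuation. It rests on assumptions that need checking against real output: that the result provides a punctuated `text` string and a `timestamp` list of `[start_ms, end_ms]` pairs, and that each non-punctuation, non-space character maps to exactly one timestamp entry (plausible for purely Chinese text, not for mixed Chinese/English). `split_sentences` is a hypothetical helper, and the sample data is made up. Some FunASR versions also appear to accept `sentence_timestamp=True` in `generate()` and return a `sentence_info` list; treat that as an assumption to verify against the installed version.

```python
SENTENCE_END = set("。！？!?")      # punctuation treated as a sentence boundary
OTHER_PUNCT = set("，、；：,;:")    # punctuation kept in the text, no boundary


def split_sentences(text, timestamps):
    """Group per-character timestamps (ms) into sentences with start/end in seconds."""
    sentences, current = [], []
    start_ms = end_ms = None
    idx = 0
    for ch in text:
        if ch.isspace():
            continue
        if ch in SENTENCE_END or ch in OTHER_PUNCT:
            if start_ms is None:
                continue  # drop punctuation that precedes any spoken token
            current.append(ch)
            if ch in SENTENCE_END:
                sentences.append({"text": "".join(current),
                                  "start": start_ms / 1000, "end": end_ms / 1000})
                current, start_ms, end_ms = [], None, None
            continue
        if idx >= len(timestamps):
            raise ValueError("fewer timestamps than characters; alignment assumption failed")
        beg, end = timestamps[idx]
        idx += 1
        if start_ms is None:
            start_ms = beg
        end_ms = end
        current.append(ch)
    if idx != len(timestamps):
        raise ValueError("more timestamps than characters; alignment assumption failed")
    if start_ms is not None:  # trailing text without sentence-ending punctuation
        sentences.append({"text": "".join(current),
                          "start": start_ms / 1000, "end": end_ms / 1000})
    return sentences


# Made-up example data, not output from a real model run.
text = "你好。今天天气不错！"
timestamps = [[100, 300], [300, 520], [900, 1100], [1100, 1300],
              [1300, 1500], [1500, 1700], [1700, 1900], [1900, 2200]]
for s in split_sentences(text, timestamps):
    print("[%.2fs -> %.2fs] %s" % (s["start"], s["end"], s["text"]))
```

The function raises an error when the character count and timestamp count disagree, so a mismatch with the assumed output format shows up immediately instead of producing silently shifted times.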
