It is not clear where the problem really is; maybe you could fix the formatting of your issue.
If you mean the pipeline segments are wrong or misplaced, many factors can make it hard for the pretrained pipeline to perform well out of the box: noisy audio, specific acoustic conditions that were not seen when the model was trained, etc.
You might want to fine-tune the model on the type of data you target (and take a look at the available tutorial notebooks).
Tested versions
pyannote.audio = 3.3.1
System information
Ubuntu
Issue description
```python
from pyannote.audio import Pipeline
import torch

pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.1",
    use_auth_token="hf_KkqHxRTGcaXXXXXXXsZvlMCDgAmBuSGCmXE")
pipeline.to(torch.device("cuda"))

diarization = pipeline("/root/Audio/Test.mp3")
for turn, _, speaker in diarization.itertracks(yield_label=True):
    print(f"start={turn.start:.1f}s stop={turn.end:.1f}s speaker_{speaker}")
```
```
start=0.6s stop=2.2s speaker_SPEAKER_00
start=3.5s stop=4.0s speaker_SPEAKER_00
```

Converted to SRT timestamps:

```
start=0.6s stop=2.2s -> 00:00:00,600 --> 00:00:02,200
start=3.5s stop=4.0s -> 00:00:03,500 --> 00:00:04,000
```
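For reference, the seconds-to-SRT conversion shown above can be sketched with a small helper (the function name is my own, not part of pyannote.audio):

```python
def to_srt_timestamp(seconds: float) -> str:
    """Convert a time in seconds to an SRT-style HH:MM:SS,mmm timestamp."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1_000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"

print(to_srt_timestamp(0.6))  # 00:00:00,600
print(to_srt_timestamp(2.2))  # 00:00:02,200
```

Rounding to whole milliseconds first avoids floating-point drift when splitting into hours, minutes, and seconds.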
The timestamps are wrong. The correct times are:
00:00:02,600 --> 00:00:04,486
00:00:05,439 --> 00:00:06,013
Please help!