I don't often get feedback from actual users so I would love to know more about your actual use case (either here or via email if you want to keep it private).
You'd have to tell me a bit more about your audio. Its length might be one reason, as the hyper-parameters of the pretrained pipeline (
Removing overlapped speech before clustering would definitely help, as you'd only compare pure (as in "just one speaker") speech segments. Also, you are right that pyannote 2.0 will eventually contain an overlap-aware speaker diarization pipeline that should improve things a bit, but I cannot provide any ETA. Feel free to sponsor the project to make things faster ;-) Finally, you might be interested in that project, which will eventually be merged into pyannote 2.0.
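The "remove overlapped speech before clustering" idea boils down to interval subtraction: take the speech regions you would normally embed and cluster, and cut out every region flagged by an overlapped-speech detector. A minimal sketch with plain `(start, end)` tuples (in practice you would use the timelines produced by pyannote's detection pipelines; the function name here is hypothetical):

```python
def subtract_overlap(segments, overlaps):
    """Remove every overlapped region from each (start, end) speech segment,
    keeping only the "pure" single-speaker portions for clustering."""
    result = []
    for start, end in segments:
        pieces = [(start, end)]
        for o_start, o_end in overlaps:
            next_pieces = []
            for s, e in pieces:
                if o_end <= s or o_start >= e:
                    # overlap region does not touch this piece: keep it whole
                    next_pieces.append((s, e))
                else:
                    # keep the parts of (s, e) outside (o_start, o_end)
                    if s < o_start:
                        next_pieces.append((s, o_start))
                    if o_end < e:
                        next_pieces.append((o_end, e))
            pieces = next_pieces
        result.extend(pieces)
    return result

segments = [(0.0, 5.0), (6.0, 10.0)]
overlaps = [(4.0, 7.0)]
print(subtract_overlap(segments, overlaps))  # [(0.0, 4.0), (7.0, 10.0)]
```

Only the surviving pieces would then be passed to the embedding and clustering stages.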
-
Hi, thank you for creating this amazing repo. I have used this model and received lots of good feedback from users. I have two questions.
We tend to overestimate the number of speakers when the audio is long (e.g. 2 hours). Do you think this is because the longer an audio file is, the more overlapped segments we get? My thought is that these overlapped segments could form their own cluster and be treated as an additional speaker by the clustering step. I wanted to have your view on this.
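The intuition behind this hypothesis can be illustrated with a toy example (a deliberate simplification, not how real speaker embeddings are computed): if an overlapped segment's embedding behaves roughly like a mixture of the two speakers' embeddings, it lands at a similar, lower cosine similarity to both pure clusters, so a similarity-threshold clustering could seed a third cluster from it.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Hypothetical 2-D "embeddings" for two pure speakers.
spk_a = [1.0, 0.0]
spk_b = [0.0, 1.0]

# Crude model of an overlapped segment: the average of both embeddings.
mix = [(x + y) / 2 for x, y in zip(spk_a, spk_b)]

print(cosine(mix, spk_a))  # ~0.707: equally far from both pure clusters
print(cosine(mix, spk_b))  # ~0.707
```

With a clustering threshold above ~0.71, this mixture segment would join neither existing cluster, which is consistent with overlapped speech inflating the speaker count.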
If the hypothesis above is true, we presumably should remove the overlapped segments using pyannote's overlap detection within the pipeline. Thus, we are trying to modify the pyannote 1.1 pipeline to add overlap detection to it. But I believe this is roughly what you intend to do with pyannote 2.0. I checked the pyannote 2.0 pipeline, but it seems that overlap detection/resegmentation is not included yet. Would you be able to share when it will be added to pyannote 2.0?
Thank you in advance!