
About usage #17

Open
MonolithFoundation opened this issue Jan 14, 2025 · 6 comments

Comments

@MonolithFoundation

Hello, I'd like to use this model in a scenario where the audio has already been segmented by VAD, but a segment may contain more than one speaker. Is this model able to handle that?

In more detail, something like:

---
   ---

These are different speakers, but VAD cannot separate them. I need their independent voices.

If it can, is there a simple code snippet I can reference to do it?

@DiLiangWU
Member

You can obtain the predicted diarization results by following these steps:

  • Prepare the wav.scp file, like:

mix001 /path/to/audio/file1.wav
mix002 /path/to/audio/file2.wav
mix003 /path/to/audio/file3.wav

  • Comment out self.segments, self.utt2spk, self.reco2dur, and self.spk2utt in kaldi_data.py.
  • Change datasets.diarization_dataset to datasets.diarization_dataset_predict.
  • Change trainer.test(spk_dia_main) to trainer.predict(spk_dia_main).
  • Run: python train_dia.py --configs conf/xxx_infer.yaml --gpus YOUR_DEVICE_ID, --test_from_folder YOUR_CKPT_SAVE_DIR
  • Generate the speech activity probabilities:

cd visualize
python gen_h5_output.py
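
The wav.scp preparation in the first step can be scripted. A minimal sketch (the `write_wav_scp` helper name is illustrative; the mixNNN IDs match the example above):

```python
import os

def write_wav_scp(wav_dir, scp_path):
    """Write a Kaldi-style wav.scp mapping utterance IDs to WAV paths."""
    # Sort for deterministic utterance numbering.
    wavs = sorted(f for f in os.listdir(wav_dir) if f.endswith(".wav"))
    with open(scp_path, "w") as out:
        for i, name in enumerate(wavs, start=1):
            utt_id = "mix%03d" % i  # e.g. mix001, mix002, ...
            out.write("%s %s\n" % (utt_id, os.path.join(wav_dir, name)))
```
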

Then you can obtain the decision results by applying a threshold, as in the image below, for single-speaker speech extraction.

[Image: speech activity probabilities with decision threshold]
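
The thresholding step can be sketched as below. The (num_frames, num_speakers) probability layout and the frame shift are assumptions about what gen_h5_output.py writes, so adapt them to the actual h5 contents:

```python
def probs_to_segments(probs, threshold=0.5, frame_shift=0.01):
    """Convert per-frame speech activity probabilities to time segments.

    probs: sequence of per-frame probability lists, shape (num_frames, num_speakers).
    Returns a list of (speaker_index, start_sec, end_sec) tuples.
    """
    segments = []
    if not probs:
        return segments
    num_spk = len(probs[0])
    for spk in range(num_spk):
        start = None
        for t, frame in enumerate(probs):
            active = frame[spk] >= threshold
            if active and start is None:
                start = t  # segment opens
            elif not active and start is not None:
                segments.append((spk, start * frame_shift, t * frame_shift))
                start = None  # segment closes
        if start is not None:  # segment still open at the end
            segments.append((spk, start * frame_shift, len(probs) * frame_shift))
    return segments

# Two speakers overlapping in frames 2-3 (frame_shift=1 for readability):
probs = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.9], [0.6, 0.8], [0.1, 0.9]]
# probs_to_segments(probs, frame_shift=1) → [(0, 0, 4), (1, 2, 5)]
```
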

@MonolithFoundation
Author

Hi, what if spk1 and spk2 overlap?

I just want code that takes an audio file in and outputs timestamp results.

@DiLiangWU
Member

DiLiangWU commented Jan 21, 2025

Hi, what if spk1 and spk2 overlap?

Sorry, I misunderstood and thought your input was non-overlapping multi-speaker speech. FS-EEND can naturally handle overlapping speech. An example of the output is shown in the figure below.

[Image: example diarization output with overlapping speakers]

The code for receiving a WAV file and outputting a Rich Transcription Time Marked (RTTM) file has been updated. You can run inference with the command below (first modify val_data_dir in conf/xxx_infer.yaml to point to your own WAV directory):
python train_dia_pred.py --configs conf/xxx_infer.yaml --gpus YOUR_DEVICE_ID, --test_from_folder YOUR_CKPT_SAVE_DIR
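
To turn the RTTM output into timestamps, a minimal parser like the following works. The SPEAKER line layout (onset and duration in fields 4 and 5, speaker name in field 8) is the standard Rich Transcription format; the example labels are made up:

```python
def parse_rttm(lines):
    """Parse RTTM SPEAKER lines into {speaker: [(start_sec, end_sec), ...]}."""
    result = {}
    for line in lines:
        fields = line.split()
        if not fields or fields[0] != "SPEAKER":
            continue  # skip comments and non-SPEAKER records
        onset, dur = float(fields[3]), float(fields[4])
        spk = fields[7]
        result.setdefault(spk, []).append((onset, onset + dur))
    return result

rttm = [
    "SPEAKER mix001 1 0.00 1.50 <NA> <NA> spk1 <NA> <NA>",
    "SPEAKER mix001 1 1.25 2.00 <NA> <NA> spk2 <NA> <NA>",  # overlaps spk1
]
# parse_rttm(rttm) → {'spk1': [(0.0, 1.5)], 'spk2': [(1.25, 3.25)]}
```

Note that overlapping speakers simply yield overlapping intervals, as in the example above.
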

@MonolithFoundation
Author

I'm wondering if there is a function as simple as possible to do this, for example dia_pred(audio_path), which returns a dict of timestamps.

I looked at the train_dia_pred code; it is way too complicated and coupled with all kinds of training code.

Would you consider making simple inference-only code that users can easily use out of the box?

@DiLiangWU
Member

Sure, I understand your point. Thank you for the suggestion. We will simplify the inference code and update it in the repo.

@MonolithFoundation
Author

MonolithFoundation commented Jan 21, 2025

Thank you so much for the consideration! Hoping for a strong base diarization model with overlap handling that can be used with ease.
