Noise contributions contain clear voices #251

codesoap · 2025-01-11T09:30:08Z

I have taken a closer look at the noise contributions at media.xiph.org/rnnoise/rnnoise_contributions.tar.gz. Using sox, I skimmed through some of the loudest files and found many instances where clear voices are present. I think these files hurt training, since they teach the model to treat voices as noise. These are the files I found especially problematic:

1507065430027-other.raw
1506685521238-office.raw
1507820740116-office.raw
1507821383566-office.raw
1507821453362-office.raw
1506585900089-office.raw
1506613198283-street.raw
1506696624320-other.raw
1506890775110-office.raw
1506895815096-other.raw
1506913173112-other.raw
1506943243649-office.raw
1506950688321-street.raw
1506960333480-other.raw
1506961583802-other.raw
1506969723079-other.raw
1507044042452-other.raw
1507063974483-office.raw
1507119001551-office.raw
1507203104584-office.raw
1507248253471-other.raw
1507278137650-train.raw
1507300541584-street.raw
1507350198843-train.raw
1507757947291-office.raw
1507762324044-office.raw
1507764317054-office.raw
1507764388966-office.raw
1509685510922-none.raw
1509697005790-office.raw
1509724471357-office.raw
1526243930052-none.raw
1526906303516-other.raw
1530283938518-other.raw

I think those files should be removed from the dataset.
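
Until the archive itself is cleaned up, one way to act on this locally is to skip the affected files when collecting noise for training. The following is only a minimal sketch: the `usable_noise_files` helper and the `VOICE_CONTAMINATED` set are hypothetical names, not part of RNNoise, and the set is shortened here to the first three entries of the list above.

```python
from pathlib import Path

# Hypothetical blocklist; in practice it would hold every file name
# from the list above. Only the first three entries are shown here.
VOICE_CONTAMINATED = {
    "1507065430027-other.raw",
    "1506685521238-office.raw",
    "1507820740116-office.raw",
}


def usable_noise_files(directory, blocklist=VOICE_CONTAMINATED):
    """Return the .raw files in `directory` whose names are not blocklisted."""
    return sorted(
        path
        for path in Path(directory).glob("*.raw")
        if path.name not in blocklist
    )
```

Calling `usable_noise_files("rnnoise_contributions")` (assuming that is the directory the archive was extracted to) would return the remaining paths in sorted order.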

There are many more files that contain muffled voices, but I suppose they are not as problematic.
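
For anyone who wants to repeat the search for the loudest files, here is a rough sketch of how they could be ranked without sox. It assumes the contributions are headerless, little-endian, signed 16-bit PCM; the function names `rms_dbfs` and `rank_by_loudness` are made up for this example. Listening to the top results is still needed, since loudness alone does not tell you whether a file contains voices.

```python
import math
import sys
from array import array
from pathlib import Path


def rms_dbfs(path):
    """Return the RMS level in dBFS of a headerless 16-bit PCM file.

    Assumption: samples are signed 16-bit little-endian.
    """
    data = Path(path).read_bytes()
    samples = array("h")
    # Drop a trailing odd byte, if any, so the buffer is whole samples.
    samples.frombytes(data[: len(data) - len(data) % 2])
    if sys.byteorder == "big":
        samples.byteswap()
    if not samples:
        return float("-inf")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")
    # Full scale for 16-bit audio is taken as 32768.
    return 10 * math.log10(mean_square / 32768.0 ** 2)


def rank_by_loudness(directory):
    """Return the .raw files in `directory`, loudest first."""
    files = sorted(Path(directory).glob("*.raw"))
    return sorted(files, key=rms_dbfs, reverse=True)
```

The first entries of `rank_by_loudness(...)` are then the candidates to listen to first.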

jmvalin · 2025-01-22T02:53:53Z

Yes, indeed those should be excluded from the dataset.
