Noise contributions contain very loud noise #252

Open
codesoap opened this issue Jan 11, 2025 · 0 comments

codesoap commented Jan 11, 2025

I have taken a closer look at the noise contributions at media.xiph.org/rnnoise/rnnoise_contributions.tar.gz. With the help of sox, I skimmed through some of the loudest files and found many where the noise is so loud that I find it unreasonable to expect an AI model to recognize voice next to it. These are the most problematic files I've found:

  • 1506612372095-other.raw
  • 1506865846246-other.raw
  • 1506890776920-other.raw
  • 1506896387552-other.raw
  • 1506904933605-coffee.raw
  • 1506905761767-coffee.raw
  • 1506931866078-other.raw
  • 1506937851368-office.raw
  • 1506942115691-office.raw
  • 1507008551397-other.raw
  • 1507024121772-other.raw
  • 1507046472430-other.raw
  • 1507051246600-street.raw
  • 1507053038795-other.raw
  • 1507225021633-other.raw
  • 1507225705223-other.raw
  • 1507256882651-other.raw
  • 1507264564781-other.raw
  • 1507279040493-train.raw
  • 1507279110456-train.raw
  • 1507288337806-other.raw
  • 1506716634275-other.raw
  • 1507372594108-office.raw
  • 1508468651573-office.raw
  • 1508504834575-car.raw
  • 1508917528488-office.raw
  • 1509685708555-none.raw
  • 1509701170578-train.raw
  • 1511050964203-none.raw

I think removing those files from the dataset will improve the quality of the AI model.

There are many more files containing loud noise, but I've tried not to include files where a human could at least make out some voice next to the noise.
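
For anyone who wants to repeat the "loudest files" ranking without sox, here is a minimal Python sketch. It is not the procedure used above, only an approximation of it. It assumes the `.raw` files are headerless 16-bit signed little-endian PCM, which is an assumption about the archive and not something verified here. The function names `rms_dbfs` and `rank_by_loudness` are made up for illustration. RMS level is only a rough proxy for "too loud to hear a voice", so the top of the ranking still needs to be checked by ear.

```python
import array
import math
import sys
from pathlib import Path


def rms_dbfs(raw_bytes: bytes) -> float:
    """Return the RMS level in dBFS of 16-bit signed little-endian PCM (assumed format)."""
    usable = len(raw_bytes) - len(raw_bytes) % 2  # drop a trailing odd byte, if any
    samples = array.array("h")
    samples.frombytes(raw_bytes[:usable])
    if sys.byteorder == "big":
        samples.byteswap()  # data on disk is assumed to be little-endian
    if not samples:
        return float("-inf")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")  # digital silence
    return 10 * math.log10(mean_square / 32768.0 ** 2)


def rank_by_loudness(directory: str) -> list[tuple[float, str]]:
    """Return (level, filename) pairs for all *.raw files, loudest first."""
    levels = [(rms_dbfs(p.read_bytes()), p.name) for p in Path(directory).glob("*.raw")]
    return sorted(levels, reverse=True)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Print the 50 loudest files as candidates for manual listening.
    for level, name in rank_by_loudness(sys.argv[1])[:50]:
        print(f"{level:7.2f} dBFS  {name}")
```

Run it with the directory of the unpacked archive as the only argument. Files near the top of the output are candidates for review, not automatic removals.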
