
AudioSegment.from_file is not restricted to WAV files #14

Open
NicolasMICAUX opened this issue Dec 4, 2022 · 3 comments · May be fixed by #15
Labels
enhancement New feature or request

Comments

@NicolasMICAUX

Feature Description

In `preprocess.py`, you use `AudioSegment.from_file` to load the audio files. According to the definition of this function, it appears to accept many file formats.
However, on the previous line, you restrict the user to WAV files: `for ... in label_dir.glob("*.wav")`.
Is there a reason for that? MP3, OGG, etc. files would be useful, as they take up less disk space.

Use Case

Basic preprocessing followed by training, as described in the README.

Solution

Replace `.glob("*.wav")` with a list of the formats pydub accepts. I haven't found such a list yet, but the implementation of `from_file` gives some hints about which formats are accepted.
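A minimal sketch of the proposed change, assuming a flat directory of audio files like the one `label_dir.glob("*.wav")` scans. The helper `find_audio_files` and the `AUDIO_EXTENSIONS` set are hypothetical and not taken from `preprocess.py`. pydub opens WAV in pure Python but hands other formats to ffmpeg (or libav), so which of these extensions actually load depends on the local ffmpeg install.

```python
from pathlib import Path

# Hypothetical extension set, not taken from preprocess.py. pydub reads WAV
# natively but hands other formats to ffmpeg (or libav), so what actually
# loads depends on the local ffmpeg install.
AUDIO_EXTENSIONS = frozenset({".wav", ".mp3", ".ogg", ".flac", ".m4a"})


def find_audio_files(label_dir: Path, extensions=AUDIO_EXTENSIONS) -> list[Path]:
    """Return files directly inside label_dir whose suffix is in extensions.

    Like label_dir.glob("*.wav"), this does not recurse into subdirectories.
    Suffix matching is case-insensitive, so "clip.MP3" is picked up too, and
    the result is sorted for a deterministic processing order.
    """
    wanted = {ext.lower() for ext in extensions}
    return sorted(
        path
        for path in label_dir.iterdir()
        if path.is_file() and path.suffix.lower() in wanted
    )


# Hypothetical usage inside the preprocessing loop (requires pydub):
# from pydub import AudioSegment
# for audio_path in find_audio_files(label_dir):
#     audio = AudioSegment.from_file(str(audio_path))
```

Matching on `Path.suffix` instead of running one glob per extension keeps the scan to a single pass over the directory and avoids missing upper-case extensions on case-sensitive filesystems.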

NicolasMICAUX added the enhancement label on Dec 4, 2022
@evamaxfield
Member

My only reason was that I didn't know the list of accepted formats, and I primarily work with WAV. If you want to make a PR, I would happily accept one!

@NicolasMICAUX
Author

Ok! I will probably do a PR for this when I have some time.

NicolasMICAUX linked a pull request on Dec 10, 2022 that will close this issue
@kristopher-smith

Pydub is a library for loading and manipulating audio files, so it does support many file formats.

However, here pydub is only used as a preprocessing step. The diarization models within Pyannote were trained on .wav files and therefore expect that format.

Also, it is good practice not to use lossy formats when storing audio/video files that you plan to use as inputs to downstream ML/deep learning models. Lossy formats such as .mp3 and .ogg discard information, and the models work best when as much information as possible is preserved.

This also means that if you store files in a lossy format to save on storage costs and then convert them to .wav for inference, the discarded information is never restored; it stays lost even after converting to a lossless format such as .wav. This results in lower-quality outputs from the models.
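To illustrate the point about lost information, here is a toy sketch using only the Python standard library. It rests on loud assumptions: zeroing the low bits of 16-bit samples stands in for a lossy codec (real MP3/Vorbis encoders discard information with psychoacoustic models, not bit truncation), and the sample rate, tone, and helper names are all hypothetical. Writing the degraded samples to WAV and reading them back reproduces them exactly, yet the difference from the original signal remains.

```python
import array
import io
import math
import sys
import wave

SAMPLE_RATE = 16_000  # hypothetical sample rate for this toy example


def make_tone(freq_hz: float = 440.0, seconds: float = 0.1) -> list[int]:
    """Generate a 16-bit PCM sine tone as a list of ints."""
    n = int(SAMPLE_RATE * seconds)
    return [
        int(20_000 * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE))
        for i in range(n)
    ]


def toy_lossy(samples: list[int], keep_bits: int = 8) -> list[int]:
    """Toy stand-in for a lossy codec: zero the low (16 - keep_bits) bits."""
    drop = 16 - keep_bits
    return [(s >> drop) << drop for s in samples]


def wav_roundtrip(samples: list[int]) -> list[int]:
    """Write samples to an in-memory mono 16-bit WAV file and read them back."""
    pcm = array.array("h", samples)
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV stores samples little-endian
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(SAMPLE_RATE)
        w.writeframes(pcm.tobytes())
    buf.seek(0)
    with wave.open(buf, "rb") as r:
        out = array.array("h", r.readframes(r.getnframes()))
    if sys.byteorder == "big":
        out.byteswap()
    return out.tolist()


original = make_tone()
degraded = toy_lossy(original)      # stand-in for "store as a lossy file"
restored = wav_roundtrip(degraded)  # "convert back to .wav"

# The WAV round trip is lossless: the degraded samples come back unchanged...
assert restored == degraded
# ...but nothing brings back what the lossy step discarded.
max_error = max(abs(a - b) for a, b in zip(original, restored))
print("max sample error after lossy step:", max_error)
```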

Sorry for the late input...

I think it is safe to close this one?

3 participants