`AudioSegment.from_file` is not restricted to WAV files #14

NicolasMICAUX · 2022-12-04T15:50:19Z

Feature Description

In preprocess.py, you use `AudioSegment.from_file` to load the audio files. According to the definition of this function, it seems to accept a range of file formats.
But on the previous line, you restrict the user to WAV files with `for ... in label_dir.glob("*.wav")`.
Is there a reason for that? MP3/OGG/etc. files would be useful, as they take up less disk space.

Use Case

Basic preprocess then train, as in README.

Solution

Replace `.glob("*.wav")` with a list of formats pydub accepts. I haven't found such a list yet, but the implementation of `from_file` gives some hints about which formats are accepted.
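A minimal sketch of what that could look like, using only `pathlib`. The helper name `iter_audio_files` and the extension set are hypothetical and are not taken from preprocess.py or from pydub; which formats actually load depends on the local ffmpeg/libav install.

```python
from pathlib import Path

# Assumption: an illustrative set of extensions, NOT an official pydub list.
# Which of these actually decode depends on the ffmpeg/libav build available.
CANDIDATE_SUFFIXES = frozenset({".wav", ".mp3", ".ogg", ".flac"})


def iter_audio_files(label_dir, suffixes=CANDIDATE_SUFFIXES):
    """Yield files directly inside label_dir whose extension is in suffixes.

    Hypothetical helper: matching is case-insensitive, and directories are
    skipped even if their name ends in an audio extension.
    """
    for path in sorted(Path(label_dir).iterdir()):
        if path.is_file() and path.suffix.lower() in suffixes:
            yield path
```

Each yielded path could then be passed to `AudioSegment.from_file` in place of the paths coming from the current `.glob("*.wav")` loop.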

evamaxfield · 2022-12-05T17:57:52Z

My only reasoning for that was simply that I didn't know the list of accepted formats and I primarily work with wav. If you want to make a PR, I would happily accept one!

NicolasMICAUX · 2022-12-05T22:05:23Z

Ok! I will probably do a PR for this when I have some time.

kristopher-smith · 2024-08-17T19:52:26Z

Pydub is a library for loading and manipulating audio files in various formats, so it does support formats other than WAV.

However, here it is only used as a preprocessing step. The diarization models within Pyannote were trained on .wav files and therefore expect that format.

It is also good practice to avoid "lossy" formats when storing audio/video files that you plan to use as inputs to ML/deep learning models downstream. Lossy formats such as .mp3 and .ogg discard information, and the models work best with as much information preserved as possible.

This also means that if you store files in a lossy format to save on storage costs and then convert them to .wav for inference, the discarded information is not restored by converting back to a lossless format such as .wav. This will result in lower-quality outputs from the models.
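As a toy illustration of why a round trip does not bring the information back, the sketch below uses coarse quantization as a stand-in for a lossy codec. This is an assumption made for illustration only: real MP3/OGG encoders work very differently, and the function names and sample values here are made up.

```python
def to_8_bit(samples_16):
    """Toy 'lossy' step: keep only the top 8 bits of each 16-bit sample."""
    return [s >> 8 for s in samples_16]


def back_to_16_bit(samples_8):
    """Convert back to the 16-bit range; the dropped low bits stay zero."""
    return [s << 8 for s in samples_8]


# Made-up 16-bit sample values, purely for illustration.
original = [1000, -1234, 257, 32767]
restored = back_to_16_bit(to_8_bit(original))

# The container is 16-bit again, but the values differ from the original.
round_trip_is_exact = restored == original
```

Here `round_trip_is_exact` ends up `False`: the restored samples are in the original range again, but the detail dropped in the lossy step is gone.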

Sorry for the late input...

I think it is safe to close this one?

NicolasMICAUX added the enhancement New feature or request label Dec 4, 2022

NicolasMICAUX linked a pull request Dec 10, 2022 that will close this issue: feature/any-audio-file-type #15 (Draft)
