In preprocess.py, you use AudioSegment.from_file to load the audio files. According to the definition of this function, it seems to accept many file formats.
But on the previous line, you restrict the user to wav files: for ... in label_dir.glob("*.wav")
Is there a reason for that? MP3/OGG/etc. files would be useful, as they take up less space on disk.
Use Case
Basic preprocess then train, as in README.
Solution
Replace .glob("*.wav") with a check against a list of formats pydub accepts. I haven't found such a list yet, but the implementation of from_file gives some hints about which formats are accepted.
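A minimal sketch of what this could look like, assuming label_dir is a pathlib.Path as in the existing loop. The helper name iter_audio_files and the extension set are hypothetical, and the set is only an illustration, not an official list of formats pydub supports:

```python
from pathlib import Path

# Hypothetical extension set for illustration only; this is NOT an
# authoritative list of the formats pydub can decode.
AUDIO_EXTENSIONS = {".wav", ".mp3", ".ogg", ".flac"}


def iter_audio_files(label_dir, extensions=AUDIO_EXTENSIONS):
    """Yield files directly inside label_dir whose suffix is in extensions.

    The suffix comparison is case-insensitive, so "clip.MP3" matches ".mp3".
    Files are yielded in sorted order to keep preprocessing deterministic.
    """
    for path in sorted(Path(label_dir).iterdir()):
        if path.is_file() and path.suffix.lower() in extensions:
            yield path
```

The loop in preprocess.py could then iterate over iter_audio_files(label_dir) instead of label_dir.glob("*.wav") and pass each path to AudioSegment.from_file. Note that pydub relies on ffmpeg (or libav) being installed to decode formats other than wav.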
My only reasoning for that was simply that I didn't know the list of accepted formats and I primarily work with wav. If you want to make a PR, I would happily accept one!
Pydub is a library for loading and manipulating various formats of audio files. Therefore it does support various file formats.
However here it is only being used as a pre-processing step. The diarization models within Pyannote were trained on .wav files and therefore expect that format.
Also, it is good practice to avoid "lossy" formats when storing audio/video files that you plan to use as inputs to ML/deep learning models downstream. Lossy formats such as .mp3 and .ogg discard information, and the models work best with as much information preserved as possible.
This also means that if you store files in a lossy format to save on storage costs and then convert them to .wav for inference, the discarded information is not restored: converting back to a lossless format (such as .wav) cannot recover it. This will result in lower-quality outputs from the models.