padac-mmasia24

Source code for "Pitch-aware generative pretraining improves multi-pitch estimation with scarce data" (MMASIA 2024)

This repo is based on the DAC repo, which accompanies the paper "High-Fidelity Audio Compression with Improved RVQGAN".
Before running anything, create a virtual environment and install the packages specified in requirements.txt.

Replace dataset paths in conf/padac/pitch_cond_padac.yml with the paths of the dataset you would like to perform pretraining on.
Run the following command to start training. Replace ./runs with the path to the folder where you would like the model checkpoints to be saved.

python -m scripts.train_padac --args.load conf/padac/conf_padac.yml --save_path ./runs

After freezing PA-DAC, extract and save latent space embeddings using latent_space = self.encoder(audio_data). Refer to the script scripts/extract_features.py for an example.
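
The extraction step can be sketched roughly as follows. This is only an illustration and not the contents of scripts/extract_features.py: StandInEncoder is a hypothetical placeholder for the pretrained PA-DAC encoder, and the function name, tensor shapes, and output path are assumptions made for the example.

```python
from pathlib import Path

import torch
from torch import nn


class StandInEncoder(nn.Module):
    """Hypothetical placeholder for the pretrained PA-DAC encoder (assumption)."""

    def __init__(self, latent_dim: int = 8, stride: int = 4):
        super().__init__()
        # A single strided convolution, only so the sketch runs end to end.
        self.conv = nn.Conv1d(1, latent_dim, kernel_size=stride, stride=stride)

    def forward(self, audio_data: torch.Tensor) -> torch.Tensor:
        # (batch, 1, samples) -> (batch, latent_dim, frames)
        return self.conv(audio_data)


def extract_and_save(encoder: nn.Module, audio_data: torch.Tensor, out_path: Path) -> torch.Tensor:
    """Freeze the encoder, embed one batch of audio, and save the result to disk."""
    encoder.eval()
    for param in encoder.parameters():
        param.requires_grad_(False)  # frozen: no gradients flow into the encoder

    with torch.no_grad():
        latent_space = encoder(audio_data)

    out_path.parent.mkdir(parents=True, exist_ok=True)
    torch.save(latent_space.cpu(), out_path)
    return latent_space
```

With the real model, the encoder argument would be the trained PA-DAC encoder loaded from a checkpoint, and the saved files are what the transcriber config later points to.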
Prepare a config file similar to conf/transcriber.json specifying the paths to the extracted features and ground truth.
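
Before launching a run, it can help to confirm that the paths in the config actually exist. The helper below is a hedged sketch, not part of this repo: it makes no assumption about the key names used in conf/transcriber.json and simply treats any string value containing a path separator as a path, which is a heuristic.

```python
import json
from pathlib import Path


def find_missing_paths(config, prefix=""):
    """Return (key, value) pairs whose string value looks like a path but does not exist."""
    if isinstance(config, dict):
        items = config.items()
    elif isinstance(config, list):
        items = enumerate(config)
    else:
        items = []

    missing = []
    for key, value in items:
        name = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, (dict, list)):
            # Recurse into nested sections and lists of paths.
            missing.extend(find_missing_paths(value, name))
        elif isinstance(value, str) and ("/" in value or "\\" in value):
            # Heuristic (assumption): a string with a separator is a path.
            if not Path(value).expanduser().exists():
                missing.append((name, value))
    return missing


def check_config(config_file):
    """Load a JSON config and list the path-like entries that are missing on disk."""
    with open(config_file, "r", encoding="utf-8") as f:
        return find_missing_paths(json.load(f))
```

Calling check_config("./conf/transcriber.json") from the repo root returns an empty list when every path-like entry exists. Relative paths are resolved against the current working directory.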
To start training the transcriber, run the following command:

python -m scripts.train_transcriber --config_file ./conf/transcriber.json
