Source code for "Pitch-aware generative pretraining improves multi-pitch estimation with scarce data" (MMASIA 2024)
- This repo is based on the DAC repo, accompanying the paper "High-Fidelity Audio Compression with Improved RVQGAN".
- Please create a virtual environment and install the packages specified in requirements.txt
- Replace dataset paths in conf/padac/pitch_cond_padac.yml with the paths of the dataset you would like to perform pretraining on.
- Run the following command to start training. Replace ./runs with the path to the folder where you would like the model checkpoints to be saved.
python -m scripts.train_padac --args.load conf/padac/conf_padac.yml --save_path ./runs
- After freezing PA-DAC, extract and save latent space embeddings using
latent_space = self.encoder(audio_data)
. Refer to the script scripts/extract_features.py for an example. - Prepare a config file similar to conf/transcriber.json specifying the paths to the extracted features and ground truth.
- To start training, run the following command:
python -m scripts.train_transcriber --config_file ./conf/transcriber.json