UniTalker generates realistic facial motion from different audio domains, including clean and noisy voices in various languages, text-to-speech-generated audio, and even noisy songs accompanied by background music.
UniTalker can output multiple annotations.
For datasets with new annotations, one can simply plug new heads into UniTalker and train it with existing datasets or solely with new ones, avoiding retopology.
- Linux
- Python 3.10
- PyTorch 2.2.0
- CUDA 12.1
- transformers 4.39.3
- PyTorch3D 0.7.7 (optional; only needed for rendering the results)
conda create -n unitalker python==3.10
conda activate unitalker
conda install pytorch==2.2.0 torchvision==0.17.0 torchaudio==2.2.0 pytorch-cuda=12.1 -c pytorch -c nvidia
pip install transformers librosa tensorboardX smplx chumpy numpy==1.23.5 opencv-python
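After installation, a quick check can confirm that the installed packages match the versions listed above. The following is a minimal sketch, not part of the repository: the function name `check_versions` is hypothetical, and the distribution names `torch` and `transformers` are assumed to be the usual PyPI names. A mismatch is only a hint that the environment differs from the tested one, not proof that it will fail.

```python
from importlib import metadata

# Versions copied from the requirements list above. The distribution names
# are an assumption (the usual PyPI names for PyTorch and transformers).
EXPECTED = {"torch": "2.2.0", "transformers": "4.39.3"}


def check_versions(expected):
    """Return a list of readable problems; an empty list means everything matched."""
    problems = []
    for name, wanted in expected.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name}: not installed (expected {wanted})")
            continue
        # A local build tag such as "2.2.0+cu121" still counts as a match.
        if found.split("+")[0] != wanted:
            problems.append(f"{name}: found {found}, expected {wanted}")
    return problems


if __name__ == "__main__":
    for line in check_versions(EXPECTED) or ["environment matches the listed versions"]:
        print(line)
```

Note that the `pip install` line above does not pin `transformers`, so this check may report a newer version than the one listed.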
UniTalker-B-[D0-D7]: The base model in the paper. Download it and place it in ./pretrained_models.
UniTalker-L-[D0-D7]: The default model in the paper. Please try the base model first to check that the pipeline runs end to end.
Unitalker-data-release-V1: The released datasets, PCA models, data-split JSON files, and id-template NumPy arrays. Download and unzip it in this repo.
FLAME2020: Please download FLAME 2020 and move generic_model.pkl to resources/binary_resources/flame.pkl (that is, rename it to flame.pkl).
Use git lfs pull to get ./resources.zip and ./test_audios.zip, then unzip both in this repo.
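The two manual steps above (unzipping the archives and placing the FLAME model) could be scripted roughly as follows. This is a sketch under assumptions: the helper names `unpack_archives` and `place_flame` are hypothetical, the FLAME download location is whatever directory you extracted it to, and the file is copied rather than moved so the original download stays intact. How the archives are laid out internally is not stated here, so inspect the result after extraction.

```python
import shutil
import zipfile
from pathlib import Path


def unpack_archives(repo_root, names=("resources.zip", "test_audios.zip")):
    """Extract each archive found in repo_root into repo_root; return the ones extracted."""
    root = Path(repo_root)
    done = []
    for name in names:
        archive = root / name
        if archive.is_file():
            with zipfile.ZipFile(archive) as zf:
                zf.extractall(root)
            done.append(name)
    return done


def place_flame(flame_dir, repo_root):
    """Copy generic_model.pkl from flame_dir to resources/binary_resources/flame.pkl."""
    src = Path(flame_dir) / "generic_model.pkl"
    dst = Path(repo_root) / "resources" / "binary_resources" / "flame.pkl"
    if not src.is_file():
        raise FileNotFoundError(f"expected FLAME model at {src}")
    dst.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(src, dst)
    return dst
```

Run `unpack_archives(".")` from the repository root after `git lfs pull`, then call `place_flame` with the directory that holds the downloaded FLAME 2020 files.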
Finally, these files should be organized as follows:
├── pretrained_models
│   ├── UniTalker-B-D0-D7.pt
│   └── UniTalker-L-D0-D7.pt
├── resources
│   ├── binary_resources
│   │   ├── 02_flame_mouth_idx.npy
│   │   ├── ...
│   │   └── vocaset_FDD_wo_eyes.npy
│   └── obj_template
│       ├── 3DETF_blendshape_weight.obj
│       ├── ...
│       └── meshtalk_6172_vertices.obj
└── unitalker_data_release_V1
    ├── D0_BIWI
    │   ├── id_template.npy
    │   └── pca.npz
    ├── D1_vocaset
    │   ├── id_template.npy
    │   └── pca.npz
    ├── D2_meshtalk
    │   ├── id_template.npy
    │   └── pca.npz
    ├── D3D4_3DETF
    │   ├── D3_HDTF
    │   └── D4_RAVDESS
    ├── D5_unitalker_faceforensics++
    │   ├── id_template.npy
    │   ├── test
    │   ├── test.json
    │   ├── train
    │   ├── train.json
    │   ├── val
    │   └── val.json
    ├── D6_unitalker_Chinese_speech
    │   ├── id_template.npy
    │   ├── test
    │   ├── test.json
    │   ├── train
    │   ├── train.json
    │   ├── val
    │   └── val.json
    └── D7_unitalker_song
        ├── id_template.npy
        ├── test
        ├── test.json
        ├── train
        ├── train.json
        ├── val
        └── val.json
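Before running the demo or training, it can help to confirm that the layout above is in place. The sketch below is an assumption-laden convenience, not a script shipped with the repository: `missing_paths` is a hypothetical helper, and `EXPECTED_PATHS` lists only a subset of the entries shown in the tree, plus the `flame.pkl` location from the FLAME step.

```python
from pathlib import Path

# A subset of the paths from the tree above; extend it as needed.
# flame.pkl is not shown in the tree but comes from the FLAME2020 step.
EXPECTED_PATHS = [
    "pretrained_models/UniTalker-B-D0-D7.pt",
    "resources/binary_resources/flame.pkl",
    "resources/obj_template",
    "unitalker_data_release_V1/D5_unitalker_faceforensics++/train.json",
    "unitalker_data_release_V1/D6_unitalker_Chinese_speech/train.json",
    "unitalker_data_release_V1/D7_unitalker_song/train.json",
]


def missing_paths(repo_root, expected=EXPECTED_PATHS):
    """Return the expected relative paths that do not exist under repo_root."""
    root = Path(repo_root)
    return [rel for rel in expected if not (root / rel).exists()]


if __name__ == "__main__":
    missing = missing_paths(".")
    for rel in missing:
        print(f"missing: {rel}")
    if not missing:
        print("all checked paths are present")
```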
python -m main.demo --config config/unitalker.yaml test_out_path ./test_results/demo.npz
python -m main.render ./test_results/demo.npz ./test_audios ./test_results/
Unitalker-data-release-V1 contains D5, D6 and D7. These datasets have already been processed and split into train, validation and test sets; please use them to try the training step. If you want to train the model on D0-D7, you need to download the remaining datasets following these links: D0: BIWI. D1: VOCASET. D2: meshtalk. D3, D4: 3DETF.
Please modify dataset and duplicate_list in config/unitalker.yaml according to the datasets you have prepared, ensuring that both lists maintain the same length.
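The same-length requirement can be checked before launching training. The sketch below assumes that `dataset` and `duplicate_list` are top-level list-valued keys once the YAML file is parsed; that layout, the helper name `check_dataset_lists`, and the example values are hypothetical, so adapt them to the actual config.

```python
def check_dataset_lists(cfg):
    """Pair each dataset with its duplicate count, or raise ValueError on a mismatch.

    `cfg` is the parsed config as a dict. Assumption: `dataset` and
    `duplicate_list` are top-level keys holding lists.
    """
    datasets = cfg["dataset"]
    duplicates = cfg["duplicate_list"]
    if not isinstance(datasets, list) or not isinstance(duplicates, list):
        raise ValueError("dataset and duplicate_list must both be lists")
    if len(datasets) != len(duplicates):
        raise ValueError(
            f"dataset has {len(datasets)} entries but "
            f"duplicate_list has {len(duplicates)}"
        )
    return list(zip(datasets, duplicates))


if __name__ == "__main__":
    # Hypothetical values for illustration only; the real entries in
    # config/unitalker.yaml may be named differently.
    example = {"dataset": ["D5", "D6", "D7"], "duplicate_list": [1, 1, 1]}
    for name, count in check_dataset_lists(example):
        print(name, count)
```

To check the real file, parse it first (for example with PyYAML's `yaml.safe_load`, if PyYAML is installed) and pass the resulting dict to the function.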
python -m main.train --config config/unitalker.yaml