Aysegul Dundar, Kevin J. Shih, Animesh Garg, Robert Pottorf, Andrew Tao, Bryan Catanzaro
This repository provides our implementation for learning unsupervised landmarks from video data, using a foreground-background factorized image reconstruction. Details provided in the following tech report: https://arxiv.org/abs/2001.09518
To download:
git clone https://github.com/NVIDIA/UnsupervisedLandmarkLearning.git --recursive
Tested with PyTorch 1.1; newer versions may also work. The easiest way to get things running is to build the Docker image located in docker/dockerfile:
cd docker/
docker build -t nvidia-unsupervised-landmarks -f dockerfile .
Then launch an interactive session with nvidia-docker with all the necessary paths to datasets and source code:
nvidia-docker run -v folder/to/mount:/path/in/container:rw -it nvidia-unsupervised-landmarks bash
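For example, you might mount the dataset directory and the cloned repository into the container (the host and container paths below are illustrative; substitute your own):
nvidia-docker run \
    -v /data/bbcpose:/workspace/bbcpose:rw \
    -v /path/to/UnsupervisedLandmarkLearning:/workspace/UnsupervisedLandmarkLearning:rw \
    -it nvidia-unsupervised-landmarks bash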
Download the train, val, and test images in bbcpose_data_1.0.zip from the BBC Pose dataset page.
Unzipping should yield 20 tarballs, which should all be individually extracted into the same directory. You should end up with one directory with 20 subdirectories numbered 1 through 20.
Also download and extract bbcpose_code_1.0.tgz into the same directory, which should create an additional code subdirectory. This includes the annotations necessary to evaluate on the test set.
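The steps below sketch this setup (archive names, the zip's internal layout, and the destination paths are assumptions; adjust them to match what you actually download):
# a minimal sketch of the data setup -- paths and archive names are illustrative
mkdir -p /data/bbcpose
unzip bbcpose_data_1.0.zip -d /data/bbcpose
cd /data/bbcpose
for t in *.tar*; do tar -xf "$t"; done    # extract all 20 tarballs in place
tar -xzf /path/to/bbcpose_code_1.0.tgz    # creates the code/ subdirectory with the test annotations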
This section covers how to train the model for learning the self-supervised landmark representation. This does not include the temporal dynamics model.
The training command will look something like
python train.py --config configs/my_configs.yaml
where my_configs.yaml is replaced with the specific configuration file for the target dataset.
All training and model args are specified in the yaml file. Please refer to configs/defaults.yaml for descriptions of the configuration options. Most options should be left alone. The ones you'll want to set are listed below (see the example configuration sketch after this list):
- dataset_path: path where the dataset is located
- n_landmarks: number of unsupervised landmarks to learn
- use_fg_bg_mask: enables unsupervised foreground-background mask prediction
- use_identity_covariance: when enabled, we assume the Gaussians for each landmark are isotropic with a fixed covariance. Otherwise, we fit the covariance to the underlying activation map. Identity covariance leads to higher landmark accuracy, but slightly lower image generation quality.
- fixed_covar: the diagonal value of the covariance matrix when use_identity_covariance is set
- save_path: path where training output should be saved. The tensorboard logfiles will be located there as well.
- use_gan: enables the adversarial loss. This enhances realism but slightly degrades the pixelwise precision of the landmarks.
- use_temporal: enables temporal sampling (sampling the same subject in a future frame) for obtaining pose variation during training
- use_warped: enables thin-plate-spline warping. This tends to be unnecessary when enough pose variation is provided by the temporal sampling. For static image datasets, this is the only option for sampling variations in pose during training.
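As a rough illustration (the file name and all values below are hypothetical placeholders rather than tuned recommendations, and the per-dataset configs in the configs folder plus configs/defaults.yaml remain the authoritative reference), a minimal custom config could be written like this:
# hypothetical example config -- key names come from the list above,
# values are placeholders; check configs/defaults.yaml for the exact structure
cat > configs/my_configs.yaml <<'EOF'
dataset_path: /data/bbcpose
save_path: /output/landmark_run1
n_landmarks: 10
use_fg_bg_mask: true
use_identity_covariance: true
fixed_covar: 0.1
use_gan: false
use_temporal: true
use_warped: false
EOF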
The configs folder contains default configurations to use for each dataset, though at minimum, you will still need to set dataset_path and save_path. You can either directly edit the configuration file, or specify them on the command line with:
python train.py --config configs/my_configs.yaml --dataset_path path/to/dataset --save_path path/to/output/directory
In general, any option can be specified directly via command-line arguments. Command-line arguments take highest precedence, followed by the configuration file passed in via the --config argument. Any remaining unspecified arguments will use the default settings in configs/defaults.yaml (which is read automatically by train.py and does not need to be specified).
To enable distributed data-parallel training, simply enable use_DDP by setting it to true in the configuration file or by appending --use_DDP to the command-line arguments. The final command will look something like:
python -m torch.distributed.launch --nproc_per_node=$NGPUS train.py --config configs/my_configs.yaml --dataset_path path/to/dataset --save_path path/to/output/directory --use_DDP
where $NGPUS is replaced with the number of GPUs you wish to use.
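For example, on a machine with 4 GPUs (the GPU count, config name, and paths are illustrative):
NGPUS=4  # set to the number of available GPUs
python -m torch.distributed.launch --nproc_per_node=$NGPUS train.py \
    --config configs/my_configs.yaml \
    --dataset_path path/to/dataset \
    --save_path path/to/output/directory \
    --use_DDP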
Here, we show how to evaluate a pre-trained landmark model on the BBC Pose dataset by regressing from the unsupervised landmarks to the annotated keypoints.
CKPT_DIR=/output/directory/for/pretrained/model
EP=5 # epoch-indexed checkpoint file that we wish to load from. Using epoch 5, for example
TEST_ANNO_PATH=/path/to/bbcpose_data_1.0/code/results.mat
VAL_ANNO_PATH=/path/to/bbc_val_GT.mat # precomputed bbc validation set annotations, stored in the same format as the results.mat
# cache the gaussian landmarks for the training data
# landmarks will be stored in pkl files in the gauss_dumps directory
# opt.yaml is the configuration file holding all the options for the pretrained model. It will be automatically output during training
# WARNING: this might take 20-30 minutes, and requires a fair amount of RAM for the training set caching.
python dump_preds.py --config $CKPT_DIR/opt.yaml --gaussians_save_path $CKPT_DIR/gauss_dumps --resume_ckpt $CKPT_DIR/checkpoint_${EP}.ckpt
# map the landmarks to the annotated keypoints with linear regression (no intercept)
python scripts/map_to_supervised_landmarks_bbc.py --gaussian_path $CKPT_DIR/gauss_dumps --out_path $CKPT_DIR/gauss_dumps
# load the test-set annotations and evaluate. This script should match the output provided by the matlab eval code in the BBC Pose code folder
python scripts/eval_supervised_landmarks_bbc.py --results_path $CKPT_DIR/gauss_dumps --test_anno_path $TEST_ANNO_PATH --val_anno_path $VAL_ANNO_PATH
Our training pipeline closely follows that described in Lorenz et al. Our generator code is heavily based on SPADE and pix2pixHD, which in turn borrow heavily from pytorch-CycleGAN-and-pix2pix. We also closely follow the evaluation script for BBC Pose from Unsupervised Learning of Object Landmarks through Conditional Image Generation.