Figure: Training framework of GH-Feat.
Generative Hierarchical Features from Synthesizing Images
Yinghao Xu*, Yujun Shen*, Jiapeng Zhu, Ceyuan Yang, Bolei Zhou
Computer Vision and Pattern Recognition (CVPR), 2021 (Oral)
[Paper] [Project Page]
In this work, we show that a well-trained GAN generator can be used as training supervision to learn hierarchical visual features. We call this feature Generative Hierarchical Feature (GH-Feat). Properly learned with a novel hierarchical encoder, GH-Feat facilitates both discriminative and generative visual tasks, including face verification, landmark detection, layout prediction, transfer learning, style mixing, and image editing.
Before running the code, please set up the environment with
conda env create -f environment.yml
conda activate ghfeat
The following script can be used to extract GH-Feat from a list of images.
python extract_ghfeat.py ${ENCODER_PATH} ${IMAGE_LIST} -o ${OUTPUT_DIR}
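After extraction, the features can be used for downstream tasks. The snippet below is only a minimal sketch for loading them, assuming `extract_ghfeat.py` writes one `.npy` file per image into `${OUTPUT_DIR}`; the directory name and file layout here are assumptions, so adjust them to whatever the script actually produces.

```python
# Hedged sketch: load extracted GH-Feat for downstream use.
# Assumes one .npy feature file per image under OUTPUT_DIR; adjust the
# path and format to match the actual output of extract_ghfeat.py.
import os
import numpy as np

output_dir = 'ghfeat_outputs'  # hypothetical ${OUTPUT_DIR}
features = {}
for fname in sorted(os.listdir(output_dir)):
    if fname.endswith('.npy'):
        feat = np.load(os.path.join(output_dir, fname))
        features[fname] = feat
        print(fname, feat.shape)  # hierarchical feature, one code per generator level
```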
We provide the following pre-trained encoders for inference.
Path | Description |
---|---|
face_256x256 | GH-Feat encoder trained on FFHQ dataset. |
tower_256x256 | GH-Feat encoder trained on LSUN Tower dataset. |
bedroom_256x256 | GH-Feat encoder trained on LSUN Bedroom dataset. |
Given a well-trained StyleGAN generator, our hierarchical encoder is trained with the objective of image reconstruction.
python train_ghfeat.py \
${TRAIN_DATA_PATH} \
${VAL_DATA_PATH} \
${GENERATOR_PATH} \
--num_gpus ${NUM_GPUS}
Here, `train_data` and `val_data` can be created with this script. Note that, following the official StyleGAN repo, the dataset is prepared in a multi-scale manner, but our encoder training only requires the data at the largest resolution. Hence, please specify the path to the `tfrecords` file with the target resolution instead of the directory containing all `tfrecords` files.
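For 256x256 data prepared with the official `dataset_tool.py`, the largest-resolution file is typically named like `<name>-r08.tfrecords`. If you are unsure which file to pass, the sketch below (TensorFlow 1.x, hypothetical path) peeks at the `shape` field stored in each record, assuming the official StyleGAN record format:

```python
# Hedged sketch: inspect a StyleGAN-style tfrecords file to confirm its resolution.
# Assumes the official StyleGAN record format with a 'shape' ([C, H, W]) field.
import tensorflow as tf  # TensorFlow 1.x, as used by this repo

def tfrecords_resolution(path):
    for record in tf.python_io.tf_record_iterator(path):
        example = tf.train.Example()
        example.ParseFromString(record)
        c, h, w = example.features.feature['shape'].int64_list.value
        return h, w  # only the first record is needed

print(tfrecords_resolution('datasets/ffhq/ffhq-r08.tfrecords'))  # hypothetical path
```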
Users can also train the encoder with Slurm:
srun.sh ${PARTITION} ${NUM_GPUS} \
python train_ghfeat.py \
${TRAIN_DATA_PATH} \
${VAL_DATA_PATH} \
${GENERATOR_PATH} \
--num_gpus ${NUM_GPUS}
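For reference, the encoder is optimized so that the frozen StyleGAN generator reproduces the input image from the predicted GH-Feat, combining a pixel-wise reconstruction term with a VGG-based perceptual term (see `perceptual_model.py`). The snippet below is only a conceptual sketch of that objective with placeholder networks, not the repo's actual training code:

```python
# Hedged sketch of the reconstruction objective (not the repo's actual code):
# the generator G stays frozen; only the encoder E is optimized so that
# G(E(x)) reconstructs x under pixel + perceptual losses.
import numpy as np

def pixel_loss(x, y):
    return np.mean((x - y) ** 2)            # MSE in pixel space

def perceptual_loss(x, y, vgg):
    return np.mean((vgg(x) - vgg(y)) ** 2)  # MSE in VGG feature space

# Dummy stand-ins so the sketch runs; replace with the real E, G, and VGG.
E = lambda x: x.mean(axis=(1, 2))           # image -> GH-Feat (placeholder)
G = lambda w: np.tile(w[:, None, None, :], (1, 256, 256, 1))  # GH-Feat -> image
vgg = lambda x: x[:, ::32, ::32, :]         # image -> "features" (placeholder)

x = np.random.rand(2, 256, 256, 3).astype(np.float32)
x_rec = G(E(x))
loss = pixel_loss(x, x_rec) + perceptual_loss(x, x_rec, vgg)
print('reconstruction loss:', loss)
```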
We provide some pre-trained generators as follows.
Path | Description |
---|---|
face_256x256 | StyleGAN trained on FFHQ dataset. |
tower_256x256 | StyleGAN trained on LSUN Tower dataset. |
bedroom_256x256 | StyleGAN trained on LSUN Bedroom dataset. |
- Most code is directly borrowed from the StyleGAN repo.
- Structure of the proposed hierarchical encoder: `training/networks_ghfeat.py`
- Training loop of the encoder: `training/training_loop_ghfeat.py`
- To feed GH-Feat produced by the encoder into the generator as layer-wise style codes, we slightly modify `training/networks_stylegan.py` (see Line 263 and Line 477); see the sketch after this list.
- Main script for encoder training: `train_ghfeat.py`
- Script for extracting GH-Feat from images: `extract_ghfeat.py`
- VGG model for computing the perceptual loss: `perceptual_model.py`
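The modification to `training/networks_stylegan.py` mentioned above boils down to letting the encoder output one latent code per synthesis layer and using those codes as the per-layer styles, in place of broadcasting a single mapping-network output. A minimal, hypothetical sketch of that wiring (names and shapes are illustrative, not the repo's actual API):

```python
# Hedged sketch: GH-Feat as layer-wise style codes for a StyleGAN-like
# synthesis network. Names and shapes are illustrative only.
import numpy as np

NUM_LAYERS, W_DIM = 14, 512  # a 256x256 StyleGAN generator takes 14 style inputs

def encoder(image):
    """Hypothetical hierarchical encoder: one style code per synthesis layer."""
    return np.random.randn(NUM_LAYERS, W_DIM)  # placeholder for GH-Feat

def synthesis(gh_feat):
    """Placeholder synthesis network consuming a per-layer style stack."""
    x = np.ones((32, 4, 4))                     # stand-in for the learned constant input
    for layer_style in gh_feat:                 # one GH-Feat code per layer,
        scale, shift = layer_style[0], layer_style[1]
        x = scale * x + shift                   # instead of broadcasting a single w
    return x

image = np.random.rand(256, 256, 3)
output = synthesis(encoder(image))              # reconstruction path used in training
print(output.shape)
```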
We show some results achieved by GH-Feat on a variety of downstream visual tasks.
Figure: Indoor scene layout prediction.
Figure: Face verification (face reconstruction).
@inproceedings{xu2021generative,
title = {Generative Hierarchical Features from Synthesizing Images},
author = {Xu, Yinghao and Shen, Yujun and Zhu, Jiapeng and Yang, Ceyuan and Zhou, Bolei},
booktitle = {CVPR},
year = {2021}
}