Project Page | arXiv | PDF
NeRF-MAE: Masked AutoEncoders for Self-Supervised 3D Representation Learning for Neural Radiance Fields
Muhammad Zubair Irshad · Sergey Zakharov · Vitor Guizilini · Adrien Gaidon · Zsolt Kira · Rares Ambrus
European Conference on Computer Vision, ECCV 2024
Toyota Research Institute | Georgia Institute of Technology
- NeRF-MAE: The first large-scale pretraining utilizing Neural Radiance Fields (NeRF) as an input modality. We pretrain a single Transformer model on thousands of NeRFs for 3D representation learning.
- NeRF-MAE Dataset: A large-scale NeRF pretraining and downstream task finetuning dataset.
- Release large-scale pretraining code 🚀
- Release NeRF-MAE dataset comprising radiance and density grids 🚀
- Release 3D object detection finetuning and eval code 🚀
- Pretrained NeRF-MAE checkpoints and out-of-the-box model usage
- Release multi-view rendered images and Instant-NGP checkpoints (totalling 1.6M+ posed images and 3200+ trained NeRF checkpoints)
If you find this repository or our dataset useful, please star ⭐ this repository and consider citing 📝:
@inproceedings{irshad2024nerfmae,
title={NeRF-MAE: Masked AutoEncoders for Self-Supervised 3D Representation Learning for Neural Radiance Fields},
author={Muhammad Zubair Irshad and Sergey Zakharov and Vitor Guizilini and Adrien Gaidon and Zsolt Kira and Rares Ambrus},
booktitle={European Conference on Computer Vision (ECCV)},
year={2024}
}
Create a Python 3.9 virtual environment and install the requirements:
cd $NeRF-MAE repo
conda create -n nerf-mae python=3.9
conda activate nerf-mae
pip install --upgrade pip
pip install -r requirements.txt
pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 -f https://download.pytorch.org/whl/torch_stable.html
The code was built and tested on CUDA 11.3.
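To confirm that the environment matches the versions listed above, a short check such as the following may help. It is only a sketch: the function name `describe_environment` is made up for illustration, and the script reports what it finds rather than enforcing anything.

```python
import importlib.util
import sys


def describe_environment():
    """Collect Python, PyTorch and CUDA details without failing if torch is absent."""
    info = {
        "python": "{}.{}".format(*sys.version_info[:2]),
        "torch": None,
        "cuda_build": None,
        "cuda_available": False,
    }
    if importlib.util.find_spec("torch") is None:
        return info  # PyTorch is not installed in this environment
    import torch

    info["torch"] = torch.__version__        # the install step above pins 1.12.1+cu113
    info["cuda_build"] = torch.version.cuda  # CUDA version the wheel was built against
    info["cuda_available"] = torch.cuda.is_available()
    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

If `cuda_available` is reported as `False`, the CUDA extension used for finetuning is unlikely to work, so it is worth resolving that first.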
To run downstream task finetuning, compile the CUDA extension as described in NeRF-RPN:
cd $NeRF-MAE repo
cd nerf_rpn/model/rotated_iou/cuda_op
python setup.py install
cd ../../../..
Download the preprocessed datasets here.
- Pretraining dataset (comprising NeRF radiance and density grids). Download link
- Finetuning dataset (comprising NeRF radiance and density grids and bounding box/semantic labelling annotations). 3D Object Detection (Provided by NeRF-RPN), 3D Semantic Segmentation (Coming Soon), Voxel-Super Resolution (Coming Soon)
Extract the pretraining and finetuning datasets under NeRF-MAE/datasets. The directory structure should look like this:
NeRF-MAE
├── pretrain
│   ├── features
│   └── nerfmae_split.npz
└── finetune
    └── front3d_rpn_data
        ├── features
        ├── aabb
        └── obb
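Before launching a long training run, it can be useful to confirm that the extracted data matches this layout. The sketch below checks only for the entries shown in the tree. The helper name `missing_dataset_paths` and the default root `datasets` are assumptions for illustration, not part of the repository.

```python
from pathlib import Path

# Relative paths taken from the directory tree above.
EXPECTED_PATHS = [
    "pretrain/features",
    "pretrain/nerfmae_split.npz",
    "finetune/front3d_rpn_data/features",
    "finetune/front3d_rpn_data/aabb",
    "finetune/front3d_rpn_data/obb",
]


def missing_dataset_paths(root):
    """Return the expected entries that do not exist under `root`."""
    root = Path(root)
    return [rel for rel in EXPECTED_PATHS if not (root / rel).exists()]


if __name__ == "__main__":
    import sys

    # Hypothetical default; pass the folder you extracted the archives into.
    root = sys.argv[1] if len(sys.argv) > 1 else "datasets"
    missing = missing_dataset_paths(root)
    if missing:
        print(f"Missing under {root}: {missing}")
    else:
        print("Dataset layout looks complete.")
```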
Note: The above datasets are all you need to train and evaluate our method. Bonus: we will soon release our multi-view rendered posed RGB images from FRONT3D, HM3D and Hypersim, as well as Instant-NGP trained checkpoints (together comprising 1M+ images and 3k+ NeRF checkpoints).
Please note that our dataset was generated using the instructions from NeRF-RPN and 3D-CLR. Please consider citing our work, NeRF-RPN and 3D-CLR if you find this dataset useful in your research.
Please also note that our dataset uses Front3D, Habitat-Matterport3D, HyperSim and ScanNet as the base datasets, i.e. we train a NeRF per scene and extract the radiance and density grid as well as aligned NeRF-grid 3D annotations. Please read the terms of use of each dataset if you want to use its posed multi-view images.
NeRF-MAE (package: nerf-mae) is structured to provide easy access to pretrained NeRF-MAE models (and reproductions) for use in various downstream tasks. It lets you extract good visual features from NeRFs if you don't have the resources for large-scale pretraining. Our pretraining provides an easy-to-access embedding of any NeRF scene, which can be used for a variety of downstream tasks in a straightforward way. The package, usage instructions and our pretrained checkpoints are coming soon.
Of course, you can also pretrain your own NeRF-MAE models. Navigate to the nerf-mae folder and run the pretraining script.
cd nerf-mae
bash train_mae3d.sh
Check out the train_mae3d.sh file for a complete list of all hyperparameters, such as num_epochs, lr, masking_prob, etc.
Checkpoints will be saved at a regular interval of 200 epochs. For reproducing the paper results, we utilize the checkpoints at 1200 epochs.
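As a small illustration of the schedule described above, the sketch below lists the epochs at which checkpoints are expected and looks up the one used for the paper results. It assumes the first checkpoint is written at epoch 200, and the file naming is an assumption too: the search matches any file whose name contains the epoch as a standalone number (for example a hypothetical `epoch_1200.pt`). Adjust it to whatever names the training script actually writes.

```python
import re
from pathlib import Path

SAVE_INTERVAL = 200  # checkpoints are written every 200 epochs
PAPER_EPOCH = 1200   # epoch used for the paper results


def saved_epochs(num_epochs, interval=SAVE_INTERVAL):
    """Epochs at which a checkpoint is expected, given the save interval."""
    return list(range(interval, num_epochs + 1, interval))


def find_checkpoint(ckpt_dir, epoch=PAPER_EPOCH):
    """Return the first file whose name contains `epoch` as a whole number, else None.

    The naming scheme is an assumption; only the epoch number is matched.
    """
    pattern = re.compile(rf"(?<!\d){epoch}(?!\d)")
    for path in sorted(Path(ckpt_dir).iterdir()):
        if path.is_file() and pattern.search(path.stem):
            return path
    return None
```

For example, `saved_epochs(1200)` returns `[200, 400, 600, 800, 1000, 1200]`, and `find_checkpoint(some_dir)` picks the file for epoch 1200 while ignoring names such as `epoch_12000.pt`.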
Notes:
- With the default settings, i.e. batch_size 32 and gpus 0,1,2,3,4,5,6,7 on A100 GPUs, pretraining is expected to take around 2 days. Please set these according to your machine's capacity.
- The dataset_name defaults to dataset_name="nerfmae". This is for the convenience of the dataloader, as it describes the data format. Our pretraining data comprises scenes from Front3D, Habitat-Matterport3D and Hypersim.
Our finetuning code is largely based on NeRF-RPN. In fact, we use the same dataset as NeRF-RPN (unseen during pretraining) for finetuning. This ensures that our comparison with NeRF-RPN uses the same architecture; the only difference is that the network weights are initialized from scratch for NeRF-RPN, whereas we start from our pretrained network weights. Please see our paper for more details.
Note: The ScanNet dataset is not seen during our pretraining, so ScanNet 3D OBB prediction finetuning is a challenging case of cross-dataset transfer.
Navigate to the nerf-rpn folder and run the finetuning script.
To run 3D Swin Transformer + FPN model finetuning with our pretrained weights:
cd nerf-rpn
bash train_fcos_pretrained.sh
To train the 3D Swin Transformer + FPN model with weights initialized from scratch:
cd nerf-rpn
bash train_fcos.sh
Note: Only the 3D Swin Transformer weights are initialized from our pretraining; the FPN weights are initialized from scratch in both cases. To evaluate the model finetuned from our pretrained weights or the model trained from scratch, use bash test_fcos_pretrained.sh or bash test_fcos.sh, respectively.
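The note above says that only the 3D Swin Transformer weights come from pretraining. A minimal sketch of that kind of partial initialization is shown below. It is not the repository's loading code: the key prefix `backbone.` and the toy checkpoint keys are hypothetical, and the real parameter names should be checked in the actual checkpoint.

```python
def select_backbone_weights(state_dict, prefix="backbone."):
    """Keep only entries whose key starts with `prefix`.

    `prefix` is a hypothetical name for the 3D Swin Transformer parameters;
    inspect the keys of the real checkpoint before relying on it.
    """
    return {k: v for k, v in state_dict.items() if k.startswith(prefix)}


if __name__ == "__main__":
    # Toy stand-in for a checkpoint: plain values instead of tensors.
    checkpoint = {
        "backbone.patch_embed.weight": 1,
        "backbone.layers.0.weight": 2,
        "decoder.head.weight": 3,  # pretraining-only part, not transferred
    }
    backbone_only = select_backbone_weights(checkpoint)
    print(sorted(backbone_only))
    # With PyTorch, the filtered dict would typically be passed to
    # model.load_state_dict(backbone_only, strict=False), which leaves
    # parameters absent from the dict (e.g. the FPN) at their fresh initialization.
```

Filtering by prefix keeps the transfer explicit: anything not selected stays at its random initialization, which matches the setup where the FPN is trained from scratch in both cases.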
Check out the train_fcos_pretrained.sh and test_fcos_pretrained.sh files for a complete list of all hyperparameters, such as mae_checkpoint, num_epochs, lr, masking_prob, etc. The finetuning and evaluation code for our downstream tasks is based on NeRF-RPN's implementation.
This code is built upon the implementation from NeRF-RPN. We thank the authors for releasing their open-source implementation.
This repository and dataset are released under the CC BY-NC 4.0 license.