PonderV2: Pave the Way for 3D Foundation Model
with A Universal Pre-training Paradigm

Haoyi Zhu1,4*, Honghui Yang1,3*, Xiaoyang Wu1,2*, Di Huang1*, Sha Zhang1,4, Xianglong He1,
Hengshuang Zhao2, Chunhua Shen3, Yu Qiao1, Tong He1, Wanli Ouyang1

1Shanghai AI Lab, 2HKU, 3ZJU, 4USTC

[Figure: radar]

This is the official implementation of the paper "PonderV2: Pave the Way for 3D Foundation Model with A Universal Pre-training Paradigm".

PonderV2 is a comprehensive 3D pre-training framework designed to facilitate the acquisition of efficient 3D representations, thereby establishing a pathway to 3D foundation models. It is a novel universal paradigm that learns point cloud representations through differentiable neural rendering, serving as a bridge between the 3D and 2D worlds.

[Figure: pipeline]

Important Notes:

  • PonderV2 indoor pre-training configs had bugs before this commit; please make sure to use the fixed ones.
  • Structured3D RGB-D data preprocessing had bugs before this commit; please re-generate the processed data if you ran the earlier code.

News:

  • Dec. 2023: Checkpoint weights are available in the model zoo!
  • Dec. 2023: Multi-dataset training is supported! More instructions on installation and usage are available.
  • Nov. 2023: Model files are released! Usage instructions, complete code and checkpoints are coming soon!
  • Oct. 2023: PonderV2 is released on arXiv.

Installation

Requirements

  • Ubuntu: 18.04 or higher
  • CUDA: 11.3 or higher
  • PyTorch: 1.10.0 or higher
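
The minimum versions above can be checked with a small version comparison. The following is a minimal sketch, assuming GNU sort with its -V option is available (as on Ubuntu); the helper name version_ge is hypothetical and not part of this repository.

```shell
# version_ge HAVE WANT: succeed when version HAVE is at least version WANT.
# Hypothetical helper; relies on the -V (version sort) option of GNU sort.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Example values: the versions pinned in the Conda Environment section.
version_ge "1.12.1" "1.10.0" && echo "PyTorch version OK"
version_ge "11.3" "11.3" && echo "CUDA version OK"
```

On a real machine the installed values could be read with python -c 'import torch; print(torch.__version__, torch.version.cuda)' and lsb_release -rs, then passed to the helper in place of the literals.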

Conda Environment

conda create -n ponderv2 python=3.8 -y
conda activate ponderv2
# Choose version you want here: https://pytorch.org/get-started/previous-versions/
conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 cudatoolkit=11.3 -c pytorch -y
conda install h5py pyyaml -c anaconda -y
conda install sharedarray tensorboard tensorboardx addict einops scipy plyfile termcolor timm -c conda-forge -y
conda install pytorch-cluster pytorch-scatter pytorch-sparse -c pyg -y
pip install torch-geometric yapf==0.40.1 opencv-python open3d==0.10.0 imageio
pip install git+https://github.com/openai/CLIP.git

# spconv (SparseUNet)
# refer to https://github.com/traveller59/spconv
pip install spconv-cu113

# precise eval
cd libs/pointops
# standard install
python setup.py install
# or, for Docker / multiple GPU architectures, set TORCH_CUDA_ARCH_LIST to the target compute capabilities
TORCH_CUDA_ARCH_LIST="ARCH LIST" python setup.py install
# e.g. 7.5: RTX 2000 series; 8.0: A100; 8.6: RTX 3000 series. More at https://developer.nvidia.com/cuda-gpus
TORCH_CUDA_ARCH_LIST="7.5 8.0" python setup.py install
cd ../..

# NeuS renderer
cd libs/smooth-sampler
# standard install
python setup.py install
# or, for Docker / multiple GPU architectures, set TORCH_CUDA_ARCH_LIST to the target compute capabilities
TORCH_CUDA_ARCH_LIST="ARCH LIST" python setup.py install
# e.g. 7.5: RTX 2000 series; 8.0: A100; 8.6: RTX 3000 series. More at https://developer.nvidia.com/cuda-gpus
TORCH_CUDA_ARCH_LIST="7.5 8.0" python setup.py install
cd ../..
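
TORCH_CUDA_ARCH_LIST in the commands above takes a space-separated list of compute capabilities. As a rough sketch, the value can be assembled from the capabilities of the GPUs you intend to target; the helper arch_list is hypothetical and only illustrates the format.

```shell
# arch_list CAP...: print the unique compute capabilities, sorted, separated by spaces.
# Hypothetical helper for illustration; it is not part of this repository.
arch_list() {
    printf '%s\n' "$@" | sort -u -V | paste -sd ' ' -
}

# Example: targets with compute capabilities 8.0, 7.5 and 8.0 again (duplicate is dropped).
TORCH_CUDA_ARCH_LIST="$(arch_list 8.0 7.5 8.0)"
echo "TORCH_CUDA_ARCH_LIST=${TORCH_CUDA_ARCH_LIST}"

# Then, inside libs/pointops or libs/smooth-sampler:
# TORCH_CUDA_ARCH_LIST="${TORCH_CUDA_ARCH_LIST}" python setup.py install
```

The capability of a local GPU can be read with python -c "import torch; print(torch.cuda.get_device_capability(0))", which returns a (major, minor) pair such as (8, 0).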

If you want to run instance segmentation downstream tasks with PointGroup, you should also run the following:

conda install -c bioconda google-sparsehash 
cd libs/pointgroup_ops
python setup.py install --include_dirs=${CONDA_PREFIX}/include
cd ../..

Then uncomment the line "# from .point_group import *" in ponder/models/__init__.py.
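
If you prefer to script this step, the edit could look like the sketch below. It assumes the commented line appears exactly as "# from .point_group import *" at the start of a line; the function name enable_point_group is hypothetical.

```shell
# enable_point_group FILE: uncomment the PointGroup import in FILE.
# Hypothetical helper; writes through a temporary file so it does not
# depend on the non-portable in-place flag of sed.
enable_point_group() {
    sed 's/^# *from \.point_group import \*$/from .point_group import */' "$1" > "$1.tmp" \
        && mv "$1.tmp" "$1"
}

# Usage, from the repository root:
# enable_point_group ponder/models/__init__.py
```

Lines that do not match the pattern are left unchanged, so running the helper twice is harmless.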

Data Preparation

Please check out docs/data_preparation.md.

Model Zoo

Please check out docs/model_zoo.md.

Quick Start:

  • Pretraining: Pretrain PonderV2 on indoor or outdoor datasets.

Pre-train PonderV2 (indoor) on the ScanNet dataset alone with 8 GPUs:

# -g: number of GPUs
# -d: dataset
# -c: config file, the final config is ./config/${-d}/${-c}.py
# -n: experiment name
bash scripts/train.sh -g 8 -d scannet -c pretrain-ponder-spunet-v1m1-0-base -n ponderv2-pretrain-sc
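
To make the flag semantics concrete, the sketch below parses the same options and prints the config path that the comment above describes. It is a hypothetical illustration of that description, not the contents of scripts/train.sh; the function name resolve_config is an assumption.

```shell
# resolve_config: parse train.sh-style flags and print ./config/<dataset>/<config>.py.
# Hypothetical sketch; the real scripts/train.sh may differ.
# The body runs in a subshell, so the variables do not leak into the caller.
resolve_config() (
    OPTIND=1
    gpus="" dataset="" config="" name="" weight=""
    while getopts "g:d:c:n:w:" opt; do
        case "$opt" in
            g) gpus="$OPTARG" ;;      # number of GPUs
            d) dataset="$OPTARG" ;;   # dataset
            c) config="$OPTARG" ;;    # config file name, without .py
            n) name="$OPTARG" ;;      # experiment name
            w) weight="$OPTARG" ;;    # path to checkpoint
            *) exit 1 ;;
        esac
    done
    echo "./config/${dataset}/${config}.py"
)

resolve_config -g 8 -d scannet -c pretrain-ponder-spunet-v1m1-0-base -n ponderv2-pretrain-sc
```

Checking that the printed path exists before launching a multi-GPU job is a cheap way to catch a mistyped -d or -c value.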

Pre-train PonderV2 (indoor) on the ScanNet, S3DIS and Structured3D datasets using Point Prompt Training (PPT) with 8 GPUs:

bash scripts/train.sh -g 8 -d scannet -c pretrain-ponder-ppt-v1m1-0-sc-s3-st-spunet -n ponderv2-pretrain-sc-s3-st

Pre-train PonderV2 (outdoor) on the nuScenes dataset alone with 4 GPUs:

bash scripts/train.sh -g 4 -d nuscenes -c pretrain-ponder-spunet-v1m1-0-base -n ponderv2-pretrain-nu

  • Finetuning: Finetune on downstream tasks with PonderV2 pre-trained checkpoints.

Finetune PonderV2 on the ScanNet semantic segmentation downstream task with PPT:

# -w: path to checkpoint
bash scripts/train.sh -g 8 -d scannet -c semseg-ppt-v1m1-0-sc-s3-st-spunet-lovasz-ft -n ponderv2-semseg-ft -w ${PATH/TO/CHECKPOINT}

Finetune PonderV2 on the ScanNet instance segmentation downstream task using PointGroup:

bash scripts/train.sh -g 4 -d scannet -c insseg-ppt-v1m1-0-pointgroup-spunet-ft -n insseg-pointgroup-v1m1-0-spunet-ft -w ${PATH/TO/CHECKPOINT}

  • Testing: Test a finetuned model on a downstream task.

# Based on the experiment folder created by the training script
bash scripts/test.sh -g 8 -d scannet -n ponderv2-semseg-ft -w ${CHECKPOINT/NAME}

You can download our trained checkpoint weights from the links in docs/model_zoo.md.

For more detailed options and examples, please refer to docs/getting_started.md.

For more outdoor pre-training and downstream information, you can also refer to UniPAD.

Todo:

  • add instructions on installation and usage
  • add ScanNet w. RGB-D dataloader and data pre-processing scripts
  • add multi-dataset loader and trainer
  • add multi-dataset point prompt training model
  • add more pre-training and finetuning configs
  • add pre-trained checkpoints

Citation

@article{zhu2023ponderv2,
  title={PonderV2: Pave the Way for 3D Foundation Model with A Universal Pre-training Paradigm}, 
  author={Haoyi Zhu and Honghui Yang and Xiaoyang Wu and Di Huang and Sha Zhang and Xianglong He and Tong He and Hengshuang Zhao and Chunhua Shen and Yu Qiao and Wanli Ouyang},
  journal={arXiv preprint arXiv:2310.08586},
  year={2023}
}

@inproceedings{huang2023ponder,
  title={Ponder: Point cloud pre-training via neural rendering},
  author={Huang, Di and Peng, Sida and He, Tong and Yang, Honghui and Zhou, Xiaowei and Ouyang, Wanli},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={16089--16098},
  year={2023}
}

@article{yang2023unipad,
  title={UniPAD: A Universal Pre-training Paradigm for Autonomous Driving}, 
  author={Honghui Yang and Sha Zhang and Di Huang and Xiaoyang Wu and Haoyi Zhu and Tong He and Shixiang Tang and Hengshuang Zhao and Qibo Qiu and Binbin Lin and Xiaofei He and Wanli Ouyang},
  journal={arXiv preprint arXiv:2310.08370},
  year={2023},
}

Acknowledgement

This project is mainly based on the following codebases. Thanks for their great work!