This repository is the official implementation of MambaPlace 🔥🔥🔥
In future smart cities, autonomous vehicles, drones, and intelligent logistics systems will rely heavily on accurate localization from human language descriptions for effective path planning. Traditional visual place recognition (VPR) methods, which depend on cameras or LiDAR to extract features from 2D images or point clouds, are inefficient for human-computer interaction and lose precision under varying environmental conditions. A promising alternative is text-to-point-cloud localization, which localizes accurately without requiring proximity to the target location and is resilient to changes in the natural environment. However, this approach faces challenges such as ambiguous language descriptions and similar descriptions of different positions within the same region. Existing solutions such as Text2Pos and Text2Loc have made progress but still fall short of fully integrating multimodal data. To address these issues, we propose MambaPlace, a unified approach that uses Selective State Space Models (SSMs) to enhance feature representation and improve localization accuracy.
Create a conda environment and install basic dependencies:
git clone https://github.com/nuozimiaowu/MambaPlace
cd MambaPlace
conda create -n mambaplace python=3.10
conda activate mambaplace
# Install matching versions of torch and torchvision
conda install pytorch==1.11.0 torchvision==0.12.0 torchaudio==0.11.0 cudatoolkit=11.3 -c pytorch
# Install required dependencies
CC=/usr/bin/gcc-9 pip install -r requirements.txt
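To verify the install, a quick sanity check (a hypothetical helper script, not part of this repository) can confirm that torch imports and sees the GPU:

```python
# check_env.py -- hypothetical sanity check for the PyTorch install
import torch
import torchvision

print(f"torch {torch.__version__}, torchvision {torchvision.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    # Allocate a small tensor on the GPU to confirm CUDA works end to end.
    x = torch.randn(2, 3, device="cuda")
    print(f"GPU tensor OK on {torch.cuda.get_device_name(0)}")
```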
The KITTI360Pose dataset is used in our implementation.
For training and evaluation, we need the cells and poses from the KITTI360Pose dataset. The cells and poses folders can be downloaded from HERE.
In addition, to implement prototype-based map cloning, we need to know the neighbors of each cell. We use the direction folder to store the adjacent cells in each direction; it can be downloaded from HERE.
If you want to train the model, you need to download the pretrained object backbone HERE.
The KITTI360Pose dataset and the pretrained object backbone are provided by Text2Pos (paper, code).
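The cells and poses files are standard Python pickles; a quick way to peek at one scene (a hypothetical snippet, assuming it is run from the repository root so the pickled classes can be resolved on import) is:

```python
# inspect_data.py -- hypothetical helper to peek at one scene's cells file.
# Assumption: the .pkl files are plain Python pickles of the repo's cell
# objects, so the repo's modules must be importable when unpickling.
import pickle

path = "data/k360_30-10_scG_pd10_pc4_spY_all/cells/2013_05_28_drive_0000_sync.pkl"
with open(path, "rb") as f:
    cells = pickle.load(f)

print(f"Loaded {len(cells)} cells; first entry is a {type(cells[0]).__name__}")
```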
The final directory structure should be:
│MambaPlace/
├──dataloading/
├──datapreparation/
├──data/
│   ├──k360_30-10_scG_pd10_pc4_spY_all/
│   │   ├──cells/
│   │   │   ├──2013_05_28_drive_0000_sync.pkl
│   │   │   ├──2013_05_28_drive_0002_sync.pkl
│   │   │   ├──...
│   │   ├──poses/
│   │   │   ├──2013_05_28_drive_0000_sync.pkl
│   │   │   ├──2013_05_28_drive_0002_sync.pkl
│   │   │   ├──...
│   │   ├──direction/
│   │   │   ├──2013_05_28_drive_0000_sync.json
│   │   │   ├──2013_05_28_drive_0002_sync.json
│   │   │   ├──...
├──checkpoints/
│   ├──pointnet_acc0.86_lr1_p256.pth
├──...
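Before training, it is worth checking that everything landed in the right place; a minimal layout check (hypothetical helper, with paths taken from the tree above) could be:

```python
# verify_layout.py -- hypothetical check of the expected dataset/checkpoint layout.
from pathlib import Path

base = Path("data/k360_30-10_scG_pd10_pc4_spY_all")
expected = [
    base / "cells",
    base / "poses",
    base / "direction",
    Path("checkpoints/pointnet_acc0.86_lr1_p256.pth"),
]
for p in expected:
    # Report each required path so missing downloads are caught early.
    print(f"{'ok' if p.exists() else 'MISSING':7s} {p}")
```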
After setting up the dependencies and dataset, the models can be trained with the following commands. First, train the coarse text-to-cell retrieval model:
python -m training.coarse --batch_size 64 --coarse_embed_dim 256 --shuffle --base_path ./data/k360_30-10_scG_pd10_pc4_spY_all/ \
--use_features "class" "color" "position" "num" \
--no_pc_augment \
--fixed_embedding \
--epochs 20 \
--learning_rate 0.0005 \
--lr_scheduler step \
--lr_step 7 \
--lr_gamma 0.4 \
--temperature 0.1 \
--ranking_loss contrastive \
--hungging_model t5-large \
--folder_name PATH_TO_COARSE
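The `--ranking_loss contrastive` and `--temperature 0.1` flags point to an InfoNCE-style objective over matched text/cell embedding pairs. As a rough illustration of what these two flags control (a sketch of the general technique, not the repository's exact loss):

```python
import torch
import torch.nn.functional as F

def contrastive_ranking_loss(text_emb, cell_emb, temperature=0.1):
    """Symmetric InfoNCE over a batch of matched (text, cell) pairs.

    Illustrative sketch only: row i of each tensor is assumed to be a
    positive pair, and every other row in the batch acts as a negative.
    """
    text_emb = F.normalize(text_emb, dim=-1)
    cell_emb = F.normalize(cell_emb, dim=-1)
    logits = text_emb @ cell_emb.t() / temperature  # (B, B) similarities
    targets = torch.arange(logits.size(0), device=logits.device)
    # Cross-entropy in both directions: text->cell and cell->text.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))
```

A lower temperature sharpens the softmax over the similarity matrix, penalizing hard negatives more strongly.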
Next, train the fine localization model:
python -m training.fine --batch_size 32 --fine_embed_dim 128 --shuffle --base_path ./data/k360_30-10_scG_pd10_pc4_spY_all/ \
--use_features "class" "color" "position" "num" \
--no_pc_augment \
--fixed_embedding \
--epochs 35 \
--learning_rate 0.0003 \
--hungging_model t5-large \
--regressor_cell all \
--pmc_prob 0.5 \
--folder_name PATH_TO_FINE
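The `--pmc_prob 0.5` flag sets how often prototype-based map cloning fires during fine training. Conceptually it is a per-sample coin flip over the neighbor cells stored in the direction/ folder; a hypothetical sketch (`maybe_apply_pmc` and the `sample`/`neighbors` structures are illustrative, not the repo's API):

```python
import random

def maybe_apply_pmc(sample, neighbors, pmc_prob=0.5):
    """Hypothetical sketch of gating an augmentation with --pmc_prob.

    With probability pmc_prob, attach a randomly chosen adjacent cell
    (from the direction metadata) as extra context; otherwise return
    the training sample unchanged.
    """
    if neighbors and random.random() < pmc_prob:
        sample = {**sample, "cloned_cell": random.choice(neighbors)}
    return sample
```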
Evaluate on the validation set:
python -m evaluation.pipeline --base_path ./data/k360_30-10_scG_pd10_pc4_spY_all/ \
--use_features "class" "color" "position" "num" \
--no_pc_augment \
--no_pc_augment_fine \
--hungging_model t5-large \
--fixed_embedding \
--path_coarse ./checkpoints/{PATH_TO_COARSE}/{COARSE_MODEL_NAME} \
--path_fine ./checkpoints/{PATH_TO_FINE}/{FINE_MODEL_NAME}
Evaluate on the test set:
python -m evaluation.pipeline --base_path ./data/k360_30-10_scG_pd10_pc4_spY_all/ \
--use_features "class" "color" "position" "num" \
--use_test_set \
--no_pc_augment \
--no_pc_augment_fine \
--hungging_model t5-large \
--fixed_embedding \
--path_coarse ./checkpoints/{PATH_TO_COARSE}/{COARSE_MODEL_NAME} \
--path_fine ./checkpoints/{PATH_TO_FINE}/{FINE_MODEL_NAME}
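Following Text2Pos/Text2Loc, evaluation reports localization recall within distance thresholds (e.g. within 5/10/15 m). A minimal sketch of that metric, assuming (N, 2) arrays of predicted and ground-truth positions for the same N text queries (`localization_recall` is a hypothetical helper, not the pipeline's code):

```python
import numpy as np

def localization_recall(pred_xy, gt_xy, thresholds=(5.0, 10.0, 15.0)):
    """Fraction of queries localized within each distance threshold (meters)."""
    errors = np.linalg.norm(pred_xy - gt_xy, axis=1)  # per-query position error
    return {t: float((errors <= t).mean()) for t in thresholds}
```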
We borrowed some code from Text2Pos and Text2Loc, and we would like to thank the authors for their work!
@InProceedings{xia2024text2loc,
  title     = {Text2Loc: 3D Point Cloud Localization from Natural Language},
  author    = {Xia, Yan and Shi, Letian and Ding, Zifeng and Henriques, Jo{\~a}o F. and Cremers, Daniel},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year      = {2024}
}