MambaPlace

This repository is the official implementation of MambaPlace 🔥🔥🔥

Introduction

In future smart cities, autonomous vehicles, drones, and intelligent logistics systems will rely heavily on accurate localization from human language descriptions for effective path planning. Traditional visual place recognition (VPR) methods, which depend on cameras or LiDAR to extract features from 2D images or point clouds, are inefficient for human-computer interaction and lack precision under varying environmental conditions. A promising alternative is text-to-point-cloud localization, which enables accurate localization without requiring proximity to the target location and is resilient to changes in the natural environment. However, this approach faces challenges such as ambiguous language descriptions and similar descriptions for different positions within the same region. Existing solutions such as Text2Pos and Text2Loc have made progress but still fall short of fully integrating multimodal data. To address these issues, we propose MambaPlace, a unified approach that uses Selective State Space Models (SSMs) to enhance feature representation and improve localization accuracy.
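
For readers unfamiliar with selective SSMs, below is a minimal, illustrative sketch of the input-dependent (selective) state-space recurrence that Mamba-style blocks are built on. It is not the implementation in this repository; all names and shapes are assumptions for illustration.

import torch

def selective_scan(x, dt, A, B, C):
    # x:  (batch, length, d)  input sequence
    # dt: (batch, length, d)  input-dependent step sizes (the "selective" part)
    # A:  (d, n)              learned state matrix (kept negative for stability in practice)
    # B:  (batch, length, n)  input-dependent input projection
    # C:  (batch, length, n)  input-dependent output projection
    b, L, d = x.shape
    h = torch.zeros(b, d, A.shape[1])
    ys = []
    for t in range(L):
        dA = torch.exp(dt[:, t].unsqueeze(-1) * A)            # discretized transition
        dB = dt[:, t].unsqueeze(-1) * B[:, t].unsqueeze(1)    # discretized input
        h = dA * h + dB * x[:, t].unsqueeze(-1)               # state update h_t = dA*h_{t-1} + dB*x_t
        ys.append((h * C[:, t].unsqueeze(1)).sum(-1))         # readout y_t = C_t h_t
    return torch.stack(ys, dim=1)                             # (batch, length, d)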

Structure overview

(Figure: overall architecture of MambaPlace.)

Experimental performance

(Figure: experimental performance on the KITTI360Pose benchmark.)

Installation

Create a conda environment and install basic dependencies:

git clone https://github.com/nuozimiaowu/MambaPlace
cd MambaPlace

conda create -n mambaplace python=3.10
conda activate mambaplace

# Install matching versions of torch and torchvision
conda install pytorch==1.11.0 torchvision==0.12.0 torchaudio==0.11.0 cudatoolkit=11.3 -c pytorch

# Install required dependencies
CC=/usr/bin/gcc-9 pip install -r requirements.txt
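
To sanity-check the environment, the following should print 1.11.0 and True on a CUDA-capable machine:

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"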

Datasets & Backbone

The KITTI360Pose dataset is used in our implementation.

For training and evaluation, we need cells and poses from the KITTI360Pose dataset. The cells and poses folders can be downloaded from HERE.

In addition, to successfully implement prototype-based map cloning, we need to know the neighbors of each cell. We use a direction folder to store the adjacent cells in different directions. The direction folder can be downloaded from HERE.

If you want to train the model, you also need to download the pretrained object backbone from HERE.

The KITTI360Pose dataset and the pretrained object backbone are provided by Text2Pos (paper, code).

The final directory structure should be:

│MambaPlace/
├──dataloading/
├──datapreparation/
├──data/
│   ├──k360_30-10_scG_pd10_pc4_spY_all/
│       ├──cells/
│           ├──2013_05_28_drive_0000_sync.pkl
│           ├──2013_05_28_drive_0002_sync.pkl
│           ├──...
│       ├──poses/
│           ├──2013_05_28_drive_0000_sync.pkl
│           ├──2013_05_28_drive_0002_sync.pkl
│           ├──...
│       ├──direction/
│           ├──2013_05_28_drive_0000_sync.json
│           ├──2013_05_28_drive_0002_sync.json
│           ├──...
├──checkpoints/
│   ├──pointnet_acc0.86_lr1_p256.pth
├──...
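
Once the data is in place, a quick way to confirm the files load is a snippet like the one below (a minimal sketch; the exact object types inside the pickles are defined by the Text2Pos data preparation code, so the printed contents may differ):

import json
import pickle

base = "./data/k360_30-10_scG_pd10_pc4_spY_all"

with open(f"{base}/cells/2013_05_28_drive_0000_sync.pkl", "rb") as f:
    cells = pickle.load(f)          # cell objects for this scene
print(type(cells), len(cells))

with open(f"{base}/direction/2013_05_28_drive_0000_sync.json") as f:
    directions = json.load(f)       # neighbor cells per direction, used for map cloning
print(type(directions))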

Train

Once the dependencies and dataset are set up, the models can be trained with the following commands:

Train Global Place Recognition (Coarse)

python -m training.coarse --batch_size 64 --coarse_embed_dim 256 --shuffle --base_path ./data/k360_30-10_scG_pd10_pc4_spY_all/   \
  --use_features "class"  "color"  "position"  "num" \
  --no_pc_augment \
  --fixed_embedding \
  --epochs 20 \
  --learning_rate 0.0005 \
  --lr_scheduler step \
  --lr_step 7 \
  --lr_gamma 0.4 \
  --temperature 0.1 \
  --ranking_loss contrastive \
  --hungging_model t5-large \
  --folder_name PATH_TO_COARSE
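
The --temperature and --ranking_loss contrastive flags suggest an InfoNCE-style objective between text and cell embeddings. The sketch below shows a common symmetric formulation with temperature scaling; it illustrates the general technique, not necessarily the exact loss in this codebase:

import torch
import torch.nn.functional as F

def contrastive_ranking_loss(text_emb, cell_emb, temperature=0.1):
    # text_emb, cell_emb: (batch, dim); row i of each is a matched text/cell pair
    text_emb = F.normalize(text_emb, dim=-1)
    cell_emb = F.normalize(cell_emb, dim=-1)
    logits = text_emb @ cell_emb.t() / temperature   # cosine similarities, scaled
    labels = torch.arange(logits.size(0), device=logits.device)
    # cross-entropy in both retrieval directions (text->cell and cell->text)
    return 0.5 * (F.cross_entropy(logits, labels) + F.cross_entropy(logits.t(), labels))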

Train Fine Localization

python -m training.fine --batch_size 32 --fine_embed_dim 128 --shuffle --base_path ./data/k360_30-10_scG_pd10_pc4_spY_all/ \
  --use_features "class"  "color"  "position"  "num" \
  --no_pc_augment \
  --fixed_embedding \
  --epochs 35 \
  --learning_rate 0.0003 \
  --hungging_model t5-large \
  --regressor_cell all \
  --pmc_prob 0.5 \
  --folder_name PATH_TO_FINE
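
Here --pmc_prob controls how often prototype-based map cloning is applied during fine training. A hypothetical sketch of such a probabilistic augmentation follows (names and structure are assumed for illustration; the repository's actual implementation lives in its dataloading code):

import random

def maybe_clone_cell(cell, neighbors_by_direction, pmc_prob=0.5):
    # hypothetical helper: with probability pmc_prob, swap the training cell
    # for one of its stored neighbors so the model sees cloned map context
    if neighbors_by_direction and random.random() < pmc_prob:
        direction = random.choice(list(neighbors_by_direction))
        return neighbors_by_direction[direction]
    return cell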

Evaluation

Evaluation on Val Dataset

python -m evaluation.pipeline --base_path ./data/k360_30-10_scG_pd10_pc4_spY_all/ \
    --use_features "class"  "color"  "position"  "num" \
    --no_pc_augment \
    --no_pc_augment_fine \
    --hungging_model t5-large \
    --fixed_embedding \
    --path_coarse ./checkpoints/{PATH_TO_COARSE}/{COARSE_MODEL_NAME} \
    --path_fine ./checkpoints/{PATH_TO_FINE}/{FINE_MODEL_NAME} 

Evaluation on Test Dataset

python -m evaluation.pipeline --base_path ./data/k360_30-10_scG_pd10_pc4_spY_all/ \
    --use_features "class"  "color"  "position"  "num" \
    --use_test_set \
    --no_pc_augment \
    --no_pc_augment_fine \
    --hungging_model t5-large \
    --fixed_embedding \
    --path_coarse ./checkpoints/{PATH_TO_COARSE}/{COARSE_MODEL_NAME} \
    --path_fine ./checkpoints/{PATH_TO_FINE}/{FINE_MODEL_NAME} 
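
Evaluation runs the two stages end to end: the coarse model ranks candidate cells for each description, and the fine model refines a position within the best candidates. Below is a minimal, runnable illustration of the coarse shortlisting step (the embeddings and centers are dummy stand-ins, not the repository's API):

import torch

def coarse_shortlist(text_emb, cell_embs, cell_centers, k=3):
    # coarse stage: rank all cells by similarity to the text embedding
    sims = cell_embs @ text_emb                 # (num_cells,)
    topk = sims.topk(k).indices                 # indices of the k best cells
    # the fine stage would regress an offset inside each candidate cell;
    # here we return the candidate centers as placeholders
    return cell_centers[topk]

text_emb = torch.randn(256)
cell_embs = torch.randn(100, 256)
cell_centers = torch.randn(100, 2)
print(coarse_shortlist(text_emb, cell_embs, cell_centers))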

Acknowledgements

We borrowed some code from Text2Pos and Text2Loc, and we would like to thank the authors for their help!

@InProceedings{xia2024text2loc,
  title     = {Text2Loc: 3D Point Cloud Localization from Natural Language},
  author    = {Xia, Yan and Shi, Letian and Ding, Zifeng and Henriques, Jo{\~a}o F and Cremers, Daniel},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year      = {2024}
}
