Bingyu Li, Da Zhang, Zhiyuan Zhao, Junyu Gao, Xuelong Li
This is the official implementation of our paper "StitchFusion: Weaving Any Visual Modalities to Enhance Multimodal Semantic Segmentation".
Multimodal semantic segmentation shows significant potential for enhancing segmentation accuracy in complex scenes. However, current methods often incorporate specialized feature fusion modules tailored to specific modalities, thereby restricting input flexibility and increasing the number of training parameters. To address these challenges, we propose StitchFusion, a straightforward yet effective modal fusion framework that integrates large-scale pre-trained models directly as encoders and feature fusers. This approach facilitates comprehensive multi-modal and multi-scale feature fusion, accommodating any visual modal inputs. Specifically, our framework achieves modal integration during encoding by sharing multi-modal visual information. To enhance information exchange across modalities, we introduce a multi-directional adapter module (MultiAdapter) to enable cross-modal information transfer during encoding. By leveraging MultiAdapter to propagate multi-scale information across pre-trained encoders during the encoding process, StitchFusion achieves multi-modal visual information integration during encoding. Extensive comparative experiments demonstrate that our model achieves state-of-the-art performance on four multi-modal segmentation datasets with minimal additional parameters. Furthermore, the experimental integration of MultiAdapter with existing Feature Fusion Modules (FFMs) highlights their complementary nature.
- 2024/9/20: A researcher has asked about reproducible .pth files, and we are currently organizing them. However, because interns have not yet been granted the necessary permissions, this may take some time. We will post an update as soon as there is news. If you have any questions, please contact the author by email: [email protected]
- 2024/9/24: A more direct way to contact me is [email protected]; messages sent there go straight to my mobile mail client.
- If you find this repo useful, please star it to encourage the authors.
- stitchfusion_with_tips_you_can_copy.py
- I have updated the reproducible files and made additional versions of StitchFusion available at stitchfusion_with_tips_you_can_copy.py. You can simply copy these files and run the experiments. To use any of these versions, just copy the path of the .pth file into the EVAL/MODEL_PATH field in your chosen config.yaml file.
- I have released the reproducible files for the DELIVER dataset. However, during replication I observed that the results differed slightly from the reported values, with variations of around a few tenths, some higher and some lower. Nevertheless, these differences do not affect the overall performance comparison of our model.
- stitchfusion_with_tips_you_can_copy.py is all you need to reproduce the results.
- Recently, some researchers have reported that they are unable to reproduce the .pth files. Please refer to the issue (closed) to organize the code.
- 2024/7/27: init repository.
- 2024/7/27: release the code for StitchFusion.
- 2024/8/02: upload the paper for StitchFusion.
- 2024/11/6: upload some checkpoint files for StitchFusion.
- 2024/11/12: release the reproducible files for DELIVER dataset.
Figure: Comparison of different model fusion paradigms.
Figure: MultiAdapter Module for the StitchFusion Framework at Different Density Levels.
First, create and activate the environment using the following commands:
conda env create -f environment.yaml
conda activate StitchFusion
Download the dataset:
- MCubeS, for multimodal material segmentation with RGB-A-D-N modalities.
- FMB, for FMB dataset with RGB-Infrared modalities.
- PST, for PST900 dataset with RGB-Thermal modalities.
- DeLiver, for DeLiVER dataset with RGB-D-E-L modalities.
- MFNet, for MFNet dataset with RGB-T modalities.
Then, put the datasets under the data directory as follows:
data/
├── MCubeS
│   ├── polL_color
│   ├── polL_aolp_sin
│   ├── polL_aolp_cos
│   ├── polL_dolp
│   ├── NIR_warped
│   ├── NIR_warped_mask
│   ├── GT
│   ├── SSGT4MS
│   ├── list_folder
│   └── SS
├── FMB
│   ├── test
│   │   ├── color
│   │   ├── Infrared
│   │   ├── Label
│   │   └── Visible
│   └── train
│       ├── color
│       ├── Infrared
│       ├── Label
│       └── Visible
├── PST
│   ├── test
│   │   ├── rgb
│   │   ├── thermal
│   │   └── labels
│   └── train
│       ├── rgb
│       ├── thermal
│       └── labels
├── DELIVER
│   ├── depth
│   │   ├── cloud
│   │   │   ├── test
│   │   │   │   ├── MAP_10_point102
│   │   │   │   │   ├── 045050_depth_front.png
│   │   │   │   │   └── ...
│   │   │   ├── train
│   │   │   └── val
│   │   ├── fog
│   │   ├── night
│   │   ├── rain
│   │   └── sun
│   ├── event
│   ├── hha
│   ├── img
│   ├── lidar
│   └── semantic
└── MFNet
    ├── img
    └── ther
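Before launching a run, it can help to confirm that the folders match the layout above. The following is a minimal sketch, not part of the repository: the helper name `missing_dirs` and the `EXPECTED_LAYOUT` table are illustrative, and the DELIVER entries assume that the modality folders (depth, event, hha, img, lidar, semantic) sit directly under DELIVER/.

```python
from pathlib import Path

# Expected sub-folders per dataset, taken from the directory tree above.
# Assumption: the six DELIVER modality folders sit directly under DELIVER/.
EXPECTED_LAYOUT = {
    "MCubeS": ["polL_color", "polL_aolp_sin", "polL_aolp_cos", "polL_dolp",
               "NIR_warped", "NIR_warped_mask", "GT", "SSGT4MS",
               "list_folder", "SS"],
    "FMB": ["test", "train"],
    "PST": ["test", "train"],
    "DELIVER": ["depth", "event", "hha", "img", "lidar", "semantic"],
    "MFNet": ["img", "ther"],
}


def missing_dirs(data_root, datasets=None):
    """Return the expected directories that do not exist under data_root.

    `datasets` restricts the check to the datasets you actually downloaded;
    by default every dataset in EXPECTED_LAYOUT is checked.
    """
    root = Path(data_root)
    names = datasets if datasets is not None else list(EXPECTED_LAYOUT)
    missing = []
    for name in names:
        for sub in EXPECTED_LAYOUT[name]:
            path = root / name / sub
            if not path.is_dir():
                missing.append(path.as_posix())
    return missing
```

For example, `missing_dirs("data", ["PST"])` returns an empty list once `data/PST/test` and `data/PST/train` are in place, and otherwise lists the folders that still need to be created.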
All .pth files will be released later.
Model-Modal | mIoU | weight |
---|---|---|
StitchFusion-RGB-T | 85.35 | GoogleDrive |
All .pth files will be released later.
Model-Modal | mIoU | weight |
---|---|---|
StitchFusion-RGB-T | 64.85 | GoogleDrive |
All .pth files will be released later.
Model-Modal | mIoU | weight |
---|---|---|
StitchFusion-RGB-T | 57.91 | GoogleDrive |
StitchFusion-RGB-T | 57.80 | GoogleDrive |
StitchFusion-RGB-T | 58.13 | GoogleDrive |
All .pth files will be released later.
Model-Modal | mIoU | weight |
---|---|---|
StitchFusion-RGB-D | 65.75 | GoogleDrive |
StitchFusion-RGB-E | 57.31 | GoogleDrive |
StitchFusion-RGB-L | 58.03 | GoogleDrive |
StitchFusion-RGB-DE | 66.03 | GoogleDrive |
StitchFusion-RGB-DL | 67.06 | GoogleDrive |
StitchFusion-RGB-DEL | 68.18 | GoogleDrive |
Figure: Main Results: Comparison with SOTA Models.
Figure: Main Results: Per-Class Comparison Across Modality Combinations and with SOTA Models.
Before training, please download the pre-trained SegFormer weights and place them in the following directory structure:
checkpoints/pretrained/segformer
├── mit_b0.pth
├── mit_b1.pth
├── mit_b2.pth
├── mit_b3.pth
└── mit_b4.pth
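A quick way to see which backbone weights are already in place is sketched below. It only assumes the directory and file names listed above; the helper names `mit_checkpoint` and `available_variants` are illustrative and do not exist in the repository.

```python
from pathlib import Path

# Directory and file naming follow the structure listed above.
PRETRAINED_DIR = Path("checkpoints/pretrained/segformer")
MIT_VARIANTS = ("b0", "b1", "b2", "b3", "b4")


def mit_checkpoint(variant, root=PRETRAINED_DIR):
    """Return the expected path of one MiT checkpoint, e.g. mit_b2.pth."""
    variant = variant.lower()
    if variant not in MIT_VARIANTS:
        raise ValueError(
            f"unknown MiT variant {variant!r}; expected one of {MIT_VARIANTS}"
        )
    return Path(root) / f"mit_{variant}.pth"


def available_variants(root=PRETRAINED_DIR):
    """List the MiT variants whose checkpoint file is present under root."""
    return [v for v in MIT_VARIANTS if mit_checkpoint(v, root).is_file()]
```

Run from the repository root, `available_variants()` returns the variants that can be used without a further download.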
To train a StitchFusion model, update the relevant configuration file in configs/ with the correct paths and hyper-parameters. Then run:
cd path/to/StitchFusion
conda activate StitchFusion
python -m tools.train_mm --cfg configs/mcubes_rgbadn.yaml
python -m tools.train_mm --cfg configs/fmb_rgbt.yaml
python -m tools.train_mm --cfg configs/pst_rgbt.yaml
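To queue several of these runs one after another, a small driver script can build the same commands and execute them in sequence. This is a sketch under stated assumptions: it must be started from the repository root inside the activated environment, and the names `train_command` and `run_all` are illustrative rather than part of the repository.

```python
import subprocess
import sys

# The same configuration files as in the commands above.
TRAIN_CONFIGS = [
    "configs/mcubes_rgbadn.yaml",
    "configs/fmb_rgbt.yaml",
    "configs/pst_rgbt.yaml",
]


def train_command(cfg, python=sys.executable):
    """Build the argument list for one training run."""
    return [python, "-m", "tools.train_mm", "--cfg", cfg]


def run_all(configs=TRAIN_CONFIGS, dry_run=True):
    """Run the training jobs sequentially; with dry_run=True only print them."""
    for cfg in configs:
        cmd = train_command(cfg)
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the queue if a run exits with a non-zero status.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all(dry_run=True)
```

The default is a dry run so that the commands can be inspected first; call `run_all(dry_run=False)` to actually start training.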
To evaluate StitchFusion models, please download the respective model weights (GoogleDrive) and save them in any folder you like.
Then, update the EVAL section of the relevant configuration file in configs/ and run:
cd path/to/StitchFusion
conda activate StitchFusion
python -m tools.val_mm --cfg configs/mcubes_rgbadn.yaml
python -m tools.val_mm --cfg configs/fmb_rgbt.yaml
python -m tools.val_mm --cfg configs/pst_rgbt.yaml
python -m tools.val_mm --cfg configs/deliver.yaml
python -m tools.val_mm --cfg configs/mfnet_rgbt.yaml
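Pointing a configuration at a downloaded checkpoint means editing the MODEL_PATH field in its EVAL section. The sketch below does this on the YAML text with the standard library only. It assumes that EVAL is a top-level key with an indented MODEL_PATH entry; the function name `set_eval_model_path` and the checkpoint path in the usage note are hypothetical.

```python
import re


def set_eval_model_path(yaml_text, model_path):
    """Return yaml_text with MODEL_PATH in the top-level EVAL block replaced.

    Assumes `EVAL:` starts at column 0 and `MODEL_PATH:` is indented below it.
    Any trailing comment on the MODEL_PATH line is dropped.
    """
    quoted = "'" + model_path.replace("'", "''") + "'"  # YAML single quotes
    out, in_eval, replaced = [], False, False
    for line in yaml_text.splitlines():
        if re.match(r"^[^\s#]", line):
            # A non-indented, non-comment line starts a new top-level section.
            in_eval = line.startswith("EVAL:")
        elif in_eval and not replaced:
            m = re.match(r"^(\s+MODEL_PATH\s*:).*$", line)
            if m:
                line = f"{m.group(1)} {quoted}"
                replaced = True
        out.append(line)
    if not replaced:
        raise KeyError("no MODEL_PATH key found under a top-level EVAL section")
    return "\n".join(out) + "\n"
```

A typical use is to read a file such as configs/pst_rgbt.yaml, pass its text together with a path like `checkpoints/my_downloaded_model.pth` (a placeholder), and write the result back before running tools.val_mm. Working on the text rather than re-serialising the YAML keeps the comments and key order of the rest of the file untouched.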
Figure: Visualization of StitchFusion on the DeLiVER Dataset.
Figure: Visualization of StitchFusion on the MCubeS Dataset.
This repository is released under the Apache-2.0 license. For commercial use, please contact the authors.
@article{li2024stitchfusion,
title={StitchFusion: Weaving Any Visual Modalities to Enhance Multimodal Semantic Segmentation},
author={Li, Bingyu and Zhang, Da and Zhao, Zhiyuan and Gao, Junyu and Li, Xuelong},
journal={arXiv preprint arXiv:2408.01343},
year={2024}
}
Our codebase is based on the following GitHub repositories. Thanks to the following public repositories:
Note: This is a research-level repository and might contain issues or bugs. Please contact the authors with any questions.