Code for the paper ‘Multi-unit stacked architecture: An urban scene segmentation network based on UNet and ShuffleNetv2’
- python 3.8
- pytorch 1.11.0
- CUDA 11.3
*From left to right: input images, ground truth, and segmentation outputs.
4. Ablation study results of MSA-Net on the Cityscapes test set and the augmented PASCAL VOC 2012 val set.
Index | Baseline | DLED | ESCC | MSIC | Cityscapes | VOC 2012 Augment | Params
---|---|---|---|---|---|---|---
1 | ✓ |  |  |  | 63.3 | 45.5 | 31.0M
2 | ✓ | ✓ |  |  | 65.5 | 58.2 | 7.0M
3 | ✓ | ✓ | ✓ |  | 72.6 | 64.2 | 7.1M
4 | ✓ | ✓ | ✓ | ✓ | 73.6 | 65.3 | 7.6M
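Reading the ablation rows as cumulative (each row adds one module to the previous configuration), the contribution of each module is the difference between consecutive rows. The short sketch below only re-derives those differences from the numbers in the table; the names `ROWS` and `incremental_gains` are illustrative and are not part of this repository.

```python
# Ablation rows copied from the table above:
# (configuration, Cityscapes, VOC 2012 Augment, params in millions).
# Assumption: rows are cumulative, i.e. each row adds one module to the previous one.
ROWS = [
    ("Baseline", 63.3, 45.5, 31.0),
    ("+DLED", 65.5, 58.2, 7.0),
    ("+ESCC", 72.6, 64.2, 7.1),
    ("+MSIC", 73.6, 65.3, 7.6),
]


def incremental_gains(rows):
    """Return (name, delta_cityscapes, delta_voc, delta_params) for every row after the first."""
    gains = []
    for prev, cur in zip(rows, rows[1:]):
        gains.append((
            cur[0],
            round(cur[1] - prev[1], 1),  # round to one decimal to hide float noise
            round(cur[2] - prev[2], 1),
            round(cur[3] - prev[3], 1),
        ))
    return gains


if __name__ == "__main__":
    for name, d_city, d_voc, d_params in incremental_gains(ROWS):
        print(f"{name}: Cityscapes {d_city:+.1f}, VOC {d_voc:+.1f}, params {d_params:+.1f}M")
```

Under this reading, DLED gives the largest VOC gain while cutting parameters from 31.0M to 7.0M, and ESCC gives the largest Cityscapes gain.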
*Performed on a single RTX 4090 GPU
*The format {., ., ., ., . | r} lists the channel depth of each encoder stage of MSA-Net; the decoder channel depths mirror the encoder symmetrically. r is the channel compression ratio.
Model | Channel depth | mIoU | Params | GFLOPs | FPS |
---|---|---|---|---|---|
MSA-Net | {64, 128, 256, 512, 1024 \| r = 1} | 74.7 | 7.6M | 43.7 | 31.0
MSA-Net-Middle | {32, 64, 128, 256, 512 \| r = 0.5} | 72.0 | 1.9M | 11.4 | 33.7
MSA-Net-Slim | {16, 32, 64, 128, 256 \| r = 0.25} | 63.8 | 0.5M | 3.1 | 36.3
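The three variants differ only in width: each encoder depth appears to be the r = 1 depth multiplied by r. Because a convolution's weight count is proportional to c_in × c_out, scaling both by r shrinks it by roughly r², which agrees with the reported parameter counts (7.6M, 1.9M, about 0.5M); GFLOPs shrink by a similar but not identical factor. The sketch below illustrates this relationship under those assumptions. The helper names (`encoder_channels`, `conv3x3_weights`, `estimated_params`) are hypothetical and are not taken from the MSA-Net code.

```python
# Sketch of how the width settings in the table relate to each other.
# Assumptions (not from the MSA-Net code): widths are the r = 1 widths
# multiplied by r, and total params scale approximately with r**2.

BASE_CHANNELS = (64, 128, 256, 512, 1024)  # encoder depths of MSA-Net at r = 1


def encoder_channels(r, base=BASE_CHANNELS):
    """Return the encoder channel depths for compression ratio r."""
    return tuple(int(c * r) for c in base)


def conv3x3_weights(c_in, c_out):
    """Number of weights in a 3x3 convolution from c_in to c_out channels (bias ignored)."""
    return 9 * c_in * c_out


def estimated_params(r, reference_params_m=7.6):
    """Estimate model size in millions from the r = 1 size, assuming r**2 scaling."""
    return reference_params_m * r ** 2


if __name__ == "__main__":
    for r in (1, 0.5, 0.25):
        chans = encoder_channels(r)
        # The deepest 3x3 conv (stage 4 -> stage 5) also shrinks by r**2.
        deepest = conv3x3_weights(chans[3], chans[4])
        print(f"r={r}: channels={chans}, deepest conv weights={deepest}, "
              f"estimated params ~{estimated_params(r):.2f}M")
```

The estimate is only a rule of thumb: layers whose size does not depend on the width (for example, a final classifier tied to the number of classes) do not follow the r² rule exactly.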