first commit

ZEROICEWANG · Jun 19, 2024 · 5339319 · 5339319
commit 5339319
Showing 39 changed files with 5,649 additions and 0 deletions.
diff --git a/.gitignore b/.gitignore
@@ -0,0 +1 @@
+/models/
diff --git a/README.en.md b/README.en.md
@@ -0,0 +1,36 @@
+# MEPNet
+
+#### Description
+{**When you're done, you can delete the content in this README and update the file with details for others getting started with your repository**}
+
+#### Software Architecture
+Software architecture description
+
+#### Installation
+
+1.  xxxx
+2.  xxxx
+3.  xxxx
+
+#### Instructions
+
+1.  xxxx
+2.  xxxx
+3.  xxxx
+
+#### Contribution
+
+1.  Fork the repository
+2.  Create Feat_xxx branch
+3.  Commit your code
+4.  Create Pull Request
+
+
+#### Gitee Feature
+
+1.  You can use Readme\_XXX.md to support different languages, such as Readme\_en.md, Readme\_zh.md
+2.  Gitee blog [blog.gitee.com](https://blog.gitee.com)
+3.  Explore open source project [https://gitee.com/explore](https://gitee.com/explore)
+4.  The most valuable open source project [GVP](https://gitee.com/gvp)
+5.  The manual of Gitee [https://gitee.com/help](https://gitee.com/help)
+6.  The most popular members  [https://gitee.com/gitee-stars/](https://gitee.com/gitee-stars/)
diff --git a/README.md b/README.md
@@ -0,0 +1,64 @@
+# [MEPNet: Mask Prior-distribution Interaction and Edge Probability Estimation for Salient Object Detection]
+
+by XX
+
+## Introduction
+Salient Object Detection (SOD) has been researched extensively and has achieved impressive performance. However, current methods still produce defective results in complex scenes, because deceptive backgrounds hinder them from distinguishing the salient subject and producing discriminative edges. To address this issue, we propose MEPNet, which performs two tasks: interacting the mask prior-distribution with multi-scale features to suppress background disturbance, and estimating the edge probability of salient objects to improve the detail of the decoding results. MEPNet is mainly composed of four kinds of modules: the Lite Receptive Field Block (RFB-Lite) module, the Prior Query (PQ) module, the Full-scale sub-Decoder (FD) module, and the Edge Auxiliary (EA) module. RFB-Lite adopts multi-scale convolution with grouped branches to efficiently reduce channel redundancy and enhance the diversity of semantics. PQ introduces the mask prior-distribution into the fused feature through multi-head cross-attention. To resolve the conflict between the number of attention heads and the computational complexity of cross-attention, Multi-head L-Cross Attention (MLAC) is proposed to self-weight the feature while calculating the attention score matrix and global attention. FD realizes bi-directional decoding with a feature pyramid network (FPN) and a reversed feature pyramid network (FPN-R). The three FDs adopted in MEPNet provide multi-basis decoding to fully utilize multi-scale features. The alternating use of PQ and FD ensures the suppression of background disturbance. EA, the last part of MEPNet, refines the decoding results with an edge filter that estimates the edge probability and an edge refinement step that corrects the up-sampled decoding results. The SOD experimental results on the DUTS-TE, HKU-IS, PASCAL-S, ECSSD, and DUT-OMRON datasets demonstrate that MEPNet is more robust in complex scenes than some state-of-the-art (SOTA) methods.
+
+
+## Prerequisites
+- [Python 3.6](https://www.python.org/)
+- [Pytorch 1.10](http://pytorch.org/)
+- [OpenCV 4.5.5.64](https://opencv.org/)
+- [Numpy 1.19.5](https://numpy.org/)
+- [pillow 8.4.0](https://pypi.org/project/Pillow/)
+- [timm 0.4.12](https://pypi.org/project/timm/0.4.12/)
+- [tqdm 4.64.0](https://pypi.org/project/tqdm/4.64.0/)
+
+
+## Clone repository
+
+```shell
+git clone https://github.com/ZEROICEWANG/MEPNet.git
+cd MEPNet/
+```
+
+## Download dataset
+
+Download the following datasets and unzip them into the `../SOD_Data` folder:
+
+- [PASCAL-S](http://cbi.gatech.edu/salobj/)
+- [ECSSD](http://www.cse.cuhk.edu.hk/leojia/projects/hsaliency/dataset.html)
+- [HKU-IS](https://i.cs.hku.hk/~gbli/deep_saliency.html)
+- [DUT-OMRON](http://saliencydetection.net/dut-omron/)
+- [DUTS](http://saliencydetection.net/duts/)
+
+
+## Download model
+
+- To test the performance of MEPNet, download the trained model ([Baidu](https://pan.baidu.com/s/1S_uwKUEUIoRMw-p9Ek28zA?pwd=6jyz) [Google](https://drive.google.com/file/d/1-2gtqk9M3Ex9Ou_YtOSFsQmvPWkcOySg/view?usp=sharing)) into the `models/RES_Model` folder, and download the pretrained model ([Baidu](https://pan.baidu.com/s/1Lh-MrKSLU1rG6DL45PiqtQ?pwd=pfvi) [Google](https://drive.google.com/file/d/1Y8d2cSZh71oKd4qQ56TTK_sYvhTIHe8G/view?usp=sharing)) into the `models/pretrained/prior_query2_channel_last_L_attention_rm_SA/pretrain` folder.
+
+
+## Training
+
+```shell
+    python3 train.py # or using bash cmd_train.sh
+```
+
+
+## Testing
+
+```shell
+    python3 predict.py # or using bash cmd.sh
+```
+- After testing, the saliency maps for `PASCAL-S`, `ECSSD`, `HKU-IS`, `DUT-OMRON`, and `DUTS-TE` are saved in the `predict_result/` folder.
+
+## Saliency maps & Pre-Trained model & Trained model
+- saliency maps: [Baidu](https://pan.baidu.com/s/1EEqGAK5KU-Frpsvx9GKFDw?pwd=by1m) [Google](https://drive.google.com/file/d/16WoEBpne1mpsa_NEQ1ffFG9v7ZnppqxR/view?usp=sharing)
+
+- pretrained model: [Baidu](https://pan.baidu.com/s/1Lh-MrKSLU1rG6DL45PiqtQ?pwd=pfvi) [Google](https://drive.google.com/file/d/1Y8d2cSZh71oKd4qQ56TTK_sYvhTIHe8G/view?usp=sharing)
+
+- trained model: [Baidu](https://pan.baidu.com/s/1S_uwKUEUIoRMw-p9Ek28zA?pwd=6jyz) [Google](https://drive.google.com/file/d/1-2gtqk9M3Ex9Ou_YtOSFsQmvPWkcOySg/view?usp=sharing)
+
+
+
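The README asks for the datasets to be unzipped into `../SOD_Data`, and `config/params.py` in this commit points `train_path` and `test_path` at DUTS sub-folders under that root. The following is a minimal sketch for checking that layout before training. It is not part of the repository; the helper name `missing_dataset_dirs` is hypothetical, and only the DUTS folders named in `config/params.py` are checked.

```python
from pathlib import Path

# Directory names copied from train_path / test_path in config/params.py.
EXPECTED_DIRS = [
    "DUTS-TR/DUTS-TR-Image",
    "DUTS-TR/DUTS-TR-Mask",
    "DUTS-TE/DUTS-TE-Image",
    "DUTS-TE/DUTS-TE-Mask",
]


def missing_dataset_dirs(root="../SOD_Data"):
    """Hypothetical helper: list expected sub-directories missing under root."""
    root = Path(root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]


if __name__ == "__main__":
    missing = missing_dataset_dirs()
    if missing:
        print("Missing dataset folders:", ", ".join(missing))
    else:
        print("DUTS train/test folders found.")
```

Run it from the repository root so that the relative path `../SOD_Data` resolves the same way it does for `train.py`.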
diff --git a/cmd.sh b/cmd.sh
@@ -0,0 +1,5 @@
+
+python sleep.py --base 0 --subbase 7.8  --iter 0
+dir=($(ls -A ./models/RES_Model/))
+
+python predict.py --name "${dir[-1]}" --model CPD_RES_PA --config-file 'config/standard.yaml'
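`cmd.sh` passes the last entry of `ls -A ./models/RES_Model/` to `predict.py` as `--name`. A rough Python equivalent of that selection step is sketched below. The function name `latest_model_name` is hypothetical, and Python sorts by code point whereas `ls` sorts by locale, so the two can disagree for names that mix upper and lower case.

```python
import os


def latest_model_name(model_dir="./models/RES_Model"):
    """Hypothetical helper: last entry of model_dir in sorted order, or None."""
    # os.listdir never returns '.' or '..', similar to `ls -A`.
    entries = sorted(os.listdir(model_dir))
    return entries[-1] if entries else None
```

Returning `None` for an empty folder makes the "no trained model downloaded yet" case explicit instead of failing on an empty array index.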
diff --git a/cmd_train.sh b/cmd_train.sh
@@ -0,0 +1,6 @@
+# echo 1 > /proc/sys/vm/drop_caches
+# echo 2 > /proc/sys/vm/drop_caches
+# echo 3 > /proc/sys/vm/drop_caches
+
+python sleep.py --base 0 --subbase 0 --iter 0
+python -m torch.distributed.launch --nproc_per_node 2 train.py --config-file ./config/standard.yaml  --gpus '0,1'
diff --git a/config/__init__.py b/config/__init__.py
@@ -0,0 +1 @@
+from .params import _C as cfg
diff --git a/config/params.py b/config/params.py
@@ -0,0 +1,122 @@
+import os
+from yacs.config import CfgNode as CN
+
+_C = CN()
+_C.gpus='0,1'
+_C.local_rank=-1
+_C.config_file=''
+_C.global_rank=-1
+_C.world_size=1
+_C.sync_bn=True
+_C.seed=1000
+_C.print_rate=100
+_C.empty_cache=False
+_C.BalancedData=False
+_C.model=CN()
+_C.model.mid_channel=256
+_C.model.expand=[1.0,1.0,1.0]
+_C.model.using_split=False
+_C.model.neck=CN()
+_C.model.neck.type='RFB_Conv'
+_C.model.neck.using_PA=False
+_C.model.neck.using_CA=False
+_C.model.Edge_Ass=CN()
+_C.model.Edge_Ass.type='EdgePro_Auxiliary'
+_C.model.Edge_Ass.using_probability=True
+_C.model.Edge_Ass.using_canny=False
+_C.model.Edge_Ass.using=False
+_C.model.Edge_Ass.using_PV=False
+_C.model.Edge_Ass.using_edge_SC=False
+_C.model.Edge_Ass.inchannel=4 if _C.model.Edge_Ass.using_PV else 3  # evaluated once at import (3 here); set inchannel explicitly when overriding using_PV
+_C.model.Edge_Ass.PV_reduction='max'
+_C.model.Edge_Ass.PV_patch_size=3
+_C.model.Edge_Ass.PV_normal=False
+_C.model.Edge_Ass.mid_channel=4
+_C.model.Edge_Ass.gain=2.0
+
+
+_C.model.PQ=CN()
+_C.model.PQ.type='patch_query2_swin_channel_last.prior_query'
+_C.model.PQ.using=False
+_C.model.PQ.patch_size=2
+_C.model.PQ.map_size=16
+_C.model.PQ.num_head=4
+_C.model.PQ.using_pretrain=False
+_C.model.PQ.position=[True,False,False]
+_C.model.PQ.pretrain_path='./models/pretrained/prior_query2_channel_last_L_attention_rm_SA/pretrain/model_199.pth'
+_C.model.PQ.using_LA=True
+_C.model.PQ.using_prior=True
+
+_C.model.SR=CN() #salient object reconstruction
+_C.model.SR.using=False
+_C.model.header='my_aggregation_lite_CSP'
+
+_C.model.EQ=CN()
+_C.model.EQ.using=False
+_C.model.EQ.map_size=88
+_C.model.EQ.win_size=4
+_C.model.EQ.num_heads=4
+
+_C.model.AF=CN()
+_C.model.AF.using=False
+_C.model.AF.position=[True,False,False]
+
+
+_C.dataloader=CN()
+_C.dataloader.batch_size=16
+_C.dataloader.num_work=8
+_C.dataloader.train_size=352
+_C.dataloader.test_size=352
+_C.dataloader.using_random_size=True
+_C.dataloader.train_path=['../SOD_Data/DUTS-TR/DUTS-TR-Image/', '../SOD_Data/DUTS-TR/DUTS-TR-Mask/']
+_C.dataloader.test_path=['../SOD_Data/DUTS-TE/DUTS-TE-Image/', '../SOD_Data/DUTS-TE/DUTS-TE-Mask/']
+
+
+_C.solver=CN()
+_C.solver.type='AdamW'
+_C.solver.epoch=70
+_C.solver.momen=0.9
+_C.solver.weight_decay=1e-5
+_C.solver.lr=1e-4
+_C.solver.base_batchsize=12
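+_C.solver.min_lr=_C.solver.lr*0.001  # computed from the default lr (about 1e-7); not recomputed if lr is overridden
+_C.solver.warmup_lr=_C.solver.lr*0.0001  # computed from the default lr (about 1e-8); not recomputed if lr is overridden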
+_C.solver.init_epoch=10
+_C.solver.warmup_epoch = 3
+_C.solver.t_mul=2
+_C.solver.using_clip=True
+_C.solver.clip=0.5
+_C.solver.lr_rate=[0.1,1,1]
+_C.solver.decay_rate=0.5
+_C.solver.lr_step=[10,30,70]
+
+
+_C.loss=CN()
+_C.loss.combination=['DICE','SSIM', 'WBCE']
+_C.loss.combination_p=[[], [], []]
+_C.loss.loss_weight=[1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
+_C.loss.loss_scale=CN()
+_C.loss.loss_scale.number=4
+_C.loss.loss_scale.combination=[[1],[1,2],[1,2,3],[1,2,3]]
+_C.loss.using_dice=True
+_C.loss.edge=CN()
+_C.loss.edge.using=_C.model.Edge_Ass.using  # snapshot of the default (False); does not follow later overrides of Edge_Ass.using
+_C.loss.edge.gamma=1.5
+_C.loss.edge.gains=[1,8,64]
+_C.loss.edge.weights=[1,1/4,1/16]
+_C.loss.edge.stage=[10,20,40]
+_C.loss.edge.max_rate=0.9
+_C.loss.edge.base_size=288
+_C.loss.edge.loss_type='mse'
+_C.loss.using_Deepest_loss=False
+_C.loss.using_normal=True
+_C.loss.using_filter_interpolate=False
+_C.loss.mask_binary=False
+_C.loss.AFloss=CN()
+_C.loss.AFloss.weight=0.1
+_C.loss.AFloss.Four=False
+_C.loss.using_Rweight=True
+_C.loss.Rweight=0.5
+
+
+
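Several defaults in `config/params.py` are derived from other defaults (`Edge_Ass.inchannel` from `using_PV`, `loss.edge.using` from `Edge_Ass.using`, `min_lr` and `warmup_lr` from `lr`). These expressions run once at import time, so a later override of the source value does not update the derived one. The sketch below illustrates this with a plain `SimpleNamespace` standing in for the yacs node (an assumption made so the example runs without yacs); `resolve_inchannel` is a hypothetical helper, not something defined in the repository.

```python
from types import SimpleNamespace

# Plain-Python stand-in for the yacs node (assumption: yacs itself is not needed).
edge = SimpleNamespace(using_PV=False)
# Same pattern as config/params.py: the expression is evaluated once, right now.
edge.inchannel = 4 if edge.using_PV else 3

# A later override of using_PV (for example from a YAML file) leaves inchannel alone.
edge.using_PV = True
print(edge.inchannel)  # still 3


def resolve_inchannel(using_PV):
    """Hypothetical helper: recompute the dependent value after overrides."""
    return 4 if using_PV else 3


edge.inchannel = resolve_inchannel(edge.using_PV)
print(edge.inchannel)  # 4
```

This is why `config/standard.yaml` sets `inchannel` and `loss.edge.using` explicitly rather than relying on the derived defaults.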
diff --git a/config/standard.yaml b/config/standard.yaml
@@ -0,0 +1,57 @@
+seed: 5555
+model:
+  mid_channel: 196
+  expand: [2.25, 1.75, 1.25]
+  neck: 
+    type: RFB_Lite
+  Edge_Ass:
+    using: True
+    mid_channel: 4
+    using_PV: False
+    inchannel: 3
+    using_edge_SC: True
+    PV_reduction: None
+    PV_normal: True
+    gain: 1.0
+  header: my_aggregation_lite_CSP
+  PQ:
+    using: True
+    patch_size: 4
+    map_size: 16
+    num_head: 8
+    type: prior_query2_channel_last_L_attention_rm_SA.prior_query
+    using_pretrain: True
+    pretrain_path: models/pretrained/prior_query2_channel_last_L_attention_rm_SA/pretrain/model_199.pth
+    position: [True,True,True]
+
+
+
+loss:
+  combination: ['DICE','SSIM', 'FocalLoss','MSELoss','WBCE','Contrast_Loss','Contrast_Loss','Contrast_Loss','Contrast_Loss']
+  using_Deepest_loss: True
+  combination_p: [[],[],[2],[],[],[0.95],[0.8],[0.65],[0.5]]
+  loss_weight: [0.0625,   0.125,0.125,   0.25,0.25,0.25,   0.5,0.5,0.5,   0.5,0] #sum(combination)+edge_loss
+  loss_scale:
+    number: 4   #11,22,44
+    combination: [[1],[1,2],[1,2,3],[1,2,3]]
+  edge:
+    using: True
+    gamma: 1.5
+    gains: [1,48,64]
+    weights: [1, 0.25, 0.0625] #1,1/4,1/16
+    stage: [10,20,40]
+    max_rate: 0.9
+    base_size: 288
+    loss_type: bce
+  using_normal: True
+
+
+dataloader:
+  batch_size: 12
+
+solver:
+  init_epoch: 10
+  epoch: 70
+  lr_rate: [0.1,1.5,1]
+  decay_rate: 1.0
+  lr_step: [10,30,70]
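In the `loss` section of `config/standard.yaml`, `combination` and `combination_p` are parallel lists: each loss name has one parameter list at the same index. The sketch below checks that pairing using the values copied from the YAML. The helper `check_loss_lists` is hypothetical, and how `loss_weight` (11 entries) maps onto losses and scales is defined by the training code, which is not shown here, so the sketch only reports its length.

```python
# Values copied from the `loss` section of config/standard.yaml.
combination = ["DICE", "SSIM", "FocalLoss", "MSELoss", "WBCE",
               "Contrast_Loss", "Contrast_Loss", "Contrast_Loss", "Contrast_Loss"]
combination_p = [[], [], [2], [], [], [0.95], [0.8], [0.65], [0.5]]
loss_weight = [0.0625, 0.125, 0.125, 0.25, 0.25, 0.25, 0.5, 0.5, 0.5, 0.5, 0]


def check_loss_lists(names, params):
    """Hypothetical helper: pair each loss name with its parameter list."""
    if len(names) != len(params):
        raise ValueError(f"{len(names)} losses but {len(params)} parameter lists")
    return list(zip(names, params))


pairs = check_loss_lists(combination, combination_p)
for name, p in pairs:
    print(f"{name:14s} params={p}")
print(len(pairs), "losses,", len(loss_weight), "weights")
```

A check like this catches the common editing mistake of adding a loss to `combination` without adding its entry to `combination_p`.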