what is the meaning of path_num? #54

Open
zhanghaochen-817 opened this issue Aug 14, 2023 · 7 comments

Comments

@zhanghaochen-817 commented Aug 14, 2023

I cannot understand the meaning of the path_num setting. Is it for multi-GPU training?
Sincerely waiting for your reply.

@lpetflo commented Aug 14, 2023

The path_num variable represents the number of sub-network paths used by the model. When running the td4_psp18 model, you're essentially running 4 psp18 models; for td2_psp50, you're using 2 psp50 models.
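To make that concrete, here is a minimal, hypothetical sketch of what path_num controls (these names are mine, not the repo's actual classes):

```python
import torch
import torch.nn as nn

class MultiPathSketch(nn.Module):
    """Hypothetical illustration: path_num independent lightweight sub-networks,
    e.g. 4 psp18-style ones for td4_psp18 or 2 psp50-style ones for td2_psp50."""
    def __init__(self, make_subnet, path_num):
        super().__init__()
        self.paths = nn.ModuleList(make_subnet() for _ in range(path_num))

    def forward(self, frame, t):
        # Frame t of the video is handled by sub-network t mod path_num.
        return self.paths[t % len(self.paths)](frame)

# Usage with tiny stand-in sub-networks instead of real psp18 models:
model = MultiPathSketch(lambda: nn.Conv2d(3, 19, kernel_size=1), path_num=4)
out = model(torch.randn(1, 3, 64, 64), t=5)  # routed to path 5 % 4 == 1
```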
@feinanshan also commented on this issue and explained their approach in #3

Was this clear enough? You mentioned a fix for #52, could you give me some hints on this as well?

@zhanghaochen-817 (Author)

Thanks for your reply. I know how it works, but I cannot understand why two (or four) paths are used. If I use only one, what impact will it have? Sincerely waiting for your reply.

@lpetflo commented Aug 15, 2023

I'm not sure that I can correctly explain how it works, but regarding the "why", have a look at Table 7 in the original paper and at the paragraph "Shared Subnetworks v.s. Independent Subnetworks". Splitting the network into multiple paths seems to enlarge the representation capacity and therefore improve the accuracy of the model.

@zhanghaochen-817 (Author)

Thanks for your reply. In Table 7 there are three options: 1. Single Path Baseline, 2. Shared, 3. Independent, with the mIoU gradually increasing across them.
Given the above, I have two questions.
1. I think "Independent" corresponds to two (or four) paths, but I cannot understand the difference between "Single Path Baseline" and "Shared"; I think both of them correspond to a single path. Can you explain the difference between them?
2. In your code (such as td2-fa18), the difference between the two paths is only reflected in the "self.head" structure; the other structures (such as "self.pretrained", "self.ffm", "self.enc", "self.atn") show no difference between the paths, so their parameters are updated in a shared manner, roughly as in the sketch below. I'm not sure whether I have expressed myself clearly and whether my understanding is correct. Sincerely waiting for your reply.
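Here is a minimal sketch of the structure I am describing in point 2, just to illustrate the question (hypothetical names, not the actual td2-fa18 code):

```python
import torch.nn as nn

class SharedBackboneSketch(nn.Module):
    """My reading of the code: everything shared except one head per path."""
    def __init__(self, backbone, make_head, path_num=2):
        super().__init__()
        self.pretrained = backbone  # shared: updated by batches from every path
        self.heads = nn.ModuleList(make_head() for _ in range(path_num))  # per-path

    def forward(self, frame, path_id):
        feat = self.pretrained(frame)
        return self.heads[path_id](feat)  # only this part differs between paths
```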

@feinanshan (Owner) commented Aug 16, 2023

Hi, thanks for your interest. The key idea is to separate and distribute a large segmentation model's computation into multiple independent lightweight subnetworks, which can be assigned to different frames so as to reduce the latency.

  1. Single Path Baseline means a per-frame model, while Shared represents a TDNet with the same feature extractor for different frames (see the sketch below). Note that in Shared, features are still extracted from multiple frames.
  2. They share the same structure. Yet note that their inputs are different, and the KD loss can also drive the parameters to become different.
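A rough sketch of the Shared vs. Independent distinction (hypothetical names, not the repo's API):

```python
def shared_paths(extractor, fuse, frames):
    # "Shared": the SAME feature extractor runs on every frame in the
    # time window; the per-frame features are then fused for the prediction.
    feats = [extractor(f) for f in frames]
    return fuse(feats)

def independent_paths(extractors, fuse, frames):
    # "Independent": each frame gets its OWN lightweight extractor
    # (path_num of them), which enlarges the representation capacity.
    feats = [net(f) for net, f in zip(extractors, frames)]
    return fuse(feats)
```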

@zhanghaochen-817
Copy link
Author

Thanks for your reply. Through your explanation I understand the meaning of Table 7. But as for reply 2, I think which path the data goes into depends on the epoch and path_num. So I have always believed that one batch goes into path 1, the next batch goes into path 2, the one after that back into path 1, and so on, roughly as in the sketch below. So I think this training method will train two different sets of parameters for the KD loss and for the "self.head" functions.
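What I have in mind is something like this (hypothetical pseudocode of my understanding, not your actual training loop):

```python
def train_alternating(model, loader, optimizer, seg_loss, kd_loss, teacher, path_num):
    # My reading: batch i only drives sub-network i % path_num, so the
    # per-path parameters (e.g. the heads) gradually diverge.
    for it, (images, labels) in enumerate(loader):
        path_id = it % path_num             # 0, 1, 0, 1, ... when path_num == 2
        optimizer.zero_grad()
        pred = model(images, path_id)       # forward through the chosen path only
        loss = seg_loss(pred, labels) + kd_loss(pred, teacher(images))
        loss.backward()
        optimizer.step()
```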
I really don't know why, but through your explanation I understand that it really does improve the accuracy and mIoU; maybe I need to continue to delve into your paper and code. Thank you for your patient response. Have a nice day!

@zhanghaochen-817
Copy link
Author

I'm sorry, there are still some questions I need to ask you.
In your paper, the "Encoding" module is followed by the "downsampling" modules, but in the td4-psp18 model your code indicates that max-pooling downsampling is performed on the features first (in ./Training/ptsemseg/model/td4-psp/transformer.py, line 36). Also, your paper shows through experiments that a max-pooling kernel_size of 4 achieves the best result, yet in your code it is only 3.
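For reference, the step I am asking about looks roughly like this (a sketch with made-up shapes, not the exact code from transformer.py):

```python
import torch
import torch.nn.functional as F

feat = torch.randn(1, 64, 96, 96)          # example attention feature map
down3 = F.max_pool2d(feat, kernel_size=3)  # what the code appears to use -> 32x32
down4 = F.max_pool2d(feat, kernel_size=4)  # what the paper's ablation favors -> 24x24
print(down3.shape, down4.shape)
```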
Please don't mind my tireless inquiries; sincerely waiting for your reply.
