
Some questions about TBD-Baseline and detection/tracking performance #6

JingweiZhang12 opened this issue Aug 7, 2024 · 1 comment


JingweiZhang12 commented Aug 7, 2024

Thanks for your work and open-source code! I have some questions:

  1. From my perspective, the main difference between TBD-Baseline and ADA-Track is whether the track queries and detection queries perform self-attention together in the shared decoder layer. This mainly affects detection performance, since it fuses temporal information, and thereby indirectly improves the MOTA metric.
  2. As we know, MUTR3D is based on MOTRv1. Did you try the tricks of MOTRv3, such as a better label assignment for enhancing detection performance, on MUTR3D?
  3. Regarding the DETR3D/PETR detector, have you tested whether adding association layers has any impact on detection performance?

I'd appreciate it if you could answer the above questions. @dsx0511

On Aug 9, 2024, JingweiZhang12 changed the title from “Some questions about TBD-Baseline” to “Some questions about TBD-Baseline and detection/tracking performance”.
dsx0511 (Owner) commented Dec 7, 2024

Hi @JingweiZhang12 , sorry for the late reply. Here are the answers to your questions:

  1. The model architecture of ADA-Track is

for decoder_layer in decoder_layers:
    [track_query, detection_query] = self_attention([track_query, detection_query])  # '[]' denotes concatenation
    [track_query, detection_query] = cross_attention([track_query, detection_query], image_features)
    intermediate_track_boxes = MLP(track_query)
    intermediate_detection_boxes = MLP(detection_query)
    detection_query, edge_features = edge_augmented_cross_attention(track_query, detection_query, edge_features)  # track_query: key, detection_query: query

The model architecture of TBD-Baseline is

for detection_decoder_layer in detection_decoder_layers:
    [track_query, detection_query] = self_attention([track_query, detection_query])
    [track_query, detection_query] = cross_attention([track_query, detection_query], image_features)
    intermediate_track_boxes = MLP(track_query)
    intermediate_detection_boxes = MLP(detection_query)

for association_decoder_layer in association_decoder_layers:
    detection_query, edge_features = edge_augmented_cross_attention(track_query, detection_query, edge_features)

Therefore, the differences between ADA-Track and TBD-Baseline do not lie only in the self-attention. The interleaving also leads to a difference in association performance: the learned association modules in ADA-Track fuse detection information with previous association results layer by layer, yielding a mutual optimization of both tasks. In contrast, the association layers of TBD-Baseline cannot influence the detection layers in the forward pass (a toy sketch contrasting the two layouts follows at the end of this comment).

2/3. We have not tried these yet, but thank you for the suggestions; they are very valuable for our future work!

Hopefully the response is not too late for you. I'm looking forward to further discussion!
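
To make the contrast concrete, below is a minimal, runnable PyTorch sketch of the two decoder layouts described above. Everything in it is an illustrative assumption, not the actual ADA-Track code: ToyDecoderLayer, ToyEdgeAugCrossAttn, the 256-dim queries, and the toy edge update are hypothetical stand-ins.

# Minimal PyTorch sketch of the two decoder layouts. All names, dimensions,
# and the edge update are illustrative placeholders, not ADA-Track itself.
import torch
import torch.nn as nn

D = 256  # hypothetical query embedding dimension

class ToyDecoderLayer(nn.Module):
    # Joint self-attention over concatenated [track, det] queries,
    # then cross-attention to image features, as in the pseudocode above.
    def __init__(self):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(D, 8, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(D, 8, batch_first=True)

    def forward(self, queries, image_feats):
        q, _ = self.self_attn(queries, queries, queries)
        q, _ = self.cross_attn(q, image_feats, image_feats)
        return q

class ToyEdgeAugCrossAttn(nn.Module):
    # Stand-in for edge-augmented cross-attention: detection queries attend
    # to track queries, and pairwise edge features are updated.
    def __init__(self):
        super().__init__()
        self.attn = nn.MultiheadAttention(D, 8, batch_first=True)
        self.edge_mlp = nn.Linear(2 * D, D)

    def forward(self, track_q, det_q, edges):
        det_q, _ = self.attn(det_q, track_q, track_q)  # det: query, track: key/value
        n_t, n_d = track_q.shape[1], det_q.shape[1]
        # Toy edge update from all (track, detection) query pairs.
        pair = torch.cat([track_q.unsqueeze(2).expand(-1, -1, n_d, -1),
                          det_q.unsqueeze(1).expand(-1, n_t, -1, -1)], dim=-1)
        return det_q, edges + self.edge_mlp(pair)

def split(q, n_t):
    return q[:, :n_t], q[:, n_t:]

# ADA-Track style: one association step inside every decoder layer, so the
# updated detection queries and edges feed the next detection layer.
def ada_track_forward(dec_layers, assoc_layers, track_q, det_q, img, edges):
    for dec, assoc in zip(dec_layers, assoc_layers):
        q = dec(torch.cat([track_q, det_q], dim=1), img)
        track_q, det_q = split(q, track_q.shape[1])
        det_q, edges = assoc(track_q, det_q, edges)
    return track_q, det_q, edges

# TBD-Baseline style: all detection layers run first, association afterwards,
# so association can no longer influence detection in the forward pass.
def tbd_forward(dec_layers, assoc_layers, track_q, det_q, img, edges):
    for dec in dec_layers:
        q = dec(torch.cat([track_q, det_q], dim=1), img)
        track_q, det_q = split(q, track_q.shape[1])
    for assoc in assoc_layers:
        det_q, edges = assoc(track_q, det_q, edges)
    return track_q, det_q, edges

# Smoke test with random tensors (batch 1, 5 tracks, 9 detections).
dec = nn.ModuleList([ToyDecoderLayer() for _ in range(3)])
assoc = nn.ModuleList([ToyEdgeAugCrossAttn() for _ in range(3)])
t, d = torch.randn(1, 5, D), torch.randn(1, 9, D)
img, e = torch.randn(1, 300, D), torch.zeros(1, 5, 9, D)
ada_track_forward(dec, assoc, t, d, img, e)
tbd_forward(dec, assoc, t, d, img, e)

The only structural difference between the two forward functions is where the association step sits: inside the layer loop (ADA-Track) or after it (TBD-Baseline). That placement is exactly what allows, or prevents, the association result from feeding back into detection.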
