
Can you help me with some questions about how to use this repo? Thank you very much! #44

Open
goldlee opened this issue Feb 8, 2021 · 2 comments

Comments


goldlee commented Feb 8, 2021

Hello author, many thanks for your great work! I'm a rookie in TAD. I have read through your code, but I can't understand why the model only outputs the start, end, and confidence of an action without classifying it. Why is it called TAD?
My naive understanding of the full TAD pipeline:
first: train a video recognition model to extract features from an untrimmed video
second: train a proposal model to generate proposals
last: use a video recognition model to classify the proposals

Am I right? Are the models in the last step and the first step the same one?

Thank you, and I look forward to your reply!
Very grateful.
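The three steps above can be sketched roughly as follows. All function names and the stub logic inside them are my own illustration, not this repo's actual code; real pipelines use a deep backbone and a learned proposal network instead of these toy rules.

```python
# Illustrative sketch of the three-stage TAD pipeline:
# 1) extract per-snippet features, 2) group snippets into proposals,
# 3) classify each proposal. All bodies are toy stand-ins.

def extract_features(video_frames, snippet_len=16):
    """Stage 1: slide a (hypothetical) recognition backbone over the
    untrimmed video, producing one feature per snippet. Here the
    'feature' is just the mean frame value of each snippet."""
    n_snippets = max(1, len(video_frames) // snippet_len)
    return [
        sum(video_frames[i * snippet_len:(i + 1) * snippet_len]) / snippet_len
        for i in range(n_snippets)
    ]

def generate_proposals(features, threshold=0.5):
    """Stage 2: treat each feature as an 'actionness' score and group
    consecutive high-scoring snippets into (start, end, confidence)
    proposals. Note: class-agnostic, exactly as in the question."""
    proposals, start = [], None
    for i, score in enumerate(features + [0.0]):  # sentinel flushes the tail
        if score >= threshold and start is None:
            start = i
        elif score < threshold and start is not None:
            segment = features[start:i]
            proposals.append((start, i, sum(segment) / len(segment)))
            start = None
    return proposals

def classify_proposals(video_frames, proposals, snippet_len=16):
    """Stage 3: run a classifier on each trimmed segment. A dummy
    threshold rule stands in for the recognition model."""
    results = []
    for start, end, conf in proposals:
        clip = video_frames[start * snippet_len:end * snippet_len]
        label = "action_A" if sum(clip) / len(clip) > 0.7 else "action_B"
        results.append((start, end, conf, label))
    return results
```

For example, a 64-frame video whose first half is "active" yields one proposal covering snippets 0-2, which stage 3 then labels.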

@frostinassiky (Owner)

Hi @goldlee, very good question!

  1. Your understanding is correct. It is OK to divide the pipeline into the three stages you mentioned, but it is not necessary. Indeed, I was also thinking about these questions in the past year.
    (a) Why do we have to extract features? And why do we have to use an action recognition model to provide the features for localization?
    (b) The definition of temporal action proposals came after the action localization problem. An action proposal is an intermediate product of action detection or localization. Maybe we should put more effort into the problem itself, and less into proposal generation.
    (c) Video-level action classification plays an essential role in this problem. It would be interesting to see how snippet-level information helps to classify the actions.
  2. If you are also thinking about point 1, you can answer your second question. In my view, the first step and the last step should not be the same model, because the first step should provide a video representation for the detection problem, while the last step should recognise the action.

  3. TAD is a fast-developing field. Currently, many works focus on step 2, several on step 3, but few on step 1.


guuzaa commented Aug 24, 2021

Hi @frostinassiky,
Thanks for your wonderful work. I have read your code, but I couldn't figure out how you classify the proposals generated by your framework. Could you share the details with us? In particular, how did you generate cuhk_val_simp_share.json and uNet_test.npy? What do they mean?
Thanks again, and I look forward to your reply.
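For context while waiting for the author: files like these usually hold external video-level classification scores (this is an assumption about this repo, not confirmed by it), and a common recipe in TAD codebases is to label each class-agnostic proposal with the top-k video-level classes and multiply the scores. A minimal sketch of that fusion step, with all names and conventions as my assumptions:

```python
# Sketch of "detection by fusion": class-agnostic proposals get labels
# from an external video-level classifier, and the two scores are
# multiplied. The data shapes here are illustrative assumptions.

def fuse(proposals, class_scores, top_k=2):
    """proposals: list of (start_sec, end_sec, confidence) tuples;
    class_scores: dict mapping class name -> video-level score."""
    top = sorted(class_scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    detections = []
    for start, end, conf in proposals:
        for label, cls_score in top:
            # Fused score = proposal confidence x video-level class score.
            detections.append((start, end, conf * cls_score, label))
    # Highest fused score first, ready for NMS / evaluation.
    return sorted(detections, key=lambda d: d[2], reverse=True)
```

So one proposal with confidence 0.8 and video-level scores {long_jump: 0.9, high_jump: 0.3} becomes two detections scored 0.72 and 0.24.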
