Hello author, many thanks for your great work. I'm a rookie in TAD. I have read through your code, but I don't understand why the model only outputs the start, end, and confidence of an action without classifying it. Why is it still called TAD?
My rough understanding of the full TAD pipeline:
first: train a video recognition model to extract features from an untrimmed video
second: train a proposal model to generate proposals
last: use a video recognition model to classify the proposals
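To make my understanding concrete, here is a minimal sketch of how I imagine the three steps fit together (all function names, shapes, and values are hypothetical placeholders, not taken from this repo):

```python
# A minimal sketch of the three-stage TAD pipeline as I understand it.
# Every name, shape, and value below is a hypothetical placeholder.
import numpy as np

def extract_features(video_frames, snippet_len=16):
    """Stage 1: a pretrained action recognition model (e.g. a 3D CNN)
    turns each snippet of the untrimmed video into a feature vector."""
    n_snippets = len(video_frames) // snippet_len
    return np.random.rand(n_snippets, 400)  # (T, C) snippet features

def generate_proposals(features):
    """Stage 2: a proposal model predicts class-agnostic segments,
    each as (start, end, confidence) in snippet coordinates."""
    t = features.shape[0]
    return [(0.1 * t, 0.4 * t, 0.92), (0.5 * t, 0.9 * t, 0.71)]

def classify_proposals(video_frames, proposals):
    """Stage 3: a recognition model assigns an action label to each proposal."""
    return [(s, e, conf, "hypothetical_action_label") for s, e, conf in proposals]

video = np.zeros((160, 8, 8, 3))                # a tiny fake untrimmed video
feats = extract_features(video)                 # stage 1
props = generate_proposals(feats)               # stage 2
detections = classify_proposals(video, props)   # stage 3
print(detections)
```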
Am I right? And is the model in the last step the same one as in the first step?
Thank you, and I look forward to your reply!
Very grateful.
Your understanding is correct. It is fine to divide the pipeline into the three stages you mentioned, but it is not necessary. Indeed, I have been thinking about these questions over the past year.
(a) Why do we have to extract features at all? And why do we have to use an action recognition model to provide the features for localization?
(b) The notion of a temporal action proposal came after the action localization problem itself. A proposal is only an intermediate product of action detection or localization. Maybe we should put more effort into the problem itself and less into proposal generation.
(c) Video-level classification of actions plays an essential role in this problem. It would be interesting to see how snippet-level information helps to classify the actions.
If you are also thinking about (a), you can answer your second question yourself. In my view, the first step and the last step should not use the same model: the first step should provide a video representation for the detection problem, while the last step should recognise the action.
TAD is a fast-developing field. Currently, many works focus on step 2, several on step 3, but few on step 1.
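For reference, one common recipe in the TAD literature for step 3 is to fuse class-agnostic proposal confidences with scores from a separately trained video-level classifier. Here is a minimal sketch with hypothetical names and data, not necessarily this repo's exact procedure:

```python
# A common fusion recipe from the TAD literature (illustrative only):
# multiply each class-agnostic proposal confidence by the video-level
# classification scores and keep the top-scoring classes per video.
import numpy as np

def fuse(proposals, video_cls_scores, class_names, top_k=2):
    """proposals: list of (start, end, confidence);
    video_cls_scores: (num_classes,) softmax scores from a video-level
    classifier trained separately from the proposal model."""
    top_classes = np.argsort(video_cls_scores)[::-1][:top_k]
    detections = []
    for start, end, conf in proposals:
        for c in top_classes:
            detections.append((start, end, conf * video_cls_scores[c], class_names[c]))
    # rank final detections by the fused score
    return sorted(detections, key=lambda d: d[2], reverse=True)

class_names = ["long_jump", "high_jump", "billiards"]  # hypothetical labels
video_scores = np.array([0.7, 0.2, 0.1])               # from the video classifier
proposals = [(12.0, 45.5, 0.92), (50.0, 88.0, 0.71)]
print(fuse(proposals, video_scores, class_names))
```

Multiplying the two scores lets a strong video-level classifier re-rank the proposals without retraining the proposal model.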
Hi @frostinassiky,
Thanks for your wonderful work. I have read your code, but I couldn't figure out how you classify the proposals generated by your framework. Could you share the details with us? In particular, how did you generate cuhk_val_simp_share.json and uNet_test.npy, and what do they contain?
Thanks again, and I look forward to your reply.