We release the pre-trained models trained on Moments in Time.
- Clone the code from GitHub:
git clone https://github.com/metalbubble/moments_models.git
cd moments_models
- RGB model in PyTorch (ResNet50 pretrained on ImageNet). Run the following script to download the model and run it on the test sample. The model has been tested successfully with PyTorch 1.0 and Python 3.6.
python test_model.py
We provide a 3D ResNet50 (inflated from the 2D RGB model) trained on 16-frame inputs sampled at 5 fps.
The model was recently updated to 305 classes and achieves the following performance on the MiT-V2 dataset:
Top-1 | Top-5 |
---|---|
28.4% | 54.5% |
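For readers unfamiliar with the metrics in the table, the following is a minimal sketch of how Top-1 and Top-5 accuracy are commonly computed. The function name `topk_accuracy` and the toy scores are illustrative assumptions, not code from this repository.

```python
def topk_accuracy(scores, labels, k=1):
    """Fraction of samples whose true label is among the k highest-scoring classes.

    scores: list of per-class score lists, one per sample (hypothetical toy data below).
    labels: list of integer class indices, one per sample.
    """
    hits = 0
    for row, label in zip(scores, labels):
        # Rank class indices from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Toy example with 4 samples and 3 classes (values are made up for illustration).
scores = [
    [0.1, 0.7, 0.2],
    [0.5, 0.3, 0.2],
    [0.2, 0.3, 0.5],
    [0.4, 0.35, 0.25],
]
labels = [1, 1, 0, 2]

print(topk_accuracy(scores, labels, k=1))  # 0.25
print(topk_accuracy(scores, labels, k=2))  # 0.5
```

With 305 classes, Top-5 counts a prediction as correct when the ground-truth class appears anywhere among the five highest-scoring classes.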
The 3D model can be downloaded and run using a similar command:
python test_video.py --video_file path/to/video.mp4 --arch resnet3d50
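The 3D model expects 16 frames sampled at 5 fps. As a rough sketch of what that sampling means for a video recorded at a different frame rate, the helper below picks source frame indices. The function `sample_frame_indices` and its clamping behavior for short clips are assumptions made for illustration; how `test_video.py` extracts frames may differ.

```python
def sample_frame_indices(num_source_frames, source_fps, target_fps=5.0, num_frames=16):
    """Pick indices of source frames that approximate `num_frames` frames at `target_fps`.

    Assumption: when the clip is too short, the last frame is repeated.
    """
    if num_source_frames <= 0:
        raise ValueError("video has no frames")
    step = source_fps / target_fps  # source frames between two sampled frames
    last = num_source_frames - 1
    return [min(int(i * step), last) for i in range(num_frames)]


# A 10-second clip at 30 fps: every 6th frame, covering the first 3.2 seconds.
print(sample_frame_indices(300, 30.0))

# A short clip (50 frames at 30 fps): trailing indices are clamped to the last frame.
print(sample_frame_indices(50, 30.0))
```

At 5 fps, 16 frames span a little over three seconds, which matches the roughly three-second clips in Moments in Time.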
If you use any of these files, please cite our Moments paper (https://arxiv.org/abs/1801.03150).
We now include the Multi-label Moments (M-MiT) 3D ResNet50 model, the Broden dataset with action regions, and loss implementations including wLSEP. If you use any of these files, please cite our Multi Moments paper (https://arxiv.org/abs/1911.00232).
The multi-label model was recently updated to 305 classes and achieves the following performance on the M-MiT-V2 dataset:
Top-1 | Top-5 | micro mAP | macro mAP |
---|---|---|---|
59.4% | 81.7% | 62.4 | 39.4 |
The 3D M-MiT model can be downloaded and run using the following command:
python test_video.py --video_file path/to/video.mp4 --arch resnet3d50 --multi
We uploaded a Python file with our PyTorch implementations of the different loss functions used in our Multi Moments paper (https://arxiv.org/abs/1911.00232).
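To give a feel for the kind of pairwise ranking loss involved, here is a minimal pure-Python sketch of the unweighted LSEP loss (log-sum-exp pairwise) that wLSEP builds on. It is not the repository's PyTorch implementation, and the weighting scheme that distinguishes wLSEP is deliberately not reproduced here; the function name `lsep_loss` is an assumption.

```python
import math


def lsep_loss(scores, targets):
    """Unweighted LSEP loss for one sample.

    scores:  list of raw class scores (logits).
    targets: list of 0/1 labels of the same length.
    Computes log(1 + sum over (positive u, negative v) of exp(score_v - score_u)).
    """
    pos = [s for s, t in zip(scores, targets) if t == 1]
    neg = [s for s, t in zip(scores, targets) if t == 0]
    # Each (positive, negative) pair is penalized when the negative outranks the positive.
    pair_sum = sum(math.exp(n - p) for p in pos for n in neg)
    return math.log1p(pair_sum)


# Positives ranked above negatives give a small loss; the reverse gives a large one.
good = lsep_loss([2.0, 1.5, -1.0], [1, 1, 0])
bad = lsep_loss([-1.0, -0.5, 2.0], [1, 1, 0])
print(good, bad)
```

The loss depends only on score differences between positive and negative labels, so it encourages every present action to outrank every absent one rather than fitting each label independently.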
To run NetDissect on Moments models, download the Broden datasets with action regions:
- Broden (224x224)
- Broden (227x227)
- Broden (384x384)

Note: these can be used with the PyTorch NetDissect code without modification.
- Dynamic Image model in Caffe: use the testing script.
- TRN models are available at this repo. To use the TRN model trained on Moments:
Clone the TRN repo and download the pretrained TRN model:
git clone --recursive https://github.com/metalbubble/TRN-pytorch
cd TRN-pytorch/pretrain
./download_models.sh
cd ../sample_data
./download_sample_data.sh
Test the pretrained model on the sample video (Bolei is juggling ;-]!)
python test_video.py --arch InceptionV3 --dataset moments \
--weight pretrain/TRN_moments_RGB_InceptionV3_TRNmultiscale_segment8_best.pth.tar \
--frame_folder sample_data/bolei_juggling
RESULT ON sample_data/bolei_juggling
0.982 -> juggling
0.003 -> flipping
0.003 -> spinning
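The result lines above pair a probability with a class name. As a sketch of how such output can be produced from raw class scores, the snippet below applies a softmax and prints the top entries. The logits and class list are made up, and whether `test_video.py` formats its output exactly this way is an assumption.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def format_topk(logits, class_names, k=3):
    """Return the k most probable classes as 'prob -> name' strings."""
    probs = softmax(logits)
    ranked = sorted(zip(probs, class_names), reverse=True)[:k]
    return [f"{p:.3f} -> {name}" for p, name in ranked]


# Hypothetical logits for four classes (illustrative values only).
class_names = ["juggling", "flipping", "spinning", "clapping"]
logits = [5.0, 0.0, 0.0, -1.0]

for line in format_topk(logits, class_names):
    print(line)
```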
Mathew Monfort, Alex Andonian, Bolei Zhou, Kandan Ramakrishnan, Sarah Adel Bargal, Tom Yan, Lisa Brown, Quanfu Fan, Dan Gutfreund, Carl Vondrick, Aude Oliva. Moments in Time Dataset: one million videos for event understanding. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019. pdf, bib
Mathew Monfort, Kandan Ramakrishnan, Alex Andonian, Barry A McNamara, Alex Lascelles, Bowen Pan, Quanfu Fan, Dan Gutfreund, Rogerio Feris, Aude Oliva. Multi-Moments in Time: Learning and Interpreting Models for Multi-Action Video Understanding. arXiv preprint arXiv:1911.00232, 2019. pdf, bib
The project is supported by the MIT-IBM Watson AI Lab, IBM Research, the SystemsThatLearn@CSAIL / Ignite Grant, and the Intelligence Advanced Research Projects Activity (IARPA) via Department of Interior / Interior Business Center (DOI/IBC) contract number D17PC00341.