diff --git a/README.md b/README.md
index 03e2c71..9d18cf6 100644
--- a/README.md
+++ b/README.md
@@ -7,6 +7,7 @@
 Official repository for the paper [Robust High-Resolution Video Matting with Temporal Guidance](https://peterl1n.github.io/RobustVideoMatting/). RVM is specifically designed for robust human video matting. Unlike existing neural models that process frames as independent images, RVM uses a recurrent neural network to process videos with temporal memory. RVM can perform matting in real-time on any videos without additional inputs. It achieves **4K 76FPS** and **HD 104FPS** on an Nvidia GTX 1080 Ti GPU. The project was developed at [ByteDance Inc.](https://www.bytedance.com/)
+
 ## News
@@ -34,7 +35,7 @@ All footage in the video are available in [Google Drive](https://drive.google.co
 ## Demo
 * [Webcam Demo](https://peterl1n.github.io/RobustVideoMatting/#/demo): Run the model live in your browser. Visualize recurrent states.
 * [Colab Demo](https://colab.research.google.com/drive/10z-pNKRnVNsp0Lq9tH1J_XPZ7CBC_uHm?usp=sharing): Test our model on your own videos with free GPU.
-
+* [Replicate Demo](https://replicate.com/arielreplicate/robust_video_matting): Test our model on Replicate UI/python API.
 ## Download
diff --git a/cog.yaml b/cog.yaml
new file mode 100644
index 0000000..e7a4f3f
--- /dev/null
+++ b/cog.yaml
@@ -0,0 +1,14 @@
+build:
+  gpu: true
+  python_version: 3.8
+  system_packages:
+    - libgl1-mesa-glx
+    - libglib2.0-0
+  python_packages:
+    - torch==1.9.0
+    - torchvision==0.10.0
+    - av==8.0.3
+    - tqdm==4.61.1
+    - pims==0.5
+
+predict: "predict.py:Predictor"
diff --git a/predict.py b/predict.py
new file mode 100644
index 0000000..d7a9707
--- /dev/null
+++ b/predict.py
@@ -0,0 +1,32 @@
+import torch
+from model import MattingNetwork
+from inference import convert_video
+
+from cog import BasePredictor, Path, Input
+
+
+class Predictor(BasePredictor):
+    def setup(self):
+        self.model = MattingNetwork('resnet50').eval().cuda()
+        self.model.load_state_dict(torch.load('rvm_resnet50.pth'))
+
+    def predict(
+        self,
+        input_video: Path = Input(description="Video to segment."),
+        output_type: str = Input(default="green-screen", choices=["green-screen", "alpha-mask", "foreground-mask"]),
+
+    ) -> Path:
+
+        convert_video(
+            self.model,                           # The model, can be on any device (cpu or cuda).
+            input_source=str(input_video),        # A video file or an image sequence directory.
+            output_type='video',                  # Choose "video" or "png_sequence"
+            output_composition='green-screen.mp4',    # File path if video; directory path if png sequence.
+            output_alpha="alpha-mask.mp4",          # [Optional] Output the raw alpha prediction.
+            output_foreground="foreground-mask.mp4",     # [Optional] Output the raw foreground prediction.
+            output_video_mbps=4,                  # Output video mbps. Not needed for png sequence.
+            downsample_ratio=None,                # A hyperparameter to adjust or use None for auto.
+            seq_chunk=12,                         # Process n frames at once for better parallelism.
+        )
+        output_type = str(output_type)
+        return Path(f'{output_type}.mp4')
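
Note on `predict.py`: `convert_video` is asked to write all three files on every call, and `predict` then returns the one whose name matches `output_type`. The sketch below isolates that selection step so it can be read and tested without a GPU or the Cog runtime. It is an illustration only, not part of the patch: `OUTPUT_FILES` and `select_output` are hypothetical names, and `pathlib.Path` stands in for `cog.Path` as an assumption.

```python
from pathlib import Path

# File names mirror the paths passed to convert_video in predict.py.
# OUTPUT_FILES and select_output are hypothetical names used for illustration.
OUTPUT_FILES = {
    "green-screen": Path("green-screen.mp4"),
    "alpha-mask": Path("alpha-mask.mp4"),
    "foreground-mask": Path("foreground-mask.mp4"),
}


def select_output(output_type: str) -> Path:
    """Return the file that corresponds to the requested output type."""
    try:
        return OUTPUT_FILES[output_type]
    except KeyError:
        # Fail with a clear message instead of returning a path that was never written.
        raise ValueError(f"unknown output_type: {output_type!r}") from None


if __name__ == "__main__":
    print(select_output("alpha-mask"))
```

An explicit mapping keeps the accepted values and the written file names in one place, whereas the f-string in the patch relies on the `choices` list and the `convert_video` arguments staying in sync by hand.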