简体中文 | English

# Benchmark

We compare our results with some popular frameworks and official releases in terms of speed.

## Environment

### Hardware

- 8 NVIDIA Tesla V100 (16G) GPUs
- Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz

### Software

- Python 3.7
- PaddlePaddle 2.0
- CUDA 10.1
- CUDNN 7.6.3
- NCCL 2.1.15
- GCC 8.2.0

## Experiments and Statistics

The statistic reported is the average training time per iteration, including both data processing and model training, and the training speed is measured in instances per second (ips). Note that we skip the first 50 iterations, as they may include device warmup time.
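
For reference, below is a minimal sketch of how such an ips figure can be derived from per-iteration timings; the names (`average_ips`, `iter_times`, `batch_size`) are illustrative and not part of the PaddleVideo API.

```python
# Minimal sketch (not PaddleVideo code): compute average training speed in
# instances per second (ips) from per-iteration wall-clock times, skipping the
# first 50 iterations to exclude device warmup, as described above.
def average_ips(iter_times, batch_size, warmup_iters=50):
    """iter_times: per-iteration times in seconds (data processing + training).
    batch_size: number of instances processed per iteration."""
    measured = iter_times[warmup_iters:]           # drop warmup iterations
    avg_iter_time = sum(measured) / len(measured)  # average time per iteration
    return batch_size / avg_iter_time              # instances per second (ips)


# Hypothetical usage, assuming recorded_times holds the measured iteration times:
# ips = average_ips(iter_times=recorded_times, batch_size=16)
```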

Here we compare PaddleVideo with other video understanding toolkits under the same data and model settings.

To ensure a fair comparison, all experiments were conducted in the same hardware environment and on the same dataset. The dataset is generated by the data preparation scripts, and in each model setting the same data preprocessing methods are applied so that every framework receives identical feature input.

Significant improvements can be observed when comparing with other video understanding frameworks, as shown in the tables below. In particular, the SlowFast model is nearly 2x faster than its counterparts.

## Results

### Recognizers

| Model | batch size x gpus | PaddleVideo (ips) | Reference (ips) | MMAction2 (ips) | PySlowFast (ips) |
| :---- | :---------------- | :---------------- | :-------------- | :-------------- | :--------------- |
| TSM | 16x8 | 58.1 | 46.04 (temporal-shift-module) | To do | X |
| PPTSM | 16x8 | 57.6 | X | X | X |
| TSN | 16x8 | 841.1 | To do (tsn-pytorch) | To do | X |
| Slowfast | 16x8 | 99.5 | X | To do | 43.2 |
| Attention_LSTM | 128x8 | 112.6 | X | X | X |

### Localizers

| Model | PaddleVideo (ips) | MMAction2 (ips) | BMN (boundary matching network) (ips) |
| :---- | :---------------- | :-------------- | :------------------------------------ |
| BMN | 43.84 | x | x |

### Segmenters

This repo provides a performance and accuracy comparison between classical and popular sequential action segmentation models.

| Model | Metrics | Value | Flops(M) | Params(M) | test time(ms) bs=1 | test time(ms) bs=2 | inference time(ms) bs=1 | inference time(ms) bs=2 |
| :---- | :------ | :---- | :------- | :-------- | :----------------- | :----------------- | :---------------------- | :---------------------- |
| MS-TCN | F1@0.5 | 38.8% | 791.360 | 0.8 | 170 | - | 10.68 | - |
| ASRF | F1@0.5 | 55.7% | 1,283.328 | 1.3 | 190 | - | 16.34 | - |

- Model: model name, e.g. PP-TSM.
- Metrics: the evaluation metric used in the model test; the dataset used is Breakfast.
- Value: the value of the metric, generally kept to two decimal places.
- Flops(M): the floating-point operations required for one forward pass of the model, computed with the paddlevideo/tools/summary.py script (different models may require slight modifications), kept to one decimal place and measured with an input tensor of shape (1, 2048, 1000); see the sketch after this list.
- Params(M): the number of model parameters, computed by the same script together with the FLOPs, kept to one decimal place.
- test time(ms) bs=1: the time per sample when the Python script runs the test with batch_size = 1, kept to two decimal places. The dataset used in the test is Breakfast.
- test time(ms) bs=2: the time per sample when the Python script runs the test with batch_size = 2, kept to two decimal places. Since sequential action segmentation models are generally fully convolutional networks, the batch_size for training, testing, and inference is 1. The dataset used in the test is Breakfast.
- inference time(ms) bs=1: the time per sample when the exported inference model runs on GPU (V100 by default) with batch_size = 1, kept to two decimal places. The dataset used for inference is Breakfast.
- inference time(ms) bs=2: the time per sample when the exported inference model runs on GPU (V100 by default) with batch_size = 2, kept to two decimal places. Since sequential action segmentation models are generally fully convolutional networks, the batch_size for training, testing, and inference is 1. The dataset used for inference is Breakfast.
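
As a rough illustration of the Flops(M)/Params(M) measurement and per-sample forward timing, here is a minimal sketch using the public `paddle.flops` API on a hypothetical stand-in model. The real numbers above come from paddlevideo/tools/summary.py and the actual MS-TCN/ASRF implementations; `ToySegmenter` and its layer sizes are assumptions for illustration only.

```python
# Minimal sketch (not the official summary script): estimate FLOPs/Params with
# paddle.flops using the input shape (1, 2048, 1000) mentioned above, and time
# the forward pass per sample. "ToySegmenter" is a hypothetical placeholder,
# not the real MS-TCN or ASRF model.
import time

import paddle
import paddle.nn as nn


class ToySegmenter(nn.Layer):
    """Hypothetical fully convolutional stand-in for a segmentation model."""

    def __init__(self, in_channels=2048, num_classes=48):
        super().__init__()
        self.conv_in = nn.Conv1D(in_channels, 64, kernel_size=1)
        self.conv_out = nn.Conv1D(64, num_classes, kernel_size=1)

    def forward(self, x):  # x: [N, 2048, T]
        return self.conv_out(self.conv_in(x))


model = ToySegmenter()
model.eval()

# FLOPs and Params with the (1, 2048, 1000) input tensor described above.
flops = paddle.flops(model, input_size=[1, 2048, 1000], print_detail=False)
print(f"Flops(M): {flops / 1e6:.1f}")

# Per-sample forward time with batch_size = 1 (warm up once, then average).
x = paddle.randn([1, 2048, 1000])
with paddle.no_grad():
    model(x)  # warmup run
    start = time.time()
    runs = 20
    for _ in range(runs):
        model(x)
    print(f"forward time(ms) bs=1: {(time.time() - start) / runs * 1000:.2f}")
```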