
Expected FPS (or per-frame inference time) difference between video and streaming settings? #593

Open
weirenorweiren opened this issue Mar 8, 2025 · 1 comment


@weirenorweiren

I want to apply SAM2 in real time, ideally tracking 50 objects. The tracking robustness is fantastic for my application, even with the tiny model; however, I noticed that the inference time increases almost linearly with the number of objects. If the speed difference between the video and streaming settings is expected to be small, I might consider upgrading my hardware and tracking fewer objects for my proof of concept.

With that, I have the following questions:

  1. Regarding the speed measurements in https://github.com/facebookresearch/sam2?tab=readme-ov-file#sam-21-checkpoints, how many objects did you track?
  2. How much of a difference in FPS (or per-frame inference time) should be expected between the video and streaming settings?
  3. Were your speed measurements done with compilation mode enabled? If not, what is the expected improvement in FPS (or per-frame inference time) with compilation on?

I appreciate any comments and thanks in advance!
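For reference, below is a rough sketch of how the per-object scaling could be measured with the video predictor API. It is not the official benchmark setup; the checkpoint path, config name, frame directory, and placeholder click prompts are assumptions for illustration, and a CUDA device is assumed for the autocast/synchronize calls.

```python
import time

import numpy as np
import torch

from sam2.build_sam import build_sam2_video_predictor

checkpoint = "./checkpoints/sam2.1_hiera_tiny.pt"   # assumed local checkpoint path
model_cfg = "configs/sam2.1/sam2.1_hiera_t.yaml"    # assumed config name
video_dir = "./my_frames"                           # assumed directory of JPEG frames

predictor = build_sam2_video_predictor(model_cfg, checkpoint)


def time_propagation(num_objects):
    """Prompt `num_objects` placeholder objects on frame 0 and time propagation."""
    with torch.inference_mode(), torch.autocast("cuda", dtype=torch.bfloat16):
        state = predictor.init_state(video_path=video_dir)
        for obj_id in range(num_objects):
            # Placeholder click; in practice use real per-object prompts.
            predictor.add_new_points_or_box(
                state,
                frame_idx=0,
                obj_id=obj_id,
                points=np.array([[100.0 + 5.0 * obj_id, 100.0]], dtype=np.float32),
                labels=np.array([1], dtype=np.int32),
            )
        per_frame_times = []
        t0 = time.perf_counter()
        for _frame_idx, _obj_ids, _masks in predictor.propagate_in_video(state):
            torch.cuda.synchronize()
            t1 = time.perf_counter()
            per_frame_times.append(t1 - t0)
            t0 = t1
        return per_frame_times


for n in (1, 10, 25, 50):
    times = time_propagation(n)
    mean_t = sum(times) / len(times)
    print(f"{n:3d} objects: {mean_t * 1e3:6.1f} ms/frame ({1.0 / mean_t:5.1f} FPS)")
```

Plotting the mean per-frame time against the object count would make the linear scaling (and the fixed per-frame overhead) explicit; the first timed frame includes some warm-up, so it may be worth discarding.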

@weirenorweiren (Author)

For Q2, please see below for further clarification.

I am referring to the difference in FPS between a video file input and a live streaming source. My lab camera has an upper limit of 20 FPS, and I'd love the model to go well beyond 20 FPS. If I remember correctly, I saw somewhere in the Issues that there are tricks for video input, such as batch processing, to speed things up. In my case, since frames arrive one at a time, batch processing does not seem applicable. So I am wondering how much speed would be lost without those tricks.

That reminds me to ask: did your speed measurements involve such tricks for video input? If not, and you happen to have measured the per-frame inference time by feeding video data frame by frame, I, along with others interested in real-time applications, would love to see the numbers, since they are an important reference for the model's potential on such tasks!
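As a back-of-the-envelope illustration (the timing constants below are made-up placeholders, not measurements): if the per-frame inference time behaves roughly like a fixed cost plus a per-object cost, a 20 FPS camera leaves a 50 ms budget per frame, which bounds how many objects can be tracked in real time.

```python
# Hypothetical numbers for illustration only, not measured values.
camera_fps = 20.0
frame_budget_s = 1.0 / camera_fps       # 50 ms available per frame

t_base_s = 0.015        # assumed fixed per-frame cost (image encoder, overhead)
t_per_obj_s = 0.0015    # assumed marginal cost per tracked object

max_objects = int((frame_budget_s - t_base_s) / t_per_obj_s)
print(f"Frame budget: {frame_budget_s * 1e3:.0f} ms")
print(f"Estimated max objects at {camera_fps:.0f} FPS: {max_objects}")
```

Any measured difference between the video and streaming settings would shift the fixed per-frame cost, which is exactly why the comparison matters for this use case.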
