Ideally, I want to apply SAM2 in real time to track 50 objects. For my application the robustness is fantastic even with the tiny model; however, I noticed that inference time increases almost linearly with the number of objects. If the speed difference between the video and streaming settings is expected to be small, I might upgrade my hardware and track fewer objects for my proof of concept. With that, I have the following questions:
1. How much difference in FPS (or per-frame inference time) is expected between the video and streaming settings?
2. Were your speed measurements done in compilation mode? If not, what improvement in FPS (or per-frame inference time) would be expected with compilation on?
I'd appreciate any comments; thanks in advance!
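For context, here is the back-of-the-envelope latency model behind my hardware question, assuming the linear scaling I observed: per-frame time is roughly `t_base + N * t_per_obj`. The constants below are made-up placeholders, not measured SAM2 numbers.

```python
# Rough latency budget for tracking N objects at a target FPS.
# t_base_ms and t_per_obj_ms are hypothetical placeholders -- substitute
# your own measured numbers; they are NOT official SAM2 figures.

def max_trackable_objects(t_base_ms: float, t_per_obj_ms: float,
                          target_fps: float) -> int:
    """Largest N such that t_base + N * t_per_obj fits in the frame budget."""
    budget_ms = 1000.0 / target_fps
    if budget_ms <= t_base_ms:
        return 0  # even zero objects can't meet the target frame rate
    return int((budget_ms - t_base_ms) / t_per_obj_ms)

# Example: with a (hypothetical) 10 ms base cost and 1.5 ms per object,
# a 20 FPS target (50 ms budget) leaves room for 26 objects.
print(max_trackable_objects(10.0, 1.5, 20.0))  # -> 26
```

Plugging in measured numbers for a given GPU would tell me directly whether upgrading hardware or cutting the object count is the cheaper path.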
The text was updated successfully, but these errors were encountered:
I am referring to the difference in FPS between a video-file input and a live streaming source. My lab camera tops out at 20 FPS, and I'd love the model to go well beyond that. If I remember correctly, I saw somewhere in the Issues that there are tricks for video input, such as batch processing, to speed it up. In my case the input arrives frame by frame, so batch processing doesn't seem applicable. Without those tricks, how much speed would be lost?
That reminds me to ask: did your speed measurements involve those video-input tricks? If not, and you happen to have tested per-frame inference time by feeding video data frame by frame, I (along with others interested in real-time applications) would love to see the numbers, since they're an important reference for the model's potential on such tasks!
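For what it's worth, this is roughly how I'd measure throughput in the frame-by-frame setting. The rolling-average counter is the reusable part; `track_frame` in the usage sketch is a hypothetical stand-in for whatever SAM2 call processes a single frame, not a real API name.

```python
import time
from collections import deque

class FPSCounter:
    """Rolling-average FPS over the last `window` frame timestamps."""

    def __init__(self, window: int = 30):
        self.times = deque(maxlen=window)

    def tick(self, now=None) -> float:
        """Record a frame timestamp; return the FPS estimate (0.0 until 2 frames).

        `now` lets tests inject synthetic timestamps; by default the wall clock
        is used.
        """
        self.times.append(time.perf_counter() if now is None else now)
        if len(self.times) < 2:
            return 0.0
        span = self.times[-1] - self.times[0]
        return (len(self.times) - 1) / span if span > 0 else 0.0

# Usage sketch (camera_stream and track_frame are hypothetical stand-ins):
# fps = FPSCounter()
# for frame in camera_stream():
#     masks = track_frame(frame)  # one frame in, one set of masks out
#     print(f"{fps.tick():.1f} FPS")
```

Measured this way, the number reflects the true per-frame feed rate rather than any batched-throughput figure.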