Video support #221

Open · lonnylundsten opened this issue Apr 22, 2023 · 3 comments

Comments


lonnylundsten commented Apr 22, 2023

I have a few suggestions for additions to RectLabel:

  1. Add the ability to adjust which frames are captured from a video, i.e., instead of grabbing all frames, allow the user to choose how often a frame is grabbed: 1 frame every four seconds, etc. (a sketch of one possible approach follows this comment).

  2. Ability to run inference on a video clip and a live video feed:
    https://developer.apple.com/documentation/vision/recognizing_objects_in_live_capture

  3. I'm not sure if you've seen this, but CoreMLPlayer has some really cool features: https://github.com/npna/CoreMLPlayer

I've asked the developer to allow for exporting to YOLO format, like RectLabel, and to incorporate a camera input. Adding the capabilities of his app to RectLabel would be really fantastic.

We've also developed a QuickTime-based video player that supports drawing localization boxes on the video: https://github.com/mbari-org/Sharktopoda/releases/tag/2.0.3
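
For suggestion 1, here is a minimal sketch of what interval-based frame grabbing could look like with AVFoundation's AVAssetImageGenerator. The paths, the four-second interval, and the PNG naming are just placeholders, not a proposal for how RectLabel should actually implement it.

```swift
import AVFoundation
import AppKit

// Sketch: grab one frame every `interval` seconds from a video file and
// write each frame as a PNG. Paths, interval, and naming are placeholders.
func extractFrames(from videoURL: URL, every interval: Double, to outputDir: URL) throws {
    let asset = AVAsset(url: videoURL)
    let duration = CMTimeGetSeconds(asset.duration)

    let generator = AVAssetImageGenerator(asset: asset)
    generator.appliesPreferredTrackTransform = true
    // Ask for the exact requested times rather than the nearest keyframes.
    generator.requestedTimeToleranceBefore = .zero
    generator.requestedTimeToleranceAfter = .zero

    var time = 0.0
    var index = 0
    while time < duration {
        let cmTime = CMTime(seconds: time, preferredTimescale: 600)
        let cgImage = try generator.copyCGImage(at: cmTime, actualTime: nil)

        let rep = NSBitmapImageRep(cgImage: cgImage)
        let name = String(format: "frame_%06d.png", index)
        try rep.representation(using: .png, properties: [:])?
            .write(to: outputDir.appendingPathComponent(name))

        time += interval
        index += 1
    }
}

// Example: one frame every four seconds.
// try extractFrames(from: URL(fileURLWithPath: "clip.mp4"), every: 4.0,
//                   to: URL(fileURLWithPath: "frames", isDirectory: true))
```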

@ryouchinsa (Owner)

Thanks for opening the issue.

Thanks for introducing CoreMLPlayer; we checked how it works.
We will implement your feature requests one by one.

  1. Frames per second
  2. Running a Core ML model on a video clip and saving the YOLO txt files (a rough sketch follows just below).
  3. Running a Core ML model on a live captured video and saving the YOLO txt files.
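
For items 2 and 3, a rough sketch of one possible direction, using Vision to run a Core ML detector on a single frame and write YOLO-format lines. The class list and output file URL are placeholders and this is not the final implementation.

```swift
import Vision
import CoreGraphics
import Foundation

// Sketch: run a Core ML object detector on one frame and write YOLO-format
// lines (classIndex xCenter yCenter width height, all normalized) to a txt
// file. The class list and file URL are placeholders.
func detectAndSaveYolo(frame: CGImage, model: VNCoreMLModel,
                       classNames: [String], txtURL: URL) throws {
    let request = VNCoreMLRequest(model: model)
    request.imageCropAndScaleOption = .scaleFill

    let handler = VNImageRequestHandler(cgImage: frame, options: [:])
    try handler.perform([request])

    var lines: [String] = []
    for case let observation as VNRecognizedObjectObservation in request.results ?? [] {
        guard let label = observation.labels.first,
              let classIndex = classNames.firstIndex(of: label.identifier) else { continue }

        // Vision boxes are normalized with the origin at the bottom-left;
        // YOLO expects normalized center coordinates with the origin at the top-left.
        let box = observation.boundingBox
        lines.append("\(classIndex) \(box.midX) \(1.0 - box.midY) \(box.width) \(box.height)")
    }
    try lines.joined(separator: "\n").write(to: txtURL, atomically: true, encoding: .utf8)
}
```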

For Sharktopoda, do you have documentation on how to use it?

Thank you.

(Screenshot attached, 2023-05-01 18:35:50)

@lonnylundsten (Author)

Documentation is probably pretty sparse. From a user's perspective, the instructions are here: https://docs.mbari.org/vars-annotation/setup/

The video player communicates with our annotation app, VARS, via UDP. The video player allows us to draw and display ML proposals on the video itself. VARS reads from and writes to a SQL Server database directly. The localizations are stored as a column in the db like: {"x": 1527, "y": 323, "width": 43, "height": 119, "generator": "yolov5-mbari315k" }. The class label would be in a different column.
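
For reference, that localization column could be decoded with a small Codable struct along these lines. This is only a sketch based on the example above, not the actual VARS schema; the field types are assumptions.

```swift
import Foundation

// Sketch of the localization JSON stored in the VARS database column,
// based only on the example above; field types are assumptions.
struct Localization: Codable {
    let x: Int
    let y: Int
    let width: Int
    let height: Int
    let generator: String
}

let json = #"{"x": 1527, "y": 323, "width": 43, "height": 119, "generator": "yolov5-mbari315k"}"#
let box = try JSONDecoder().decode(Localization.self, from: Data(json.utf8))
```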

@ryouchinsa (Owner)

I am sorry for the late reply.

The new version 2023.12.08 was released.
We improved RectLabel so that when converting a video to image frames, you can set the frames per second.

I will implement the remaining features one by one.
