Video models #890
Conversation
* [pre-commit.ci] pre-commit autoupdate

  updates:
  - [github.com/astral-sh/ruff-pre-commit: v0.8.6 → v0.9.1](astral-sh/ruff-pre-commit@v0.8.6...v0.9.1)

* [pre-commit.ci] auto fixes from pre-commit.com hooks

  for more information, see https://pre-commit.ci

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Bumps [ultralytics](https://github.com/ultralytics/ultralytics) from 8.3.58 to 8.3.61.

- [Release notes](https://github.com/ultralytics/ultralytics/releases)
- [Commits](ultralytics/ultralytics@v8.3.58...v8.3.61)

updated-dependencies:
- dependency-name: ultralytics
  dependency-type: direct:production
  update-type: version-update:semver-patch

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Review help/usage for CLI commands. The pattern followed is:
  - Descriptions: complete sentences with periods
  - Help messages: concise phrases without periods
  - Consistent terminology ("Iterative Studio")
  - Clear, standardized format for similar arguments
* Bring uniformity for Studio mention
* Override default command failure
* Remove datasets from studio
* Fix anon message and remove edatachain message
* dirs to directories
* Remove studio dataset test
* prefetching: remove prefetched item after use in UDF

  This PR removes the prefetched item after use in the UDF. This is enabled by default when `prefetch>0`, unless `cache=True` is set in the UDF, in which case the prefetched item is not removed.

  For the PyTorch dataloader, this is not enabled by default, but can be enabled by setting `remove_prefetched=True` in the `PytorchDataset` class. This is done because the dataset can be used across multiple epochs, and removing the prefetched item after use would cause it to be re-downloaded in the next epoch.

  The exposed `remove_prefetched=True|False` setting could be renamed to a better option. Feedback is welcome.

* close iterable properly
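The removal semantics described above can be sketched as a generator that hands each prefetched item to the consumer and then drops the local copy, unless caching is enabled. This is a simplified illustration with invented names, not the actual datachain internals:

```python
from collections.abc import Iterable, Iterator


def iter_with_prefetch_cleanup(
    rows: Iterable[str],
    prefetched: dict[str, bytes],
    cache: bool = False,
    remove_prefetched: bool = True,
) -> Iterator[tuple[str, bytes]]:
    """Yield each row with its prefetched payload, then drop the payload.

    With cache=True (or remove_prefetched=False) items are kept around,
    which matters for e.g. multi-epoch PyTorch training, where removing
    items would force a re-download on every epoch.
    """
    for key in rows:
        yield key, prefetched[key]
        if remove_prefetched and not cache:
            del prefetched[key]  # free the local copy after use
```

With the defaults, the prefetch store is empty once iteration finishes; with `cache=True`, every item survives for the next pass.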
* Rename studio to auth for cli command
  https://docs.google.com/document/d/1_QeMQ1NsguHSRSyJpF2n-1s57SHSuiyGzl10q9d1UaE/edit?disco=AAABcAvsrEQ
* Drop aws_endpoint_url option
  https://docs.google.com/document/d/1_QeMQ1NsguHSRSyJpF2n-1s57SHSuiyGzl10q9d1UaE/edit?disco=AAABcAvsrEU
* Fix sources argument for cp
  https://docs.google.com/document/d/1_QeMQ1NsguHSRSyJpF2n-1s57SHSuiyGzl10q9d1UaE/edit?disco=AAABcAvsrEY
  https://docs.google.com/document/d/1_QeMQ1NsguHSRSyJpF2n-1s57SHSuiyGzl10q9d1UaE/edit?disco=AAABcAvsrEc
* Fix anonymous arg help
  https://docs.google.com/document/d/1_QeMQ1NsguHSRSyJpF2n-1s57SHSuiyGzl10q9d1UaE/edit?disco=AAABcAvsrEs
  https://docs.google.com/document/d/1_QeMQ1NsguHSRSyJpF2n-1s57SHSuiyGzl10q9d1UaE/edit?disco=AAABcAvsrEo
* Update cached list of files for the sources
  https://docs.google.com/document/d/1_QeMQ1NsguHSRSyJpF2n-1s57SHSuiyGzl10q9d1UaE/edit?disco=AAABcAvsrEw
* Reorder verbose and quiet
  https://docs.google.com/document/d/1_QeMQ1NsguHSRSyJpF2n-1s57SHSuiyGzl10q9d1UaE/edit?disco=AAABcAvsrE0
* Path to a directory or file to put data to
  https://docs.google.com/document/d/1_QeMQ1NsguHSRSyJpF2n-1s57SHSuiyGzl10q9d1UaE/edit?disco=AAABcAvsrEg
* Added main logic for outer join
* Fixing filters
* Removing datasetquery tests and added more datachain unit tests
If usearch fails to download the extension, it will keep retrying in the future. This adds significant cost - for example, in `tests/func/test_pytorch.py` run, it was invoked 111 times, taking ~30 seconds in total. Now, we cache the return value for the whole session.
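The session-wide caching described above can be sketched with `functools.lru_cache`, which memoizes the result of the first attempt, successful or not, so a failed download is not retried on every call. A sketch under assumed names, not the actual usearch/datachain code:

```python
import functools


@functools.lru_cache(maxsize=1)
def load_usearch_extension() -> bool:
    """Try to download/load the extension once per process.

    The boolean result (success or failure) is cached, so repeated
    calls within the same session return immediately instead of
    re-attempting the download.
    """
    try:
        _download_extension()  # hypothetical stand-in for the real fetch
        return True
    except OSError:
        return False


def _download_extension() -> None:
    # Stand-in that always fails, to show that failures are cached too.
    raise OSError("network unavailable")
```

After the first (cached) miss, every further call is a cache hit, avoiding the ~30 seconds of repeated download attempts observed in `tests/func/test_pytorch.py`.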
* Move tests using cloud_test_catalog into func directory
* Move tests using tmpfile catalog
* Move long-running tests that read/write from disk
updates:
- [github.com/astral-sh/ruff-pre-commit: v0.9.1 → v0.9.2](astral-sh/ruff-pre-commit@v0.9.1...v0.9.2)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Bumps [ultralytics](https://github.com/ultralytics/ultralytics) from 8.3.61 to 8.3.64.

- [Release notes](https://github.com/ultralytics/ultralytics/releases)
- [Commits](ultralytics/ultralytics@v8.3.61...v8.3.64)

updated-dependencies:
- dependency-name: ultralytics
  dependency-type: direct:production
  update-type: version-update:semver-patch

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [mkdocs-material](https://github.com/squidfunk/mkdocs-material) from 9.5.22 to 9.5.50.

- [Release notes](https://github.com/squidfunk/mkdocs-material/releases)
- [Changelog](https://github.com/squidfunk/mkdocs-material/blob/master/CHANGELOG)
- [Commits](squidfunk/mkdocs-material@9.5.22...9.5.50)

updated-dependencies:
- dependency-name: mkdocs-material
  dependency-type: direct:production
  update-type: version-update:semver-patch

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Codecov Report

Attention: Patch coverage is

```
@@           Coverage Diff           @@
##             main     #890   +/-  ##
=======================================
  Coverage   87.72%   87.72%
=======================================
  Files         129      130      +1
  Lines       11492    11641    +149
  Branches     1554     1579     +25
=======================================
+ Hits        10081    10212    +131
- Misses       1022     1033     +11
- Partials      389      396      +7
=======================================
```
```python
return video_info(self)

def get_frame(self, frame: int) -> "VideoFrame":
```
> Minor, but should these be `to_` methods to match the DataChain class?
Looks reasonable 🤔 Although it is not a direct conversion ("to"), but rather extracting a part of the file into another file. "Get frame from video" reads well to me, while "video to frame" looks odd. What do you think? I don't have a strong opinion on this 🤔
Deploying datachain-documentation with Cloudflare Pages

- Latest commit: 94d363c
- Status: ✅ Deploy successful!
- Preview URL: https://a1a29e21.datachain-documentation.pages.dev
- Branch Preview URL: https://video-models-2.datachain-documentation.pages.dev
```python
@pytest.fixture(autouse=True)
def video_file(catalog) -> VideoFile:
```
> [C] Some of these are probably func tests as they are writing/reading to/from disk.
```python
return video_frame_bytes(self, format)

def save(self, output: str, format: str = "jpg") -> "ImageFile":
```
> [C] Now that we have virtual models, we could add a video example in this repo.
```python
if len(video_streams) == 0:
    raise FileError(file, "no video streams found in video file")

video_stream = video_streams[0]
```
> [Q] Why take the first one? Is there only ever one?
Some video container formats (such as MPEG-4) support multiple video, audio, and subtitle streams. However, in practice, video files with multiple video streams are extremely rare. Most applications, including video players and streaming platforms, are not designed to handle multiple video streams because there is little practical use for them.
I have heard of a few specialized use cases, such as movies with different aspect ratios or sports event recordings featuring multiple camera angles. However, these are rare exceptions, and I have never encountered them in real-world scenarios. In most cases, it is much simpler to provide multiple separate video files and allow users to download or stream only the one they need.
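The filter-then-take-first step can be illustrated against ffprobe-style stream metadata (the dict shape below mimics `ffprobe -show_streams -of json` output; the real code may probe differently):

```python
def first_video_stream(streams: list[dict]) -> dict:
    """Pick the first video stream, as in the snippet above.

    Containers like MPEG-4 may also carry audio and subtitle streams,
    and in rare cases more than one video stream; in practice the
    first video stream is the one players and tools use by default.
    """
    video_streams = [s for s in streams if s.get("codec_type") == "video"]
    if not video_streams:
        raise ValueError("no video streams found in video file")
    return video_streams[0]


# Illustrative ffprobe-style metadata for a typical video-plus-audio file.
streams = [
    {"index": 0, "codec_type": "video", "codec_name": "h264"},
    {"index": 1, "codec_type": "audio", "codec_name": "aac"},
]
```

Here `first_video_stream(streams)` returns the h264 entry, and an audio-only list raises, mirroring the `FileError` branch above.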
Alternative approach to implement video models, based on this comment. Looks much cleaner.

New `VideoFile` model.

New `VideoFrame` model. One can create a `VideoFrame` without downloading the video file, since it is a "virtual" frame: the original `VideoFile` plus a frame number. If a physical frame image is needed, call the `save` method, which uploads the frame image into storage and returns a new `ImageFile` model.

API:
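A hypothetical sketch of the virtual-frame flow (`get_frame` and `save` follow the diff snippets reviewed above, but the classes here are illustrative stand-ins, not the actual datachain models):

```python
from dataclasses import dataclass


@dataclass
class ImageFile:
    path: str


@dataclass
class VideoFile:
    path: str

    def get_frame(self, frame: int) -> "VideoFrame":
        # A "virtual" frame: nothing is downloaded here; we only
        # record the source file and the frame number.
        return VideoFrame(video=self, frame=frame)


@dataclass
class VideoFrame:
    video: VideoFile
    frame: int

    def save(self, output: str, format: str = "jpg") -> ImageFile:
        # Materialization step: the real implementation would decode
        # the frame and upload the image into storage.
        return ImageFile(path=f"{output}/frame_{self.frame}.{format}")


frame = VideoFile("s3://bucket/clip.mp4").get_frame(42)  # cheap, virtual
image = frame.save("s3://bucket/frames")                 # materializes
```

The key design point is that creating a `VideoFrame` is free; only `save` touches the video bytes.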
New `VideoFragment` model. One can create a `VideoFragment` without downloading the video file, since it is a "virtual" fragment: the original video file plus start/end timestamps. If a physical fragment video is needed, call the `save` method, which uploads the fragment video into storage and returns a new `VideoFile` model.

API:
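The fragment variant follows the same virtual-then-materialize pattern, keyed on timestamps instead of a frame number (again a sketch; `get_fragment` is an assumed name, not confirmed by the diff):

```python
from dataclasses import dataclass


@dataclass
class VideoFile:
    path: str

    def get_fragment(self, start: float, end: float) -> "VideoFragment":
        # "Virtual" fragment: just the source file plus timestamps.
        return VideoFragment(video=self, start=start, end=end)


@dataclass
class VideoFragment:
    video: VideoFile
    start: float
    end: float

    def save(self, output: str, format: str = "mp4") -> VideoFile:
        # Materialization would cut the fragment and upload it,
        # returning a new VideoFile pointing at the uploaded clip.
        return VideoFile(
            path=f"{output}/fragment_{self.start}_{self.end}.{format}"
        )
```

Unlike `VideoFrame.save`, which yields an `ImageFile`, materializing a fragment yields another `VideoFile`, so fragments can themselves be fed back into video-processing steps.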
New `Video` model. Holds video file meta information.