Add video models + functions #814

Open
wants to merge 6 commits into
base: main
Choose a base branch
from
Open

Add video models + functions #814

wants to merge 6 commits into from


dreadatour
Contributor

See #797

TODO:

Video models added

class VideoFile(File):
    """`DataModel` for reading video files."""


class VideoClip(VideoFile):
    """`DataModel` for reading video clips."""

    start_time: float
    end_time: float


class VideoFrame(VideoFile):
    """`DataModel` for reading video frames."""

    frame: int
    timestamp: float

Meta models added

class ImageMeta(DataModel):
    """`DataModel` for image file meta information."""

    width: int
    height: int
    format: str


class VideoMeta(DataModel):
    """`DataModel` for video file meta information."""

    width: int
    height: int
    fps: float
    duration: float
    frames_count: int
    codec: str


class VideoFrameMeta(DataModel):
    """`DataModel` for video frame image meta information."""

    frame: int
    timestamp: float
    width: int
    height: int
    format: str

A couple of usage examples

Listing
from datachain import DataChain

ds = DataChain.from_storage("./src", type="video").save("videos")
ds.show(3)
$ python 01-index.py
                                                file                 file       file    file                   file      file                      file     file
                                              source                 path       size version                   etag is_latest             last_modified location
0  file:///Users/vlad/work/iterative/playground/v...  age_16_IMG_3341.MOV  280685520          0x1.9bc2127c00000p+30         1 2024-09-22 20:57:03+00:00     None
1  file:///Users/vlad/work/iterative/playground/v...         yura_big.mp4   32482027          0x1.9bc1de3400000p+30         1 2024-09-22 20:01:17+00:00     None
2  file:///Users/vlad/work/iterative/playground/v...         IMG_6648.mov  404354596          0x1.9bc220c800000p+30         1 2024-09-22 21:12:18+00:00     None

[Limited by 3 rows]
Add meta
from datachain import DataChain
from datachain.lib.video import video_meta

ds = DataChain.from_dataset("videos").map(meta=video_meta).save("videos-meta")
ds.show(3)
$ python 02-meta.py
                                                file                 file       file    file                   file      file                      file     file  meta   meta  \
                                              source                 path       size version                   etag is_latest             last_modified location width height
0  file:///Users/vlad/work/iterative/playground/v...  age_16_IMG_3341.MOV  280685520          0x1.9bc2127c00000p+30         1 2024-09-22 20:57:03+00:00     None  1080   1920
1  file:///Users/vlad/work/iterative/playground/v...         yura_big.mp4   32482027          0x1.9bc1de3400000p+30         1 2024-09-22 20:01:17+00:00     None   848    480
2  file:///Users/vlad/work/iterative/playground/v...         IMG_6648.mov  404354596          0x1.9bc220c800000p+30         1 2024-09-22 21:12:18+00:00     None  1080   1920

       meta        meta         meta  meta
        fps    duration frames_count codec
0  59.94006  124.613333         7472  hevc
1  60.00000  179.826667        10789  h264
2  60.00000  180.415000        10827  hevc

[Limited by 3 rows]
Split video into virtual frames
from typing import Iterator

from datachain import DataChain
from datachain.lib.file import VideoFile, VideoMeta, VideoFrame


def gen_frames(file: VideoFile, meta: VideoMeta) -> Iterator[tuple[VideoFrame, VideoMeta]]:
    # Emit a virtual reference to every 100th frame; no frame is decoded here.
    for frame in range(0, meta.frames_count, 100):
        timestamp = frame / meta.fps
        video_frame = VideoFrame(**file.model_dump(), frame=frame, timestamp=timestamp)
        yield video_frame, meta


ds = (
    DataChain.from_dataset("videos-meta")
    .gen(gen_frames, output=("file", "meta"))
    .save("videos-frames-virtual")
)
ds.show(3)
$ python 03-frames-virtual.py
                                                file                 file       file    file                   file      file                      file     file  file      file  meta  \
                                              source                 path       size version                   etag is_latest             last_modified location frame timestamp width
0  file:///Users/vlad/work/iterative/playground/v...  age_16_IMG_3341.MOV  280685520          0x1.9bc2127c00000p+30         1 2024-09-22 20:57:03+00:00     None     0  0.000000  1080
1  file:///Users/vlad/work/iterative/playground/v...  age_16_IMG_3341.MOV  280685520          0x1.9bc2127c00000p+30         1 2024-09-22 20:57:03+00:00     None   100  1.668333  1080
2  file:///Users/vlad/work/iterative/playground/v...  age_16_IMG_3341.MOV  280685520          0x1.9bc2127c00000p+30         1 2024-09-22 20:57:03+00:00     None   200  3.336667  1080

    meta      meta        meta         meta  meta
  height       fps    duration frames_count codec
0   1920  59.94006  124.613333         7472  hevc
1   1920  59.94006  124.613333         7472  hevc
2   1920  59.94006  124.613333         7472  hevc

[Limited by 3 rows]
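The virtual frames above are plain arithmetic on the meta: every 100th frame index, with timestamp = frame / fps. The sketch below reproduces that step without datachain so it can be checked in isolation; `sample_frames` is a hypothetical helper name, and the input numbers are taken from the first row of the output above.

```python
from typing import Iterator


def sample_frames(frames_count: int, fps: float, step: int = 100) -> Iterator[tuple[int, float]]:
    """Yield (frame index, timestamp in seconds) for every `step`-th frame."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    for frame in range(0, frames_count, step):
        yield frame, frame / fps


# frames_count and fps of age_16_IMG_3341.MOV, taken from the output above.
pairs = list(sample_frames(7472, 59.94006))
print(len(pairs), pairs[:3])
```

The timestamps of the first rows match the 0.000000, 1.668333 and 3.336667 values shown in the table.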
Split video into frames and upload to storage
from typing import Iterator

from datachain import DataChain
from datachain.catalog import get_catalog
from datachain.client import Client
from datachain.lib.file import VideoFile, VideoMeta, VideoFrameMeta, ImageFile
from datachain.lib.video import video_frames


def gen_frames(client: Client, file: VideoFile, meta: VideoMeta) -> Iterator[tuple[VideoFile, ImageFile, VideoFrameMeta]]:
    stem = file.get_file_stem()

    for idx, img in enumerate(video_frames(file, step=100)):
        frame = idx * 100
        filename = f"{stem}_{frame:06d}.jpg"
        f = client.upload(filename, img)
        timestamp = frame / meta.fps

        video_frame = ImageFile(**f.model_dump())
        image_meta = VideoFrameMeta(
            frame=frame,
            timestamp=timestamp,
            width=meta.width,
            height=meta.height,
            format="jpeg",
        )

        yield file, video_frame, image_meta


ds = (
    DataChain.from_dataset("videos-meta")
        .limit(1)
        .setup(client=lambda: get_catalog().get_client("gs://videos/frames"))
        .gen(gen_frames, output=("video", "frame", "meta"))
        .save("videos-frames-upload")
)
ds.show(3)
$ python 04-frames-upload.py
                                               video                video      video   video                  video     video                     video    video  \
                                              source                 path       size version                   etag is_latest             last_modified location
0  file:///Users/vlad/work/iterative/playground/v...  age_16_IMG_3341.MOV  280685520          0x1.9bc2127c00000p+30         1 2024-09-22 20:57:03+00:00     None
1  file:///Users/vlad/work/iterative/playground/v...  age_16_IMG_3341.MOV  280685520          0x1.9bc2127c00000p+30         1 2024-09-22 20:57:03+00:00     None
2  file:///Users/vlad/work/iterative/playground/v...  age_16_IMG_3341.MOV  280685520          0x1.9bc2127c00000p+30         1 2024-09-22 20:57:03+00:00     None

                frame                       frame   frame             frame             frame     frame                            frame    frame  meta      meta  meta  \
               source                        path    size           version              etag is_latest                    last_modified location frame timestamp width
0  gs://videos/frames  age_16_IMG_3341_000000.jpg  206936  1736786510082205  CJ3h7/eR84oDEAE=         1 2025-01-13 16:41:50.184000+00:00     None     0  0.000000  1080
1  gs://videos/frames  age_16_IMG_3341_000100.jpg  174064  1736786512007892  CNSl5fiR84oDEAE=         1 2025-01-13 16:41:52.118000+00:00     None   100  1.668333  1080
2  gs://videos/frames  age_16_IMG_3341_000200.jpg  149928  1736786513921389  CO2K2vmR84oDEAE=         1 2025-01-13 16:41:54.055000+00:00     None   200  3.336667  1080

    meta   meta
  height format
0   1920   jpeg
1   1920   jpeg
2   1920   jpeg

[Limited by 3 rows]
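Uploaded frames are named with the `{stem}_{frame:06d}.jpg` pattern from `gen_frames`. A minimal stdlib sketch of that naming, assuming `get_file_stem()` behaves like `PurePosixPath.stem` (the file name with its last extension stripped); `frame_filename` is a hypothetical helper:

```python
from pathlib import PurePosixPath


def frame_filename(video_path: str, frame: int, ext: str = "jpg") -> str:
    """Build a frame file name like <stem>_000100.jpg."""
    stem = PurePosixPath(video_path).stem  # drops only the last extension
    return f"{stem}_{frame:06d}.{ext}"


names = [frame_filename("age_16_IMG_3341.MOV", f) for f in range(0, 300, 100)]
print(names)
```

Zero-padding to six digits keeps lexicographic order identical to frame order for videos with fewer than a million frames.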


codecov bot commented Jan 13, 2025

Codecov Report

Attention: Patch coverage is 26.60550% with 80 lines in your changes missing coverage. Please review.

Project coverage is 86.84%. Comparing base (3767173) to head (5892ab9).
Report is 7 commits behind head on main.

Files with missing lines      Patch %   Lines
src/datachain/lib/video.py      0.00%   77 Missing ⚠️
src/datachain/lib/file.py      90.62%   2 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #814      +/-   ##
==========================================
- Coverage   87.42%   86.84%   -0.59%     
==========================================
  Files         128      129       +1     
  Lines       11373    11479     +106     
  Branches     1537     1553      +16     
==========================================
+ Hits         9943     9969      +26     
- Misses       1049     1128      +79     
- Partials      381      382       +1     
Flag Coverage Δ
datachain 86.78% <26.60%> (-0.59%) ⬇️

Flags with carried forward coverage won't be shown.


Member

@dmpetrov left a comment

Amazing PR!

It would be great to use concise and minimalistic naming and a minimal API, because we are going to have many file types for multiple domains.

  1. Naming

Keywords like Meta will make it hard for users to remember and use the classes; users have their own meta 🙂

How about this renaming:
VideoFile -> BaseVideo (I assume people won't use this often)
VideoMeta -> Video (the most used class)
VideoClip -> Clip (also, shouldn't it be based on Video with meta?)
VideoFrame -> FrameBase
VideoFrameMeta -> Frame

start_time --> start
end_time --> end
frames_count --> count

Image -> BaseImage
ImageMeta -> Image

FileTypes can also be extended: image (read meta), base_image (do not read meta), video (read meta), base_video (do not read meta), video_clip, base_video_clip, ...
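One way to read this proposal is as a table from type name to (model, whether meta has to be read). The sketch below is hypothetical: the type names come from the comment above, but `FILE_TYPES`, `TypeSpec` and `needs_download` are illustrative names, not datachain API, and the clip types are left out.

```python
from typing import Literal, NamedTuple

# Hypothetical type names from the suggestion above; not an existing datachain API.
FileType = Literal["image", "base_image", "video", "base_video"]


class TypeSpec(NamedTuple):
    model: str       # name of the model the listing would produce
    read_meta: bool  # True if each file has to be opened to fill the model


FILE_TYPES: dict[str, TypeSpec] = {
    "image": TypeSpec("Image", read_meta=True),
    "base_image": TypeSpec("BaseImage", read_meta=False),
    "video": TypeSpec("Video", read_meta=True),
    "base_video": TypeSpec("BaseVideo", read_meta=False),
}


def needs_download(file_type: FileType) -> bool:
    """Listing metadata is enough unless the type asks for file meta."""
    return FILE_TYPES[file_type].read_meta


print([t for t in FILE_TYPES if needs_download(t)])
```

The read_meta flag makes the cost of each type visible: listing a bucket is cheap, while the meta-reading types would require opening every file.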

  2. Do we need dummy classes?

I assume that people prefer working with meta information when dealing with images and videos. A follow-up question: do we really need BaseImage and BaseVideo without any logic? Why don't we clean up the API and keep only the meta-enriched versions? Users can still work with videos as File if meta is not needed.

  3. Do we need singular methods?

save_video_clips() and save_video_clip(): how much extra code would a user need if we got rid of the singular form? If it is a single method call, let's avoid the singular version.

The same question applies to video_frames() and video_frames_np().

I assume we can add the methods and classes later if there is a need. But I wouldn't start with such a rich API now; let's try our best to keep it minimalistic.
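To make the singular vs. plural question concrete: if the plural function accepts a start/end/step range, the singular form reduces to a thin wrapper. A generic sketch, assuming frames can be modeled as a sequence of encoded images; `frames` and `frame` are hypothetical names, not the functions in this PR:

```python
from itertools import islice
from typing import Iterator, Optional, Sequence


def frames(video: Sequence[bytes], start: int = 0, end: Optional[int] = None, step: int = 1) -> Iterator[bytes]:
    """Plural form: yield every `step`-th frame in [start, end)."""
    # islice follows range()-style start/stop/step semantics (non-negative values only).
    yield from islice(video, start, end, step)


def frame(video: Sequence[bytes], index: int) -> bytes:
    """Singular form, expressed through the plural one."""
    try:
        return next(frames(video, start=index, end=index + 1))
    except StopIteration:
        raise IndexError(f"frame {index} is out of range") from None


fake_video = [bytes([i]) for i in range(10)]  # stand-in for decoded frames
print(frame(fake_video, 3))
print(list(frames(fake_video, step=4)))
```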

WDYT?


    width: int
    height: int
    format: str
Member

How about EXIF and XMP? :)

yield img


def video_frames(
Member

Can a lot of these helpers become part of the Video* classes?

Contributor Author

Good question 👍 I was thinking about this and tried to implement it this way, but in the end I checked the other types and files in the lib module (images, hf) and made it consistent with them.

I also tried moving all the models to the datachain.model module, but it turns out that needs more work and may not be backward compatible with the File model. It is a subject for a separate PR.

Member

Yeah, we need all of these to become methods of the Video class. Should it be a follow-up or in this PR?

I'd appreciate more insights on the issues with this approach.

Comment on lines 35 to 38
    props = iio.improps(file.stream(), plugin="pyav")
    frames_count, width, height, _ = props.shape

    meta = iio.immeta(file.stream(), plugin="pyav")
Contributor Author

I don't like this part: it looks like we are reading the video file twice here. I need to find another way to get the video meta information.

Member

Yep. Also, are we reading the whole file to get the meta?


cloudflare-workers-and-pages bot commented Jan 14, 2025

Deploying datachain-documentation with Cloudflare Pages

Latest commit: 5892ab9
Status: ✅  Deploy successful!
Preview URL: https://beaeae60.datachain-documentation.pages.dev
Branch Preview URL: https://video-models.datachain-documentation.pages.dev


@dreadatour
Contributor Author

  1. Naming

Keywords like Meta will make it hard for user to remember and use the classes - user have their own meta 🙂

👍

How about this renaming:
VideoFile -> BaseVideo (I assume people won't use this often)
VideoMeta -> Video (the most used class)
VideoClip -> Clip (also, shouldn't it be based on Video with meta?)
VideoFrame -> FrameBase
VideoFrameMeta -> Frame

For now, our naming uses the File suffix: TextFile, ImageFile, and File itself. I left VideoFile as is for now, but renamed the others:

  • ImageMeta -> Image
  • VideoClipFile -> VideoClip (I can rename it to Clip as you suggested; I'm just not sure yet because of the next line)
  • VideoFrameFile -> VideoFrame (I could rename it to Frame for consistency with Clip, but the name Frame is already taken, see below)
  • VideoMeta -> Video
  • VideoFrameMeta -> Frame

start_time --> start
end_time --> end
frames_count --> count

Done, except that frames_count became frames, because count is too general, IMO.

Image -> BaseImage
ImageMeta -> Image

We don't have an Image model, only an ImageFile model, so I left it as is for now. ImageMeta -> Image is done.

FileTypes can be also extended: image (read meta), base_image (do not read meta), video (read meta), base_video (do not read meta), video_clip, base_video_clip , ...

That's a good suggestion, but for now we use FileTypes only in the from_storage method. I am not sure we want to change it to download files and read meta 🤔, even with an additional parameter.

  2. Do we need dummy classes?

I assume that people prefer working with meta information while dealing with images and videos. A followup question - do we really need BaseImages and BaseVideo without any logic? Why don't we clean up API and keep only Meta-enrich version in the API? User still can work with videos as File if meta is not needed.

Good question. I added VideoFile only because we already have ImageFile, to be consistent. It is also useful when we use from_storage with type="video", because then we can use the VideoFile type in mappers, like this:

def video_meta(file: "VideoFile") -> Video:
    """
    Returns video file meta information.

    Args:
        file (VideoFile): VideoFile object.

    Returns:
        Video: Video file meta information.
    """
  1. Do we need singular methods?

save_video_clips() and save_video_clip() How much extra code user needs to get rid of singular form. If one method - let's avoid the singular version.
The same question for video_frames() and video_frames_np()

Sounds reasonable to me 👍 Will update the code (not done yet).

  4. Default values

Done.

WDYT?

Those are great comments! Love the discussion ❤️

"""`DataModel` for reading video files."""


class VideoClip(VideoFile):
Member

So, how are all these models connected with the helpers? How do I instantiate them? Do I have to write my own UDFs to do that (i.e., just to instantiate these classes)?


def save(self, destination: str):
    """Writes its content to destination"""
    self.read().save(destination)


class Image(DataModel):
Member

Why do we need this separate model?

    timestamp: float = Field(default=0)


class Video(DataModel):
Member

Should it be a subclass of VideoFile?

Member

@dmpetrov left a comment

Great improvements.
A few follow-up questions about moving the methods to the Video class and about plural vs. singular methods.

class VideoClip(VideoFile):
"""`DataModel` for reading video clips."""

start: float = Field(default=0)
Member

I'd use some impossible value like -1.0

"""`DataModel` for reading video frames."""

frame: int = Field(default=0)
timestamp: float = Field(default=0)
Member

-1 and -1.0 as defaults?
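A sketch of the sentinel-default idea, using a stdlib dataclass as a stand-in for the pydantic `Field(default=...)` models in this PR; `FrameRef` and `is_set` are hypothetical names. With `default=0`, an unset value is indistinguishable from the first frame at time 0, while -1 / -1.0 can never occur in real data:

```python
from dataclasses import dataclass

UNSET_FRAME = -1   # impossible frame index
UNSET_TIME = -1.0  # impossible timestamp, in seconds


@dataclass
class FrameRef:
    """Stand-in for the VideoFrame model, with sentinel defaults."""

    frame: int = UNSET_FRAME
    timestamp: float = UNSET_TIME

    def is_set(self) -> bool:
        # With default=0 this check could not tell "not set" from "first frame".
        return self.frame >= 0 and self.timestamp >= 0


print(FrameRef().is_set())                        # defaults only
print(FrameRef(frame=0, timestamp=0.0).is_set())  # the real first frame
```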

) from exc


def video_meta(file: "VideoFile") -> Video:
Member

Could you please avoid using the term meta? How about file_to_video(file: File)?
Btw, why not just File as the input type?

    return iio.imread(file.stream(), index=frame, plugin="pyav")


def video_frame(file: "VideoFile", frame: int, format: str = "jpeg") -> bytes:
Member

We usually use jpg in the codebase, not jpeg.

file: "VideoFile",
frame: int,
output_file: Union[str, pathlib.Path],
format: str = "jpeg",
Member

jpg

def save_video_frame(
file: "VideoFile",
frame: int,
output_file: Union[str, pathlib.Path],
Member

Should we really support Path?
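For context on the `Path` question: accepting both `str` and `pathlib.Path` costs one `os.fspath()` call. A minimal sketch (the helper name `normalize_output` is hypothetical):

```python
import os
import pathlib
from typing import Union


def normalize_output(output_file: Union[str, os.PathLike]) -> str:
    """Accept a str or any path-like object and return a plain string."""
    # os.fspath() returns str unchanged and calls __fspath__() on Path objects.
    return os.fspath(output_file)


print(normalize_output("frames/000100.jpg"))
print(normalize_output(pathlib.PurePosixPath("frames") / "000100.jpg"))
print(pathlib.PurePosixPath("gs://videos/frames"))  # repeated slashes are collapsed
```

One argument for keeping output locations as plain strings is that `pathlib` collapses repeated slashes, so a cloud URI like `gs://videos/frames` does not survive a round trip through `PurePosixPath` (it becomes `gs:/videos/frames`).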

yield img


def video_frames(
    start_frame: int = 0,
    end_frame: Optional[int] = None,
    step: int = 1,
    format: str = "jpeg",
Member

jpg

yield output_file


def save_video_clip(
Member

It looks like it needs to be renamed to save_subvideo().
In the class names, we use the term Clip for virtual videos (start-end), while in this case you are creating just another Video, not a clip.

So, it needs to be renamed or we need to avoid this Clip-as-virtual-reference terminology.
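The distinction can be sketched as a virtual clip that is only a reference (source path plus start/end in seconds), as opposed to a newly written video file. `ClipRef` and `frame_range` are hypothetical names, and rounding times to the nearest frame index is an assumption:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClipRef:
    """A virtual clip: a pointer into a source video, no new file is written."""

    path: str
    start: float  # seconds
    end: float    # seconds

    def __post_init__(self) -> None:
        if not 0 <= self.start <= self.end:
            raise ValueError("expected 0 <= start <= end")

    @property
    def duration(self) -> float:
        return self.end - self.start

    def frame_range(self, fps: float) -> range:
        """Frame indices covered by [start, end) at the given frame rate."""
        return range(round(self.start * fps), round(self.end * fps))


clip = ClipRef("yura_big.mp4", start=1.0, end=2.5)
print(clip.duration, len(clip.frame_range(60.0)))
```

Materializing such a reference into its own file is what the helper above does, which is why it reads more like save_subvideo() than a clip operation.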

    output_file: Union[str, pathlib.Path],
    codec: str = "libx264",
    audio_codec: str = "aac",
) -> None:
Member

It would be great to generalize the single and plural methods. We just need to come up with an output format like output="{name}{:06d}.{ext}" and accept a plain string in the case of a single file.

Also, this method will need to be generalized for writing to the cloud, e.g. output="{source}/tmp/{name}{:06d}.{ext}".
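A sketch of the template idea with plain `str.format`: one function serves both cases, because a template without placeholders simply renders to itself. It uses a named `{index:06d}` field instead of the positional `{:06d}` from the comment, and `render_outputs` is a hypothetical name, not an existing datachain API:

```python
from typing import Iterator


def render_outputs(template: str, name: str, ext: str, count: int, **extra: str) -> Iterator[str]:
    """Yield one output location per item by expanding `template`."""
    if count > 1 and "{index" not in template:
        # Without an index placeholder every item would overwrite the same file.
        raise ValueError("template needs an {index} placeholder for multiple outputs")
    for index in range(count):
        # str.format ignores keyword arguments the template does not use.
        yield template.format(name=name, ext=ext, index=index, **extra)


# Several local files, one single file, and a cloud destination.
print(list(render_outputs("{name}{index:06d}.{ext}", "clip", "jpg", 2)))
print(list(render_outputs("frame.jpg", "clip", "jpg", 1)))
print(list(render_outputs("{source}/tmp/{name}{index:06d}.{ext}", "clip", "jpg", 1, source="gs://videos")))
```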

Development

Successfully merging this pull request may close these issues.

Support Video file and Video clip, Video frame models and operations with them
3 participants