Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lazily init filtergraph so it can respect raw decoded resolution #432

Merged
merged 2 commits into from
Dec 13, 2024

Conversation

scotts
Copy link
Contributor

@scotts scotts commented Dec 13, 2024

We've encountered runtime errors with some videos where the resolution in the stream metadata disagrees with the resolution of the raw decoded frame. Because we have been initializing filtergraph so early, we've had to rely on the stream metadata. This PR does the initialization laziily, so we can use the resolution of the raw decoded frame.

Note that we still rely on the stream metadata for the batch APIs - they can run into the same problem. Dealing with that case will be more involved, since we'll need to do a sacrificial decode to discover what the size of the tensors should be. We should create an issue to track that case.

Note that I did consider just obeying whatever the stream metadata was. However, for swscale, we're also using the raw decoded frame to determine what the width should be. We should maintain the same behavior no matter which method we're using for the color space conversion.

This change seems to be performance neutral. Running:

python benchmarks/decoders/benchmark_decoders.py --decoders decord,decord_batch,torchcodec_public,torchcodec_core,torchaudio --min-run-seconds 20

I get:

[- video=/home/scottas/github/torchcodec/benchmarks/decoders/../../test/resources/nasa_13013.mp4 h264 480x270, 13.013s 29.97002997002997fps -]
                           |  decode 10 uniform frames  |  decode 10 random frames  |  first 1 frames  |  first 10 frames  |  first 100 frames
1 threads: -----------------------------------------------------------------------------------------------------------------------------------
      DecordAccurate       |            54.6            |           121.3           |       13.0       |        18.5       |        65.8      
      TorchAudio           |           187.1            |           200.1           |       11.6       |        13.1       |        32.6      
      TorchCodecPublic     |            49.2            |            44.8           |       14.1       |        14.8       |        29.1      
      DecordAccurateBatch  |            52.6            |           118.1           |       13.2       |        16.2       |        48.3      
      TorchCodecCore       |            49.5            |            44.6           |       15.7       |        18.3       |        45.7      
                                                             
Times are in milliseconds (ms).  

Which compares favorably with what I reported in #431.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Meta Open Source bot. label Dec 13, 2024
@scotts scotts requested a review from NicolasHug December 13, 2024 04:40
@scotts scotts marked this pull request as ready for review December 13, 2024 04:40
Copy link
Member

@NicolasHug NicolasHug left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch and great fix, thank you!

src/torchcodec/decoders/_core/VideoDecoder.cpp Outdated Show resolved Hide resolved
@NicolasHug
Copy link
Member

Note that we still rely on the stream metadata for the batch APIs - they can run into the same problem

Are you able to confirm that calling e.g. decoder.get_frames_at([3, 4]) on that same problematic video hits this exact check:

if (preAllocatedOutputTensor.has_value()) {
auto shape = preAllocatedOutputTensor.value().sizes();
TORCH_CHECK(
(shape.size() == 3) && (shape[0] == expectedOutputHeight) &&
(shape[1] == expectedOutputWidth) && (shape[2] == 3),
"Expected pre-allocated tensor of shape ",
expectedOutputHeight,
"x",
expectedOutputWidth,
"x3, got ",
shape);
}

@NicolasHug
Copy link
Member

Dealing with that case will be more involved, since we'll need to do a sacrificial decode to discover what the size of the tensors should be. We should create an issue to track that case.

I remember now that I had documented that potential workaround in code comments. I opened #433 to keep track of it.

@scotts
Copy link
Contributor Author

scotts commented Dec 13, 2024

Are you able to confirm that calling e.g. decoder.get_frames_at([3, 4]) on that same problematic video hits this exact check:

if (preAllocatedOutputTensor.has_value()) {
auto shape = preAllocatedOutputTensor.value().sizes();
TORCH_CHECK(
(shape.size() == 3) && (shape[0] == expectedOutputHeight) &&
(shape[1] == expectedOutputWidth) && (shape[2] == 3),
"Expected pre-allocated tensor of shape ",
expectedOutputHeight,
"x",
expectedOutputWidth,
"x3, got ",
shape);
}

Yup, confirmed.

@scotts scotts merged commit 943dc6e into pytorch:main Dec 13, 2024
36 of 37 checks passed
@scotts scotts deleted the bad_vid branch December 13, 2024 16:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Meta Open Source bot.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants