Lazily init filtergraph so it can respect raw decoded resolution #432

scotts · 2024-12-13T04:35:40Z

We've encountered runtime errors with some videos where the resolution in the stream metadata disagrees with the resolution of the raw decoded frame. Because we have been initializing filtergraph so early, we've had to rely on the stream metadata. This PR does the initialization laziily, so we can use the resolution of the raw decoded frame.

Note that we still rely on the stream metadata for the batch APIs - they can run into the same problem. Dealing with that case will be more involved, since we'll need to do a sacrificial decode to discover what the size of the tensors should be. We should create an issue to track that case.

Note that I did consider just obeying whatever the stream metadata was. However, for swscale, we're also using the raw decoded frame to determine what the width should be. We should maintain the same behavior no matter which method we're using for the color space conversion.

This change seems to be performance neutral. Running:

python benchmarks/decoders/benchmark_decoders.py --decoders decord,decord_batch,torchcodec_public,torchcodec_core,torchaudio --min-run-seconds 20

I get:

[- video=/home/scottas/github/torchcodec/benchmarks/decoders/../../test/resources/nasa_13013.mp4 h264 480x270, 13.013s 29.97002997002997fps -]
                           |  decode 10 uniform frames  |  decode 10 random frames  |  first 1 frames  |  first 10 frames  |  first 100 frames
1 threads: -----------------------------------------------------------------------------------------------------------------------------------
      DecordAccurate       |            54.6            |           121.3           |       13.0       |        18.5       |        65.8      
      TorchAudio           |           187.1            |           200.1           |       11.6       |        13.1       |        32.6      
      TorchCodecPublic     |            49.2            |            44.8           |       14.1       |        14.8       |        29.1      
      DecordAccurateBatch  |            52.6            |           118.1           |       13.2       |        16.2       |        48.3      
      TorchCodecCore       |            49.5            |            44.6           |       15.7       |        18.3       |        45.7      
                                                             
Times are in milliseconds (ms).

Which compares favorably with what I reported in #431.

src/torchcodec/decoders/_core/VideoDecoder.cpp

NicolasHug

Nice catch and great fix, thank you!

src/torchcodec/decoders/_core/VideoDecoder.cpp

NicolasHug · 2024-12-13T09:53:34Z

Note that we still rely on the stream metadata for the batch APIs - they can run into the same problem

Are you able to confirm that calling e.g. decoder.get_frames_at([3, 4]) on that same problematic video hits this exact check:

torchcodec/src/torchcodec/decoders/_core/VideoDecoder.cpp

Lines 905 to 916 in 85c1ac0

    
           if (preAllocatedOutputTensor.has_value()) { 
        
             auto shape = preAllocatedOutputTensor.value().sizes(); 
        
             TORCH_CHECK( 
        
                 (shape.size() == 3) && (shape[0] == expectedOutputHeight) && 
        
                     (shape[1] == expectedOutputWidth) && (shape[2] == 3), 
        
                 "Expected pre-allocated tensor of shape ", 
        
                 expectedOutputHeight, 
        
                 "x", 
        
                 expectedOutputWidth, 
        
                 "x3, got ", 
        
                 shape); 
        
           }

NicolasHug · 2024-12-13T10:02:49Z

Dealing with that case will be more involved, since we'll need to do a sacrificial decode to discover what the size of the tensors should be. We should create an issue to track that case.

I remember now that I had documented that potential workaround in code comments. I opened #433 to keep track of it.

scotts · 2024-12-13T11:57:41Z

Are you able to confirm that calling e.g. decoder.get_frames_at([3, 4]) on that same problematic video hits this exact check:

torchcodec/src/torchcodec/decoders/_core/VideoDecoder.cpp

Lines 905 to 916 in 85c1ac0

if (preAllocatedOutputTensor.has_value()) {

auto shape = preAllocatedOutputTensor.value().sizes();

TORCH_CHECK(

(shape.size() == 3) && (shape[0] == expectedOutputHeight) &&

(shape[1] == expectedOutputWidth) && (shape[2] == 3),

"Expected pre-allocated tensor of shape ",

expectedOutputHeight,

"x",

expectedOutputWidth,

"x3, got ",

shape);

}

Yup, confirmed.

src/torchcodec/decoders/_core/VideoDecoder.cpp

Lazily init filtergraph so it can respect raw decoded resolution

85c1ac0

facebook-github-bot added the CLA Signed This label is managed by the Meta Open Source bot. label Dec 13, 2024

scotts commented Dec 13, 2024

View reviewed changes

src/torchcodec/decoders/_core/VideoDecoder.cpp Show resolved Hide resolved

scotts requested a review from NicolasHug December 13, 2024 04:40

scotts marked this pull request as ready for review December 13, 2024 04:40

NicolasHug approved these changes Dec 13, 2024

View reviewed changes

src/torchcodec/decoders/_core/VideoDecoder.cpp Outdated Show resolved Hide resolved

src/torchcodec/decoders/_core/VideoDecoder.cpp Show resolved Hide resolved

NicolasHug reviewed Dec 13, 2024

View reviewed changes

src/torchcodec/decoders/_core/VideoDecoder.cpp Outdated Show resolved Hide resolved

Refactor based on review

c8070bc

scotts merged commit 943dc6e into pytorch:main Dec 13, 2024
36 of 37 checks passed

scotts deleted the bad_vid branch December 13, 2024 16:02

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Lazily init filtergraph so it can respect raw decoded resolution #432

Lazily init filtergraph so it can respect raw decoded resolution #432

scotts commented Dec 13, 2024 •

edited

Loading

NicolasHug left a comment

NicolasHug commented Dec 13, 2024

NicolasHug commented Dec 13, 2024

scotts commented Dec 13, 2024

Lazily init filtergraph so it can respect raw decoded resolution #432

Lazily init filtergraph so it can respect raw decoded resolution #432

Conversation

scotts commented Dec 13, 2024 • edited Loading

NicolasHug left a comment

Choose a reason for hiding this comment

NicolasHug commented Dec 13, 2024

NicolasHug commented Dec 13, 2024

scotts commented Dec 13, 2024

scotts commented Dec 13, 2024 •

edited

Loading