Specify requirements around ReferenceBlock elements and how BlockGroups signal random access points #419
ReferenceBlock is not mandatory in any case.
The ReferenceBlock means that the referenced Block/SimpleBlock is needed to decode this BlockGroup, nothing more. That Block may itself depend on another Block, and so on.
Depending on the answer to the first question, it may not be necessary to fake a reference. That holds if BlockGroups with a reference MUST mention their reference(s). If not, a fake ReferenceBlock pointing to itself might be a good idea (I think we already had this conversation but I don't remember the outcome).
I'm not familiar with "intra-refresh frames". Can you explain what it does? How can a frame that has references be a RAP?
I think BlockGroup is usually not used for audio, so it kind of solves itself. But if we ever want to attach metadata to audio, we may need to use it as well. I think we could tell whether tracks have all references properly mentioned or not. Alternatively, as it seems to be an Opus thing, the Opus codec mapping could state that although each frame references past frames, we do not need to fill in the ReferenceBlock values, or even that we MUST NOT, to avoid having to deal with both cases.
So it sounds like WebM and Matroska have diverged in this respect, which is unfortunate.
I should have said "periodic intra-refresh" instead of "rolling intra-refresh" (I've heard people use various names for this, but "periodic" seems to be the most popular). Intra-refresh is when there's a column of intra blocks in a frame. The other columns can be inter blocks. There's a restriction that motion vectors can't cross the intra column. On each frame you can change which column is the intra column. If you have N columns, then you can fully reconstruct the video if you decode N frames (where each frame has a different intra column). On its own, the frame is not a sync sample or an IDR frame. But by using an SEI recovery message you can determine how many frames you need to decode (and discard) in order to resynchronize the decoder. The x264 Wikipedia page has a decent paragraph on this.
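To make the "decode N frames" point concrete, here is a minimal sketch. It assumes the intra column advances by one position per frame in a fixed round-robin order, which is only one possible schedule; the function name and parameters are made up for illustration and do not model any real encoder.

```python
def frames_until_clean(num_columns: int, start_frame: int) -> int:
    """Count the frames to decode, starting at start_frame, until every
    column has been covered by an intra column at least once.

    Assumption: frame k carries its intra column at index k % num_columns.
    """
    refreshed = set()
    decoded = 0
    frame = start_frame
    while len(refreshed) < num_columns:
        refreshed.add(frame % num_columns)  # column refreshed by this frame
        decoded += 1
        frame += 1
    return decoded
```

Under that assumption the count is always N, whichever frame the decoder joins at, which is the kind of value an SEI recovery message would carry.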
Interesting. It sounds like vertical interlacing with N "fields". We would need a system similar to interlacing to properly reference this. I'm not sure we really need this, though.
In fact they don't for subtitles setting the block duration or for the last Block of a Segment to signal the duration. Looking at the WebM guidelines I don't see much about the In the Matroska specs there is nothing that makes More generally, in the track header, we may mention each BlockGroup feature that is used in the track. It may be tricky for muxers as they may not know ahead of time whether the feature will actually be used or not.
I spoke with Frank Galligan internally and they'll be updating the WebM guidelines document so that it mentions that no |
But For Matroska it also for all known audio/subtitle tracks. For video tracks it's trickier. It means adding a requirement that wasn't there. For example an MPEG codec with P frames may have been muxed with BlockGroup (especially in the early days of Matroska) without properly setting the @mbunkus what is your opinion on this, since you know about muxing and deciding when to use BlockGroup or SimpleBlock?
BTW, I'm not sure there are guarantees that all WebM muxers respect this rule. Maybe they do now, but was it always the case? By design, do non-RAP frames using BlockGroup instead of SimpleBlock always set the proper reference timestamps? In any case, I think it's important to solve this issue (Blocks where you don't know if they are RAP or not without Cues). For Matroska I lean more towards a flag that would define what assumption to use, with the default value being "ReferenceBlock elements are not mandatory for non-RAP" (i.e. the current situation). After that we can encourage people to use the new behaviour. In WebM the default value should be different. That's not nice but I'm not sure there is a cleaner way. If we find/decide that since Matroska v2 or v3 or v4 all non-RAP use ReferenceBlock in common muxers, then we could make the flag v2 (or v3 or v4) and use the same default as WebM. Older versions will not be able to make any assumptions.
I'm not sure if this was always the case with WebM, but since WebM is focused on web-based streaming I don't think it's surprising for WebM to evolve over time. They've added a number of things to WebM since its first specification. The whole web community has generally moved to a "living standard" point of view. As for a new flag: I think that's reasonable. It would be really nice if Matroska had a way to signal RAPs for Blocks. I suppose this could be done with a new element, or perhaps by using two reserved bits in the Block header flags (one bit to signal whether or not the RAP flag is meaningful, and another bit for the RAP flag; or just use one bit since it's technically not problematic for a stream to pretend a RAP isn't a RAP, which is what all Blocks in existing Matroska files would signal).
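A rough sketch of the two-bit idea, to show how a reader would treat old and new files. The `RAP_VALID` and `RAP` bit positions are purely hypothetical (nothing in the spec assigns them); they were picked from bits that, as far as I know, are reserved in both the Block and SimpleBlock header flags. The other masks follow the existing flags byte layout.

```python
from typing import Optional

# Existing flags in the (Simple)Block header flags byte.
KEYFRAME = 0x80      # SimpleBlock only
INVISIBLE = 0x08
LACING_MASK = 0x06
DISCARDABLE = 0x01   # SimpleBlock only

# HYPOTHETICAL bits, only sketching the proposal in this thread.
RAP_VALID = 0x40     # set when the RAP bit below is meaningful
RAP = 0x20           # set when the Block is a random access point


def rap_from_flags(flags: int) -> Optional[bool]:
    """Return True/False when the Block signals its RAP status,
    or None when the file predates the (hypothetical) flag."""
    if not flags & RAP_VALID:
        return None  # existing files: reserved bits are zero, nothing known
    return bool(flags & RAP)
```

With the one-bit variant, the `None` case would collapse into `False`: every Block in an existing file would simply look like a non-RAP.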
Indeed. That's a key difference with what we do here and how Matroska evolved over time. I noticed this week that YouTube is using fragmented MP4 for its AV1 content, rather than WebM. Just a few years ago it didn't even exist. But it's got the same name so people think it's the same thing. It can't be read by older parsers. So maybe there is also room for a fresh approach where we tolerate new features that are key but not readable by older parsers. See #422.
That's doable. Reserved bits are not parsed by older parsers so we can put anything in there without breaking compatibility. It's better than relying on the presence/absence of elements already in use for many different scenarios. One problem with this is that to recover this information, you need to parse the Block headers instead of just the EBML level. In libebml we have a "partial data" read mode that allows reading deeper into some elements to achieve this, but I suspect that may not be the case for a lot of parsers. On the other hand, for SimpleBlock we already store the keyframe flag in the header. So adding RAP information in there wouldn't add any more constraints. But I guess the issue is more for BlockGroup, which doesn't have a keyframe flag at all. Another option would be to have an element in the Track that tells how ReferenceBlock presence/absence should be interpreted (or maybe the Segment). The twist is that WebM and Matroska would be using a different ID for this element, so they can have different default values (WebM: missing ReferenceBlock means RAP, Matroska: missing ReferenceBlock doesn't define anything about RAP). I think it's cleaner for readers of both formats. Or maybe it's overkill. After all, some elements (Display dimensions) can only be interpreted when you know others. It doesn't seem odd that depending on the value of the Doctype some elements might be interpreted differently. The default value would even be different depending on the Doctype. I know in libebml/libmatroska that's not possible yet, especially as these values are hardcoded.
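As a sketch of the Doctype-dependent default described above: the function and the `track_override` parameter are hypothetical, standing in for a Track (or Segment) element that does not exist today.

```python
from typing import Optional


def missing_reference_means_rap(doctype: str,
                                track_override: Optional[bool] = None) -> Optional[bool]:
    """How to read a BlockGroup that has no ReferenceBlock.

    True  -> it is a RAP
    False -> it is not necessarily a RAP
    None  -> nothing can be assumed (current Matroska situation)
    """
    if track_override is not None:
        # HYPOTHETICAL element in the Track telling the reader what to assume.
        return track_override
    if doctype == "webm":
        return True   # WebM default: missing ReferenceBlock means RAP
    return None       # Matroska default: no assumption about RAP
```

A reader would call this once per track and fall back to the Cues when it gets `None`.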
I prepared a section on Random Access Points after the (Simple)Block sections. It describes the situation as found in WebM:
I did not add a flag in the (Simple)Block or in the Track to explain how to interpret the RAP status of Blocks. That could come later. For "Periodic Intra-Refresh" I don't really have a solution to signal it properly. If it's periodic, maybe there's a start? If so, the first frame of the "partial" intra frame could be the one marked as keyframe. If there's no start and it keeps "rolling", maybe a start could be picked arbitrarily. Most likely the first column, in the case of x264.
RAPs, rolling intra
At the moment
Back in the days of simpler codecs
As for rolling intra,
Audio & key frames
That's not entirely correct. There's at least one popular & one ancient codec that differentiate between "key frames" and other frames: TrueHD & certain types of RealAudio, if I remember correctly. For TrueHD (and MLP) there are two types of blocks: one starts with a "major sync info" structure, the other doesn't. Decoding can only start on a "major sync" block, making this effectively a "key frame", if you will. To me it wouldn't make sense to say that audio must always have the "key" flag set in
Audio & pre-roll
As Steve mentioned, we have a track header element for that. It's up to the player to use it properly.
This is equivalent to a Random Access Point as defined by the W3C (https://w3c.github.io/media-source/index.html#random-access-point), except that it only applies to a single track.

FFmpeg sets a dummy value (not a proper reference) when a Block is not a keyframe:
https://github.com/FFmpeg/FFmpeg/blob/master/libavformat/matroskaenc.c#L2112
libwebm also sets a dummy value:
https://aomedia.googlesource.com/aom/+/refs/heads/master/third_party/libwebm/mkvmuxer/mkvmuxer.cc#2584
https://aomedia.googlesource.com/aom/+/refs/heads/master/third_party/libwebm/mkvmuxer/mkvmuxer.cc#3584
then writes it:
https://aomedia.googlesource.com/aom/+/refs/heads/master/third_party/libwebm/mkvmuxer/mkvmuxerutil.cc#72
https://aomedia.googlesource.com/aom/+/refs/heads/master/third_party/libwebm/mkvmuxer/mkvmuxerutil.cc#139
mkvtoolnix is more refined and allows a past and a post reference timestamp:
https://gitlab.com/mbunkus/mkvtoolnix/-/blob/main/src/merge/libmatroska_extensions.cpp#L53
But according to Moritz it's not that clean either: #419 (comment)

Fixes #419
This is in line with #419 and the PR #554
* audio & subtitles are considered RAP by default (no one cared to set the proper flags in SimpleBlock)
* video frames are RAP if the keyframe flag is set or the BlockGroup has no ReferenceBlock.
* all video references needed for a frame must be listed in a ReferenceBlock (if using a BlockGroup)
* intra-only (AV1) frames use a ReferenceBlock value of 0
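The rules listed above can be sketched as a small reader-side check. The `Frame` type and its field names are hypothetical, made up for this illustration; they are not taken from libmatroska or any other parser.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Frame:
    track_type: str                # "video", "audio" or "subtitle"
    in_block_group: bool           # True for BlockGroup, False for SimpleBlock
    keyframe_flag: bool = False    # only meaningful for SimpleBlock
    reference_blocks: List[int] = field(default_factory=list)  # ReferenceBlock values


def is_rap(frame: Frame) -> bool:
    """Apply the rules above to decide whether a frame is a RAP."""
    if frame.track_type in ("audio", "subtitle"):
        return True  # considered RAP by default, whatever the flags say
    if frame.in_block_group:
        # Video in a BlockGroup: RAP only when no ReferenceBlock is present.
        return not frame.reference_blocks
    # Video in a SimpleBlock: rely on the keyframe flag.
    return frame.keyframe_flag
```

An intra-only AV1 frame stored with a ReferenceBlock value of 0 has a non-empty `reference_blocks` list, so it is reported as non-RAP, as the last rule intends.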
The W3C WebM spec states:
It's possible I've overlooked this in the MKV specs, but I believe the following is missing: