Specify requirements around ReferenceBlock elements and how BlockGroups signal random access points #419

Closed
mjbshaw opened this issue Aug 29, 2020 · 10 comments · Fixed by #554
Labels
clarifications spec_main Main Matroska spec document target

Comments

@mjbshaw
Contributor

mjbshaw commented Aug 29, 2020

The W3C WebM spec states:

Either a SimpleBlock element with its Keyframe flag set, or a BlockGroup element having no ReferenceBlock elements, signals the location of a random access point for that track. The order of multiplexed blocks within a media segment MUST conform to the WebM Muxer Guidelines.
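Read literally, the quoted rule reduces to a small predicate. The following is a minimal Python sketch, not parser code: the `SimpleBlock` and `BlockGroup` classes and their field names are hypothetical stand-ins for whatever a demuxer produces.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class SimpleBlock:
    # Hypothetical parsed form: only the Keyframe flag matters here.
    keyframe: bool


@dataclass
class BlockGroup:
    # Hypothetical parsed form: one value per ReferenceBlock child element.
    reference_blocks: List[int] = field(default_factory=list)


def is_rap_webm(block: Union[SimpleBlock, BlockGroup]) -> bool:
    """Apply the WebM rule quoted above to one parsed block."""
    if isinstance(block, SimpleBlock):
        return block.keyframe
    # A BlockGroup with no ReferenceBlock elements signals a RAP.
    return len(block.reference_blocks) == 0
```

Under this reading, a BlockGroup carrying any ReferenceBlock, even one with a value of 0, is not a RAP.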

It's possible I've overlooked this in the MKV specs, but I believe the following is missing:

  • Are ReferenceBlock elements mandatory for non-RAPs? Are BlockGroup elements required to specify all references to other blocks? Or just some? Or none?
  • Does MKV agree with WebM in that a lack of ReferenceBlock elements signals that a BlockGroup is a RAP?
  • If so, how should a non-RAP I-frame be muxed? Would you need to add a fake ReferenceBlock element to signal it's not a RAP (e.g., a ReferenceBlock referring to itself)?
  • How should rolling intra-refresh frames be muxed in a BlockGroup? These can be treated as RAPs (assuming you drop enough frames), but they can also have dependencies on other frames. If ReferenceBlock is used to report all its dependencies then how can the player know it's a RAP?
  • Similar to the above, what is the guidance for audio, e.g. Opus? Any block in an audio stream is effectively a RAP, though there's the seek pre-roll you need to take into account. I don't think it really makes sense for an audio block to use ReferenceBlock elements, even if the block technically builds upon prior blocks. Can we specify which types of tracks should/shouldn't use ReferenceBlock elements?
@robUx4 robUx4 added clarifications spec_main Main Matroska spec document target labels Aug 30, 2020
@robUx4
Contributor

robUx4 commented Aug 30, 2020

* Are ReferenceBlock elements mandatory for non-RAPs? Are BlockGroup elements required to specify _all_ references to other blocks? Or just some? Or none?

ReferenceBlock is not mandatory in any case.
BlockGroups are not required to reference any other Block/SimpleBlock. I suppose they usually do. An analysis of existing files might give a better idea of the common practice. And if they usually do that's probably a rule we want to add.

* Does MKV agree with WebM in that a lack of ReferenceBlock elements signals that a BlockGroup is a RAP?

The ReferenceBlock means that the referenced Block/SimpleBlock is needed to decode this BlockGroup, nothing more. That Block may itself depend on another Block, and so on.

* If so, how should a non-RAP I-frame be muxed? Would you need to add a fake ReferenceBlock element to signal it's not a RAP (e.g., a ReferenceBlock referring to itself)?

Depending on the answer to the first question, a fake reference may not be needed. That's the case if a BlockGroup that has references MUST mention its reference(s). If not, a fake ReferenceBlock to itself might be a good idea (I think we already had this conversation but I don't remember the outcome).
Looking at the AV1 specs, it says: "A Block with frame_header_obu where the frame_type is INTRA_ONLY_FRAME MUST use a ReferenceBlock with a value of 0 to reference itself." In any case this solution is allowed.
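To make the muxer side concrete, here is a hedged sketch of how ReferenceBlock values could be chosen if every reference had to be listed. ReferenceBlock values are timestamps relative to the Block they are attached to. The `frame_kind` labels and the function name are invented for illustration, and the `0` for intra-only frames follows the AV1 mapping quoted above.

```python
from typing import List


def reference_block_values(frame_kind: str, block_ts: int,
                           reference_ts: List[int]) -> List[int]:
    """Return the ReferenceBlock values to write for one BlockGroup.

    frame_kind is a hypothetical label: "key", "intra_only" or "inter".
    Timestamps are in track ticks; each ReferenceBlock stores the referenced
    timestamp relative to the Block's own timestamp.
    """
    if frame_kind == "key":
        return []   # no ReferenceBlock at all: the block reads as a RAP
    if frame_kind == "intra_only":
        return [0]  # self-reference, so the block is not taken for a RAP
    if frame_kind == "inter":
        return [ts - block_ts for ts in reference_ts]
    raise ValueError(f"unknown frame kind: {frame_kind}")
```

For example, an inter frame at timestamp 120 that depends on frames at 80 and 160 would get the values -40 and 40.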

* How should rolling intra-refresh frames be muxed in a BlockGroup? These can be treated as RAPs (assuming you drop enough frames), but they can also have dependencies on other frames. If ReferenceBlock is used to report all its dependencies then how can the player know it's a RAP?

I'm not familiar with "intra-refresh frames". Can you explain what they do? How can a frame that has references be a RAP?

* Similar to the above, what is the guidance for audio, e.g. Opus? Any block in an audio stream is effectively a RAP, though there's the seek pre-roll you need to take into account. I don't think it really makes sense for an audio block to use ReferenceBlock elements, even if the block technically builds upon prior blocks. Can we specify which types of tracks should/shouldn't use ReferenceBlock elements?

I think BlockGroup is usually not used for audio, so it kinda solves itself. But if we ever want to attach metadata to audio, we may need to use it as well. I think we could tell if tracks have all references properly mentioned or not. Alternatively, as it seems to be an Opus thing, in the Opus codec mapping we could mention that although each frame references past frames, we do not need to fill the ReferenceBlock values, or even that we MUST NOT, to avoid having to deal with both cases.

@mjbshaw
Contributor Author

mjbshaw commented Aug 31, 2020

So it sounds like WebM and Matroska have diverged in this respect, which is unfortunate.

I'm not familiar with "intra-refresh frames". Can you explain what they do? How can a frame that has references be a RAP?

I should have said "periodic intra-refresh" instead of "rolling intra-refresh" (I've heard people use various names for this, but "periodic" seems to be the most popular).

Intra-refresh is when there's a column of intra blocks in a frame. The other columns can be inter blocks. There's a restriction that motion vectors can't cross the intra column. On each frame you can change which column is the intra column. If you have N columns, then you can fully reconstruct the video if you decode N frames (where each frame has a different intra column).

On its own, the frame is not a sync sample or an IDR frame. But by using an SEI recovery message you can determine how many frames you need to decode (and discard) in order to resynchronize the decoder. The x264 Wikipedia page has a decent paragraph on this.
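The recovery argument can be checked with a toy model. This sketch assumes N columns, exactly one intra column per frame, and that a column stays clean once it has been refreshed (which the motion-vector restriction is meant to guarantee). It is only an illustration of the counting, not how any encoder or decoder works, and the function name is made up.

```python
from typing import List


def frames_until_clean(intra_columns: List[int], num_columns: int, start: int) -> int:
    """Count frames decoded from index `start` until every column was refreshed.

    intra_columns[i] is the index of the intra column in frame i.
    Returns -1 if the stream ends before all columns are clean.
    """
    clean = set()
    for count, column in enumerate(intra_columns[start:], start=1):
        clean.add(column)
        if len(clean) == num_columns:
            return count
    return -1
```

With a schedule that cycles through 4 columns, starting at any frame needs 4 decoded frames before the picture is fully reconstructed, which is the count a recovery point message would convey.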

@robUx4
Contributor

robUx4 commented Sep 6, 2020

Interesting. It sounds like vertical interlacing with N "fields". We would need a system similar to interlacing to properly reference this. I'm not sure we really need this, though.

BlockGroups are not required to reference any other Block/SimpleBlock. I suppose they usually do.

In fact they don't for subtitles setting the block duration or for the last Block of a Segment to signal the duration.

Looking at the WebM guidelines I don't see much saying that the lack of ReferenceBlock elements signals that a BlockGroup is a RAP. Is this maybe a codec thing? For example for VP9?
So it may differ depending on the codec. Which is not right, as the point of signaling ReferenceBlock at the container level is that you can seek properly or keep the relevant Blocks when cutting a file.

In the Matroska specs there is nothing that makes ReferenceBlock mandatory, and thus nothing that gives its absence a meaning (RAP). Since BlockGroup can be used for many things, it may not be wise to have this rule hardcoded for everything. A flag telling the demuxer how ReferenceBlock is used for this track could help. It may have different values: not used, used for all existing references, partially used (some blocks omit some of their references), loosely used (some blocks have none of their references mentioned). A default value that covers how BlockGroup is currently used would then be applied. For example, no subtitle track has ReferenceBlock, but since subtitles never reference any block, any of the values described above could apply.
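As a rough sketch of what such a per-track flag could look like to a demuxer, assuming the four values listed above (the enum, its member names and the numeric values are all hypothetical, and no such element exists in the specification):

```python
from enum import Enum


class ReferenceBlockUsage(Enum):
    # Hypothetical values mirroring the four cases listed above.
    NOT_USED = 0  # ReferenceBlock is never written for this track
    ALL = 1       # every existing reference is listed
    PARTIAL = 2   # some blocks omit some of their references
    LOOSE = 3     # some blocks list none of their references


def missing_reference_means_rap(usage: ReferenceBlockUsage) -> bool:
    """Can a BlockGroup without ReferenceBlock be trusted to be a RAP?"""
    # One possible reading: with ALL or PARTIAL, a dependent block still
    # carries at least one ReferenceBlock, so absence is meaningful.
    return usage in (ReferenceBlockUsage.ALL, ReferenceBlockUsage.PARTIAL)
```

Treating PARTIAL as trustworthy is an interpretation of "omit some of their references"; a stricter design could accept only ALL.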

More generally, in the track header, we may mention each of the BlockGroup features that are used in the track.

It may be tricky for muxers as they may not know ahead of time whether the feature will actually be used or not.

@mjbshaw
Contributor Author

mjbshaw commented Sep 6, 2020

I spoke with Frank Galligan internally and they'll be updating the WebM guidelines document so that it mentions that no ReferenceBlock elements means a block is a RAP. This effectively makes ReferenceBlock mandatory for non-RAP frames in WebM. WebM only supports VP8 and VP9 for video, and both of those codecs have the property that I-frames are also RAPs (which isn't necessarily true for codecs like H.264).

@robUx4
Contributor

robUx4 commented Sep 13, 2020

But BlockGroup can also be used by other kinds of tracks. For example IIRC DiscardPadding was added for Opus, which is part of WebM. BlockDuration is also used for subtitle tracks like "WebVTT". Luckily I think all frames of these codecs are RAPs so, if ReferenceBlock is mandatory for non-RAP, a missing ReferenceBlock means it's a RAP, and that holds for these codecs too. It works for WebM.

For Matroska it also works for all known audio/subtitle tracks. For video tracks it's trickier. It means adding a requirement that wasn't there. For example an MPEG codec with P frames may have been muxed with BlockGroup (especially in the early days of Matroska) without properly setting the ReferenceBlock (one of the reasons for SimpleBlock was that finding this information accurately was tricky). I fear that remuxing such old files with these rules will assume some frames are RAPs when in fact they are not.

@mbunkus what is your opinion on this ? As you know about muxing and deciding when to use BlockGroup or SimpleBlock.

@robUx4
Contributor

robUx4 commented Sep 13, 2020

BTW, I'm not sure there are guarantees that all WebM muxers respect this rule. Maybe they do now, but was it always the case? Do non-RAP frames using BlockGroup instead of SimpleBlock always, by design, set the proper reference timestamps?

In any case, I think it's important to solve this issue (Blocks where you don't know if they are RAP or not without Cues). For Matroska I am leaning more towards a flag that would define what assumption to use, with the default value being "ReferenceBlock is not mandatory for non-RAP" (i.e. the current situation). After that we can encourage people to use the new behaviour. In WebM the default value should be different. That's not nice but I'm not sure there is a cleaner way.

If we find/decide that since Matroska v2 or v3 or v4 all non-RAP use ReferenceBlock in common muxers, then we could make the flag v2 (or v3 or v4) and use the same default as WebM. Older versions will not be able to make any assumptions.

@mjbshaw
Contributor Author

mjbshaw commented Sep 28, 2020

I'm not sure if this was always the case with WebM, but since WebM is focused on web-based streaming I don't think it's surprising for WebM to evolve over time. They've added a number of things to WebM since its first specification. The whole web community has generally moved to a "living standard" point of view.

As for a new flag: I think that's reasonable. It would be really nice if Matroska had a way to signal RAPs for Blocks. I suppose this could be done with a new element, or perhaps by using two reserved bits in the Block header flags (one bit to signal whether or not the RAP flag is meaningful, and another bit for the RAP flag; or just use one bit since it's technically not problematic for a stream to pretend a RAP isn't a RAP, which is what all Blocks in existing Matroska files would signal).
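A minimal sketch of the two-bit variant, to show how a reader could tell "not a RAP" apart from "unknown". The two masks are bit positions picked purely for illustration; they are assumptions of this sketch and are not assigned by the Matroska specification.

```python
from typing import Optional

# Hypothetical masks: two of the reserved bits in the Block header flags byte.
RAP_VALID_MASK = 0x40  # set when the RAP bit below is meaningful
RAP_MASK = 0x20        # set when the Block is a random access point


def rap_from_flags(flags: int) -> Optional[bool]:
    """Return True/False when the RAP bit is meaningful, else None (unknown)."""
    if not flags & RAP_VALID_MASK:
        return None  # existing files leave reserved bits at zero: nothing is claimed
    return bool(flags & RAP_MASK)
```

The one-bit variant would drop `RAP_VALID_MASK` and read a zero bit as "not a RAP", which is what every Block in existing files would then signal.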

@robUx4
Contributor

robUx4 commented Oct 4, 2020

I'm not sure if this was always the case with WebM, but since WebM is focused on web-based streaming I don't think it's surprising for WebM to evolve over time. They've added a number of things to WebM since its first specification. The whole web community has generally moved to a "living standard" point of view.

Indeed. That's a key difference with what we do here and how Matroska evolved over time. I noticed this week that YouTube is using fragmented MP4 for its AV1 content, rather than WebM. Just a few years ago it didn't even exist. But it's got the same name so people think it's the same thing. It can't be read by older parsers. So maybe there is also room for a fresh approach where we tolerate new features that are key but not readable by older parsers. See #422.

As for a new flag: I think that's reasonable. It would be really nice if Matroska had a way to signal RAPs for Blocks. I suppose this could be done with a new element, or perhaps by using two reserved bits in the Block header flags (one bit to signal whether or not the RAP flag is meaningful, and another bit for the RAP flag; or just use one bit since it's technically not problematic for a stream to pretend a RAP isn't a RAP, which is what all Blocks in existing Matroska files would signal).

That's doable. Reserved bits are not parsed by older parsers so we can put anything in there without breaking compatibility. It's better than relying on the presence/absence of elements already in use for many different scenarios. One problem with this is that to recover this information, you need to parse the Block headers instead of just the EBML level. In libebml we have a "partial data" read mode that allows reading deeper into some elements to achieve this, but I suspect that may not be the case for a lot of parsers.

On the other hand, for SimpleBlock we already store the keyframe flag in the header. So adding RAP information in there wouldn't add any more constraints. But I guess the issue is more for BlockGroup, which doesn't have a keyframe flag at all.

Another option would be to have an element in the Track that tells how ReferenceBlock presence/absence should be interpreted (or maybe the Segment). The twist is that WebM and Matroska would be using a different ID for this element, so they can have different default values (WebM: missing ReferenceBlock means RAP, Matroska: missing ReferenceBlock doesn't define anything about RAP). I think it's cleaner for readers of both formats. Or maybe it's overkill. After all some elements (Display dimensions) can only be interpreted when you know others. It doesn't seem odd that depending on the value of the Doctype some elements might be interpreted differently. The default value would even be different depending on the Doctype. I know in libebml/libmatroska that's not possible yet, especially as these values are hardcoded.

@robUx4
Contributor

robUx4 commented Apr 23, 2021

I prepared a section on Random Access Points after the (Simple)Block sections.
354f946

It is describing the situation as found in WebM:

  • keyframe <=> RAP
  • no ReferenceBlock <=> RAP
    There is some room left for audio and subtitle blocks that may not have set the values properly so far.
    All referenced frames should be mentioned for each Block, although I'm not sure this is the case in existing files.

I did not add a flag in the (Simple)Block or in the Track to explain how to interpret the RAP status of Blocks. That could come later.

For "Periodic Intra-Refresh" I don't really have a solution to signal it properly. If it's periodic maybe there's a start? If so, the first frame of the "partial" intra sequence could be the one marked as keyframe. If there's no start and it keeps "rolling", maybe a start could be picked arbitrarily. Most likely the first column, in the case of x264.
So in both cases we'd have a start, which is all the container needs to seek. The codec description should say how to map such cases to Matroska fields (like the AV1 mapping, which says intra-only frames should use a ReferenceBlock with a timestamp of 0).

@mbunkus
Contributor

mbunkus commented May 30, 2021

RAPs, rolling intra

At the moment mkvmerge only uses RAPs for BlockGroups without ReferenceBlock elements (or SimpleBlock with key frame flag set). All other frames in BlockGroups have at least one ReferenceBlock.

Back in the days of simpler codecs mkvmerge tried to use values for ReferenceBlock that actually refer to the frame the current frame depends on. This is definitely no longer the case. It uses… well… not arbitrary values, but something like "preceding I or P frame" as the mathematical reference even if that's not the actually referenced frame.

As for rolling intra, mkvmerge doesn't really support them, unfortunately, for several reasons. It has an option for treating non-RAP I frames in AVC/H.264 as if they were RAPs as a workaround for files that consist solely of rolling intra. I have no good idea how to express them in current Matroska/WebM structures properly.

Audio & key frames

Any block in an audio stream is effectively a RAP

That's not entirely correct. There's at least one popular & one ancient codec that differentiate between "key frames" and other frames: TrueHD & certain types of RealAudio, if I remember correctly.

For TrueHD (and MLP) there are two types of blocks, one starts with a "major sync info" structure, the other doesn't. Decoding can only start on a "major sync" block, making this effectively a "key frame", if you will. mkvmerge will use SimpleBlock elements with the "key" flag set depending on whether it contains a "major sync" block or not. Same for cues; only frames with a "major sync" are candidates for being listed in the cues. One file I have just looked at has a ratio of roughly 1 key to 110 non key frames.
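The mapping described here can be sketched as follows. This is not mkvmerge's code: detecting the "major sync info" structure is left out and passed in as a boolean, and the function names are invented. The only format detail relied on is that the Keyframe flag is the top bit (0x80) of the SimpleBlock header flags byte.

```python
from typing import List

KEYFRAME_FLAG = 0x80  # Keyframe bit in the SimpleBlock header flags byte


def simple_block_flags(has_major_sync: bool) -> int:
    """Flags byte for one TrueHD/MLP access unit stored in a SimpleBlock."""
    return KEYFRAME_FLAG if has_major_sync else 0x00


def cue_candidates(major_sync: List[bool]) -> List[int]:
    """Indices of the frames that may be listed in the Cues."""
    return [i for i, sync in enumerate(major_sync) if sync]
```

For a track where only the first and fourth access units carry a major sync, `cue_candidates([True, False, False, True])` yields `[0, 3]`.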

To me it wouldn't make sense to say that audio must always have the "key" flag set in SimpleBlock or that it must not use ReferenceBlock elements. Sure, it's rare, but not unheard of. Additionally I don't think it's good practice to have container-level limitations for these things — codec developers don't pay attention to container-level restrictions anyway and develop their stuff & it's up to us to map their stuff properly to our containers. The containers are the unimportant thing in this relationship 😁

Audio & pre-roll

As Steve mentioned, we have a track header element for that. It's up to the player to use it properly.

@robUx4 robUx4 linked a pull request Aug 27, 2021 that will close this issue
robUx4 added a commit that referenced this issue Aug 27, 2021
This is in line with #419 and the PR #554

* audio & subtitles are considered RAP by default (no one cared to set the proper flags in SimpleBlock)
* video frames are RAP if the keyframe flag is set or the BlockGroup has no ReferenceBlock.
* all video references needed for a frame must be listed in a ReferenceBlock (if using a BlockGroup)
* intra-only (AV1) frames use a ReferenceBlock value of 0
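
Taken together, the rules in this commit message could be applied by a reader roughly as below. This is a sketch of one interpretation, with hypothetical track-type labels and argument names; it takes the audio/subtitle default at face value, which leaves the TrueHD case raised earlier in the thread uncovered.

```python
from typing import List, Optional


def is_rap(track_type: str, keyframe: Optional[bool],
           reference_blocks: Optional[List[int]]) -> bool:
    """Classify one block under the rules summarized in the commit message.

    track_type: hypothetical label, "video", "audio" or "subtitle".
    keyframe: SimpleBlock Keyframe flag, or None when the block is in a BlockGroup.
    reference_blocks: ReferenceBlock values of the BlockGroup, or None for a SimpleBlock.
    """
    if track_type in ("audio", "subtitle"):
        return True  # considered RAP by default
    if keyframe is not None:
        return keyframe  # SimpleBlock: trust the Keyframe flag
    if reference_blocks is None:
        raise ValueError("need either a keyframe flag or a ReferenceBlock list")
    # BlockGroup: RAP only when no ReferenceBlock is present; an intra-only
    # frame carries a ReferenceBlock of 0 and is therefore not a RAP.
    return len(reference_blocks) == 0
```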