allow TrackTimestampScale in Matroska v4 #437

Closed
wants to merge 2 commits into from
6 changes: 3 additions & 3 deletions ebml_matroska.xml
@@ -312,9 +312,9 @@ If set to 0, the reference pseudo-cache system is not used.</documentation>
see (#defaultdecodedfieldduration) for more information</documentation>
<extension type="libmatroska" cppname="TrackDefaultDecodedFieldDuration"/>
</element>
<element name="TrackTimestampScale" path="\Segment\Tracks\TrackEntry\TrackTimestampScale" id="0x23314F" type="float" maxver="3" range="&gt; 0x0p+0" default="0x1p+0" minOccurs="1" maxOccurs="1">
<documentation lang="en" purpose="definition">DEPRECATED, DO NOT USE. The scale to apply on this track to work at normal speed in relation with other tracks
(mostly used to adjust video speed when the audio length differs).</documentation>
<element name="TrackTimestampScale" path="\Segment\Tracks\TrackEntry\TrackTimestampScale" id="0x23314F" type="float" range="&gt; 0x0p+0" default="0x1p+0" minOccurs="1" maxOccurs="1">
<documentation lang="en" purpose="definition">The scale to apply to this track so that it plays at normal speed relative to other tracks.
Mostly used to adjust video speed when the audio length differs, or to store more accurate timestamps for a track.</documentation>
<extension type="libmatroska" cppname="TrackTimecodeScale"/>
</element>
<element name="TrackOffset" path="\Segment\Tracks\TrackEntry\TrackOffset" id="0x537F" type="integer" minver="0" maxver="0" default="0" maxOccurs="1">
144 changes: 87 additions & 57 deletions notes.md
@@ -314,63 +314,93 @@ Some general notes for a program:

## TrackTimestampScale

The `TrackTimestampScale Element` is used to align tracks that would otherwise be played at
different speeds. For example, consider a film that was originally shot at 24fps.
When playing it back through a PAL broadcasting system, it is standard to
speed the film up to 25fps to match the 25fps display rate of the PAL broadcasting standard.
However, when broadcasting the video through NTSC, it is typical to leave the film at its
original speed. If you wanted to make a single file with one video stream,
an audio stream taken from the PAL broadcast, and an audio stream taken from the NTSC
broadcast, you would have the problem that the PAL audio stream would be 1/24th faster than
the NTSC audio stream, so the two would quickly drift out of sync. It is possible to stretch out the PAL
audio track and re-encode it at a slower speed; however, with lossy audio codecs,
this often results in a loss of audio quality and/or larger file sizes.

This is the type of problem that `TrackTimestampScale` was designed to fix. Using it,
the video can be played back at a speed that will sync with either the NTSC or the PAL
audio stream, depending on which is being used for playback.
To continue the above example:

* Track 1: Video
* Track 2: NTSC Audio
* Track 3: PAL Audio

Because the NTSC track is at the original speed, it will use the default value of 1.0 for
its `TrackTimestampScale`. The video will also be aligned to the NTSC track with the default value of 1.0.

The `TrackTimestampScale` value to use for the PAL track is calculated by
determining how much faster the PAL track is than the NTSC track. In this case,
because we know the video for the NTSC audio is being played back at 24fps and the video
for the PAL audio is being played back at 25fps, the calculation is:

    25/24 ≈ 1.04166666666666666667
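As a minimal sketch of that calculation, the ratio can be kept as an exact fraction and only converted to a float at the end (the element is declared with `type="float"`, so a writer would ultimately store the float). The helper name `track_timestamp_scale` is hypothetical and only illustrates the arithmetic above.

```python
from fractions import Fraction


def track_timestamp_scale(reference_fps: int, track_fps: int) -> Fraction:
    """Hypothetical helper: speed ratio of a track relative to the reference track.

    A track whose video runs at track_fps while the reference runs at
    reference_fps is faster by track_fps / reference_fps.
    """
    return Fraction(track_fps, reference_fps)


# NTSC audio (24fps film) is the reference; PAL audio was sped up to 25fps.
pal_scale = track_timestamp_scale(reference_fps=24, track_fps=25)
print(pal_scale)         # 25/24
print(float(pal_scale))  # 1.0416666666666667
```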

When writing a file that uses a non-default `TrackTimestampScale`, the `Block`
timestamps have the same values they would have if the track were stored with the default value for
`TrackTimestampScale`. However, the data is interleaved a little differently.
Data **SHOULD** be interleaved by its Raw Timestamp, see (#raw-timestamp), in the order handed back
from the encoder. The `Raw Timestamp` of a `Block` from a track using `TrackTimestampScale`
is calculated using:

`(Block's Timestamp + Cluster's Timestamp) * TimestampScale * TrackTimestampScale`

So, a Block from the PAL track above that had a Scaled Timestamp, see (#timestamp-types), of 100
seconds would have a `Raw Timestamp` of about 104.16666667 seconds (100 * 25/24), and so would be stored in that
part of the file.
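The Raw Timestamp formula and the interleaving rule can be sketched as follows. This is an illustration under stated assumptions, not a muxer implementation: Blocks are modelled as hypothetical `(track, cluster_ts, block_ts)` tuples in `TimestampScale` units, the per-track scales are taken from the three-track example, and `heapq.merge` requires each track's list to already be in non-decreasing Raw Timestamp order (for example, no decoder reordering).

```python
import heapq
from fractions import Fraction

TIMESTAMP_SCALE = 1_000_000  # ns per tick (the Matroska default)

# Hypothetical per-track TrackTimestampScale values for the example above:
# Track 1 = Video, Track 2 = NTSC Audio, Track 3 = PAL Audio.
TRACK_SCALES = {1: Fraction(1), 2: Fraction(1), 3: Fraction(25, 24)}


def raw_timestamp_ns(track, cluster_ts, block_ts):
    """(Block's Timestamp + Cluster's Timestamp) * TimestampScale * TrackTimestampScale."""
    return (block_ts + cluster_ts) * TIMESTAMP_SCALE * TRACK_SCALES[track]


def interleave(*per_track_blocks):
    """Merge per-track lists of (track, cluster_ts, block_ts) by Raw Timestamp.

    heapq.merge assumes each input list is already sorted by that key.
    """
    merged = heapq.merge(*per_track_blocks, key=lambda blk: raw_timestamp_ns(*blk))
    return list(merged)


# A PAL Block at 100 s (100,000 ticks of 1 ms) is placed at about 104.1667 s.
print(round(float(raw_timestamp_ns(3, 100_000, 0)) / 1e9, 4))  # 104.1667
```

With this ordering, the PAL Block scaled to 100 s ends up after a video Block at 104 s and before one at 105 s, which is what "stored in that part of the file" means in practice.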

When playing back a track that uses `TrackTimestampScale`, if the track is being played by itself,
there is no need to scale it. In the above example, when playing the Video with the NTSC Audio,
neither is scaled. However, when playing back the Video with the PAL Audio, the timestamps
from the PAL Audio track are scaled using the `TrackTimestampScale`, resulting in the video
playing back in sync with the audio.

It would be possible for a `Matroska Player` to also adjust the audio's sample rate at the
same time as adjusting the timestamps if you wanted to play the two audio streams synchronously.
It would also be possible to adjust the video to match the audio's speed. However,
for playback, the timestamps of the selected track(s) **SHOULD** be adjusted if they need to be scaled.

While the above example deals specifically with audio tracks, this element can be used
to align video, audio, subtitles, or any other type of track contained in a Matroska file.
The `TrackTimestampScale Element` was originally designed to allow adjusting the Track Tick
duration in nanoseconds without having to remux the whole file.
This was an odd and unused feature because the further you get into the file,
the further apart the audio and video tracks would drift.
In the end, the matching audio and video Blocks would be in different Clusters.
This is why the `TrackTimestampScale Element` is rarely used and often not handled
at all in `Matroska Readers`.

However, it **MAY** be used to store more accurate timestamps in Blocks.

For example, consider an audio track at 44100 Hz. Each sample lasts:

    1,000,000,000 / 44100 = 22675.73696145125 ns

This is not an integer, so it cannot be stored exactly in a Block.
But it is possible to get a better approximation than when `TrackTimestampScale` is "1.0".

For example, with a `TimestampScale` of "1", we could set `TrackTimestampScale` to "22675.73696145125".
The timestamp in a Block is then transformed into nanoseconds using this formula:

    signed timestamp * TimestampScale * TrackTimestampScale
    signed timestamp * 22675.73696145125

The range of a Block timestamp is from "-32768" to "+32767" Track Ticks,
which is "-743038548.7528346" to "743015873.0158731" nanoseconds, or roughly "-0.743" to "0.743" seconds.
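The range figures can be reproduced with a short sketch, assuming one sample per Track Tick and the example's `TimestampScale` of 1; the constant and function names are illustrative only.

```python
SAMPLE_RATE = 44100
TIMESTAMP_SCALE = 1                                   # ns per Segment tick, as in the example
TRACK_TIMESTAMP_SCALE = 1_000_000_000 / SAMPLE_RATE   # one sample per Track Tick

# A Block/SimpleBlock stores a signed 16-bit timestamp relative to its Cluster.
BLOCK_MIN, BLOCK_MAX = -32768, 32767


def block_to_ns(signed_timestamp):
    """signed timestamp * TimestampScale * TrackTimestampScale."""
    return signed_timestamp * TIMESTAMP_SCALE * TRACK_TIMESTAMP_SCALE


# Roughly -0.743 s to +0.743 s around the Cluster timestamp.
print(block_to_ns(BLOCK_MIN) / 1e9, block_to_ns(BLOCK_MAX) / 1e9)
```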
This is too small for most use cases, as too many Clusters would be necessary.
Fortunately, audio samples are usually grouped together.
For example, in [@?Vorbis] they are grouped in blocks of 64 to 8192 samples.
This gives a maximum range in a Cluster of at least:

    (0.743 + 0.743) * 64 = 95.1 seconds

The `TrackTimestampScale` would then be 22675.73696145125 * 64 = 1451247.16553288.

Even with a high sampling frequency of 352800 Hz and a codec that packs 40 samples per frame (Dolby TrueHD),
we still get a large maximum range in each Cluster:

    65535 * (1,000,000,000 / 352,800) * 40 ≈ 7.43 s
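Both Cluster-range figures follow from the same arithmetic: one Track Tick covers a group of samples, and a Cluster can address the full signed 16-bit span of Block ticks. This sketch uses hypothetical helper names and assumes a span of 65535 ticks, as in the TrueHD formula.

```python
def track_tick_ns(sample_rate, samples_per_tick):
    """Duration of one Track Tick when each tick covers samples_per_tick samples."""
    return 1_000_000_000 / sample_rate * samples_per_tick


def cluster_range_seconds(sample_rate, samples_per_tick, span_ticks=65535):
    """Time spanned by span_ticks Track Ticks (the full signed 16-bit Block range)."""
    return span_ticks * track_tick_ns(sample_rate, samples_per_tick) / 1e9


# Vorbis-style grouping: 64 samples per tick at 44100 Hz.
print(track_tick_ns(44100, 64))                     # about 1451247.1655 ns
print(round(cluster_range_seconds(44100, 64), 1))   # 95.1
# Dolby TrueHD-style grouping: 40 samples per frame at 352800 Hz.
print(round(cluster_range_seconds(352800, 40), 2))  # 7.43
```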

The `TimestampScale` can still be the default value of "1,000,000",
as long as the `TrackTimestampScale` matches the duration of one or more samples:

    TimestampScale * TrackTimestampScale = 1,000,000,000 / 44100 ns
    TrackTimestampScale = 1,000,000,000 / 44100 / TimestampScale
    TrackTimestampScale = 1,000,000,000 / 44100 / 1,000,000
    TrackTimestampScale = 1,000 / 44100
    TrackTimestampScale = 0.02267573696145125
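The derivation above amounts to solving for `TrackTimestampScale` given a sample rate and a `TimestampScale`. A minimal sketch, with a hypothetical helper name and an assumed `samples_per_tick` parameter for the grouped-samples case, keeps the value exact until it is written as a float:

```python
from fractions import Fraction


def track_timestamp_scale_for(sample_rate, timestamp_scale=1_000_000, samples_per_tick=1):
    """Solve TimestampScale * TrackTimestampScale = samples_per_tick * 1e9 / sample_rate ns.

    Hypothetical helper; returns an exact Fraction (dimensionless).
    """
    return Fraction(1_000_000_000 * samples_per_tick, sample_rate * timestamp_scale)


scale = track_timestamp_scale_for(44100)
print(scale)         # 10/441, i.e. 1,000 / 44100
print(float(scale))  # the float a writer would store, about 0.0226757369614512
```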

Storing the timestamp of audio sample number 152340 is slightly different than in (#timestampscale-rounding).

The real timestamp of that sample in nanoseconds is:

    152340 * 1,000,000,000 / 44100 = 3454421768.707483 ns

If we could store 152340 directly, the `Matroska Reader` would apply the formula:

    signed timestamp * TimestampScale * TrackTimestampScale
    152340 * 1,000,000 * 0.02267573696145125 ns
    3,454,421,768.707483 ns

This is exactly the proper timestamp for that sample.
However, rounding is involved, because 152340 cannot be stored in a Block/SimpleBlock, whose timestamp has a range of only 65535 Track Ticks.
The `Cluster\Timestamp` needs to be involved.

With a `TimestampScale` of "1,000,000", we could set the `Cluster\Timestamp` to "3454", which is 3,454,000,000 ns.
The Block/SimpleBlock then has to store the equivalent of "421,768.707483" ns. With a Track Tick of "22,675.73696145125" ns,
that represents "18.6" ticks, which is stored as either "18" or "19" in the Block/SimpleBlock.

If "19" is stored, the `Matroska Reader` will then read the timestamp as:

    3454 * 1,000,000 + 19 * 1,000,000 * 0.02267573696145125
    3,454,430,839.0023 ns

That's a difference of about 9070.29 ns, which is less than half the duration of a sample (22675.73696145125 ns).
So rounding to the nearest integer still ends up on the proper sample.
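A sketch of this worked example with exact fractions shows why the nearest integer matters: storing 19 is off by 0.4 of a sample, while storing 18 would be off by 0.6 of a sample and so would point at the previous sample. The constants mirror the text; the function names are hypothetical.

```python
from fractions import Fraction

TIMESTAMP_SCALE = 1_000_000  # ns per tick, the default
SAMPLE_RATE = 44100
# One Track Tick = one sample: TimestampScale * TrackTimestampScale = 1e9 / 44100 ns.
TRACK_TIMESTAMP_SCALE = Fraction(1_000_000_000, SAMPLE_RATE * TIMESTAMP_SCALE)
SAMPLE_NS = TIMESTAMP_SCALE * TRACK_TIMESTAMP_SCALE


def block_ticks_exact(sample_index, cluster_ts):
    """Exact (fractional) Block value needed to address sample_index in this Cluster."""
    real_ns = sample_index * SAMPLE_NS
    return (real_ns - cluster_ts * TIMESTAMP_SCALE) / SAMPLE_NS


def read_back_ns(cluster_ts, block_ticks):
    """What a reader computes: the Cluster part plus the Block part, both in ns."""
    return cluster_ts * TIMESTAMP_SCALE + block_ticks * SAMPLE_NS


exact = block_ticks_exact(152340, 3454)
print(exact)  # 93/5, i.e. 18.6 Track Ticks
for stored in (18, 19):
    # Signed error of the read-back timestamp, in samples.
    error_in_samples = (read_back_ns(3454, stored) - 152340 * SAMPLE_NS) / SAMPLE_NS
    print(stored, float(error_in_samples))  # 18 -0.6, then 19 0.4
```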

The worst-case scenario for the rounding margin is when the value to store in the Block/SimpleBlock is exactly between two integers,
for example "18.5". The worst-case rounding error is "0.5" Track Ticks:

    0.5 * 1,000,000 * 0.02267573696145125
    0.5 * 1,000,000 * (1,000,000,000 / 44100 / 1,000,000)
    0.5 * (1,000,000,000 / 44100)
    0.5 sample duration in nanoseconds

To avoid this indeterminate state, the value stored in a Block/SimpleBlock should be the nearest integer,
and a value exactly halfway between two integers (such as 18.5) **MUST** be rounded to the lower of the two.
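The rounding rule is "round half down". One way to express it, sketched here with a hypothetical function name, is `ceil(x - 1/2)` on an exact fraction. Python's built-in `round()` is deliberately avoided because it rounds ties to the even integer (18.5 to 18 but 19.5 to 20), which does not match the rule. For negative values this sketch assumes "lower" means toward negative infinity.

```python
import math
from fractions import Fraction


def to_block_ticks(exact_ticks: Fraction) -> int:
    """Nearest integer, with exact .5 ties going to the lower integer.

    ceil(x - 1/2): 18.6 -> 19, 18.5 -> 18, 18.4 -> 18, -18.5 -> -19.
    Using Fraction keeps ties exact instead of relying on float precision.
    """
    return math.ceil(exact_ticks - Fraction(1, 2))


print(to_block_ticks(Fraction(93, 5)))  # 18.6 -> 19
print(to_block_ticks(Fraction(37, 2)))  # 18.5 -> 18
```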


# Encryption

8 changes: 8 additions & 0 deletions rfc_backmatter_matroska.md
@@ -62,6 +62,14 @@
</front>
</reference>

<reference anchor="Vorbis" target="https://www.xiph.org/vorbis/doc/Vorbis_I_spec.html">
<front>
<title>Vorbis I specification</title>
<author fullname='Xiph.Org Foundation'><organization>Xiph.Org Foundation</organization></author>
<date day="4" month="July" year="2020" />
</front>
</reference>

<reference anchor="MCF" target="http://mukoli.free.fr/mcf/mcf.html">
<front>
<title>Media Container Format</title>