rewrite the TimestampScale Rounding audio-only example

ietf-wg-cellar · Jul 11, 2021 · a599e43
1 parent 48c2d81
Showing 2 changed files with 87 additions and 60 deletions.
diff --git a/notes.md b/notes.md
@@ -268,7 +268,7 @@ The `Block Element` and `SimpleBlock Element` store their timestamps as signed i
 to the `Cluster\Timestamp` value of the `Cluster` they are stored in.
 To get the timestamp of a `Block` or `SimpleBlock` in nanoseconds you have to use the following formula:
 
-    Cluster\Timestamp + (signed timestamp * TimestampScale * TrackTimestampScale)
+    (Cluster\Timestamp * TimestampScale) + (signed timestamp * TimestampScale * TrackTimestampScale)
 
 The `Block Element` and `SimpleBlock Element` store their timestamps as 16bit signed integers,
 allowing a range from "-32768" to "+32767" Track Ticks.
@@ -277,70 +277,89 @@ Although these values can be negative, when added to the `Cluster\Timestamp`, th
 
 ## TimestampScale Rounding
 
-Because the default value of `TimestampScale` is 1000000, which makes each integer in the
-`Cluster` and `Block` `Timestamp Elements` equal 1ms, this is the most commonly used.
-When dealing with audio, this causes inaccuracy when seeking. When the audio is combined with video,
-this is not an issue. For most cases, the the synch of audio to video does not need to be more than
-1ms accurate. This becomes obvious when one considers that sound will take 2-3ms to travel a single meter,
-so distance from your speakers will have a greater effect on audio/visual synch than this.
+The default Track Tick duration is one millisecond.
+When dealing with audio, this causes inaccuracy in timestamps.
 
-However, when dealing with audio-only files, seeking accuracy can become critical.
 For instance, when storing a whole CD in a single track, a user will want to be able to seek
 to the exact sample that a song begins at. If seeking a few sample ahead or behind, a crack
 or pop may result as a few odd samples are rendered. Also, when performing precise editing,
-it may be very useful to have the audio accuracy down to a single sample.
+it may be useful to have the audio accuracy down to a single sample.
 
 When storing timestamps for an audio stream, the `TimestampScale Element` **SHOULD** have an accuracy
-of at least that of the audio sample rate, otherwise there are rounding errors that prevent users
-from knowing the precise location of a sample. Here's how a program has to round each timestamp
+of at least that of the audio sample rate, to minimize rounding errors that prevent users
+from knowing the precise location of a sample. Here's how a program **SHOULD** round each timestamp
 in order to be able to recreate the sample number accurately.
 
-Let's assume that the application has an audio track with a sample rate of 44100. As written
-above the `TimestampScale` **MUST** have at least the accuracy of the sample rate itself: 1000000000 / 44100 = 22675.7369614512.
-This value **MUST** always be truncated. Otherwise the accuracy will not suffice.
-So in this example the application will use 22675 for the `TimestampScale`.
-The application could even use some lower value like 22674, which would allow it to be a
-little bit imprecise about the original timestamps. But more about that in a minute.
-
-Next the application wants to write sample number 52340 and calculates the timestamp. This is easy.
-In order to calculate the `Raw Timestamp` in ns all it has to do is calculate
-`Raw Timestamp = round(1000000000 * sample_number / sample_rate)`. Rounding at this stage
-is very important! The application might skip it if it choses a slightly smaller value for
-the `TimestampScale` factor instead of the truncated one like shown above.
-Otherwise it has to round or the results won't be reversible.
-For our example we get `Raw Timestamp = round(1000000000 * 52340 / 44100) = round(1186848072.56236) = 1186848073`.
-
-The next step is to calculate the `Absolute Timestamp` - that is the timestamp that
-will be stored in the Matroska file. Here the application has to divide the `Raw Timestamp`
-from the previous paragraph by the `TimestampScale` factor and round the result:
-`Absolute Timestamp = round(Raw Timestamp / TimestampScale_factor)`, which will result in the
-following for our example: `Absolute Timestamp = round(1186848073 / 22675) = round(52341.7011245866) = 52342`.
-This number is the one the application has to write to the file.
-
-Now our file is complete, and we want to play it back with another application.
-Its task is to find out which sample the first application wrote into the file.
-So it starts reading the Matroska file and finds the `TimestampScale` factor 22675 and
-the audio sample rate 44100. Later it finds a data block with the `Absolute Timestamp` of 52342.
-But how does it get the sample number from these numbers?
-
-First it has to calculate the `Raw Timestamp` of the block it has just read. Here's no
-rounding involved, just an integer multiplication: `Raw Timestamp = Absolute Timestamp * TimestampScale_factor`.
-In our example: `Raw Timestamp = 52342 * 22675 = 1186854850`.
-
-The conversion from the `Raw Timestamp` to the sample number again requires rounding:
-`sample_number = round(Raw Timestamp * sample_rate / 1000000000)`.
-In our example: `sample_number = round(1186854850 * 44100 / 1000000000) = round(52340.298885) = 52340`.
-This is exactly the sample number that the previous program started with.
-
-Some general notes for a program:
-
-1. Always calculate the timestamps / sample numbers with floating point numbers of at least
-   64bit precision (called 'double' in most modern programming languages).
-   If you're calculating with integers, then make sure they're 64bit long, too.
-2. Always round if you divide. Always! If you don't you'll end up with situations in which
-   you have a timestamp in the Matroska file that does not correspond to the sample number
-   that it started with. Using a slightly lower timestamp scale factor can help here in
-   that it removes the need for proper rounding in the conversion from sample number to `Raw Timestamp`.
+Let's assume that the application has an audio track with a sample rate of 44100 Hz. As written
+above the `TimestampScale` **MUST** have at least the accuracy of the sample rate itself: 1,000,000,000 / 44100 = 22675.7369614512.
+This value **MUST** be rounded since `TimestampScale` is an integer.
+So in this example the application will use either "22675" or "22676" for the `TimestampScale`.
+
+Over time the rounding of the sampling period in nanoseconds will accumulate,
+both in the `Cluster\Timestamp` and the Block/SimpleBlock timestamps, which are both in Segment Ticks,
+assuming the `TrackTimestampScale` is kept at its default value of "1.0".
+
+To avoid this drift, the muxer **SHOULD** take into account how the `Matroska Reader` is going to compute
+the timestamp of a Block/SimpleBlock.
+
+For example, suppose we want to store the timestamp of audio sample number 152340.
+The real timestamp of that sample in nanoseconds is
+
+    152340 * 1,000,000,000 / 44100 = 3454421768.707483 ns
+
+If we stored directly 152340 in the Block/SimpleBlock, the `Matroska Reader` would read it as
+
+    152340 * 22675 * 1.0 = 3454309500.0 ns
+or
+
+    152340 * 22676 * 1.0 = 3454461840.0 ns
+
+Here the `Cluster\Timestamp` is merged into the Block/SimpleBlock timestamp since they are both in
+the same tick unit.
+
+That's a difference of "-112269" and "40071" nanoseconds respectively compared to the real timestamp of the sample.
+Since the sampling period at 44100 Hz is 22675.7369614512 nanoseconds, we are a few samples off.
+
+We want the `Matroska Reader` to be as close as possible to 3454421768.707483 ns for the Block/SimpleBlock timestamp.
+
+So we need to express this timestamp in Track Ticks, rather than using the real timestamp formula:
+
+    3454421768.707483 / 22675 = 152344.9512109144 ticks
+or
+
+    3454421768.707483 / 22676 = 152338.2328764986 ticks
+
+So the muxer should store "152345" or "152338" Track Ticks respectively, to express the timestamp of audio sample 152340.
+
+The `Matroska Reader` will read these values as:
+
+    152345 * 22675 * 1.0 = 3454422875.0 ns
+or
+
+    152338 * 22676 * 1.0 = 3454416488.0 ns
+
+Knowing the track sampling frequency, the `Matroska Reader` can tell the sample number from the timestamp:
+
+    3454422875 * 44100 / 1,000,000,000 = 152340.0487875 samples
+or
+
+    3454416488 * 44100 / 1,000,000,000 = 152339.7671208 samples
+
+When rounding these values to the nearest integer, we get the accurate sample number "152340".
+This works whether the `TimestampScale` was rounded to the upper or the lower integer value.
+
+So a `Matroska Muxer` which is given the timestamp/PTS to store should determine the number of Segment/Track Ticks
+with the following formula:
+
+    timestamp tick = roundB(timestamp in ns / roundA(1,000,000,000 / sampling frequency))
+
+Where `roundA()` is a function rounding the floating-point result of the division to an integer (either up or down)
+and `roundB()` is a function rounding the floating-point result of the division to the nearest integer.
+
+For audio tracks, the sampling frequency is the one stored in `Audio\SamplingFrequency`.
+For video tracks, the sampling frequency is the one that **MAY** be stored as a period in `TrackEntry\DefaultDuration`.
+The `Matroska Muxer` **MAY** also know the accurate value from the source material.
+
 
 ## TrackTimestampScale
 
@@ -378,13 +397,13 @@ for the PAL audio is being played back at 25fps, the calculation would be:
 When writing a file that uses a non-default `TrackTimestampScale`, the values of the `Block`'s
 timestamp are whatever they would be when normally storing the track with a default value for
 the `TrackTimestampScale`. However, the data is interleaved a little differently.
-Data **SHOULD** be interleaved by its Raw Timestamp, see (#raw-timestamp), in the order handed back
+Data **SHOULD** be interleaved by its Raw Timestamp, in the order handed back
 from the encoder. The `Raw Timestamp` of a `Block` from a track using `TrackTimestampScale`
 is calculated using:
 
 `(Block's Timestamp + Cluster's Timestamp) * TimestampScale * TrackTimestampScale `
 
-So, a Block from the PAL track above that had a Scaled Timestamp, see (#timestampscale), of 100
+So, a Block from the PAL track above that had a Scaled Timestamp of 100
 seconds would have a `Raw Timestamp` of 104.66666667 seconds, and so would be stored in that
 part of the file.
 

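As a reading aid for the `TimestampScale Rounding` hunk in notes.md above, here is a minimal Python sketch of the `roundA()`/`roundB()` muxer formula and of the reader-side conversion back to a sample number. The function names are hypothetical and are not part of this commit or of any Matroska library. Python's built-in `round()` stands in for `roundB()`, `TrackTimestampScale` is assumed to be "1.0", and double-precision floats are assumed to be precise enough for timestamps of this magnitude.

```python
import math

NS_PER_SECOND = 1_000_000_000


def timestamp_scale(sampling_frequency, round_up=False):
    """roundA(): sampling period in ns, rounded down or up to an integer."""
    period_ns = NS_PER_SECOND / sampling_frequency
    return math.ceil(period_ns) if round_up else math.floor(period_ns)


def sample_to_ticks(sample_number, sampling_frequency, scale):
    """Muxer side: real timestamp in ns, then roundB() to the nearest tick."""
    timestamp_ns = sample_number * NS_PER_SECOND / sampling_frequency
    return round(timestamp_ns / scale)


def ticks_to_sample(ticks, sampling_frequency, scale, track_timestamp_scale=1.0):
    """Reader side: ticks to ns, then ns to the nearest sample number."""
    timestamp_ns = ticks * scale * track_timestamp_scale
    return round(timestamp_ns * sampling_frequency / NS_PER_SECOND)


# Reproduce the worked example from the hunk: sample 152340 at 44100 Hz.
for scale in (timestamp_scale(44100), timestamp_scale(44100, round_up=True)):
    ticks = sample_to_ticks(152340, 44100, scale)
    print(scale, ticks, ticks_to_sample(ticks, 44100, scale))
```

With the numbers from the example, this prints `22675 152345 152340` and `22676 152338 152340`, matching the values in the added text. The sketch only checks this one sample; it does not show that the round trip is exact for every sample number.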
diff --git a/rfc_backmatter_matroska.md b/rfc_backmatter_matroska.md
@@ -57,11 +57,19 @@
     <author fullname='Simon Pieters'><organization>Opera Software AS</organization></author>
     <author fullname='Silvia Pfeiffer' role='editor'><organization>NICTA</organization></author>
     <author fullname='Philip Jägenstedt'><organization>Opera Software ASA</organization></author>
-    <author fullname='Ian Hickson'><organization>Google</organization></author>
+    <author fullname='Ian Hickson'><organization>Xiph.Org Foundation</organization></author>
     <date day="4" month="April" year="2019" />
   </front>
 </reference>
 
+<reference anchor="Vorbis" target="https://www.xiph.org/vorbis/doc/Vorbis_I_spec.html">
+  <front>
+    <title>Vorbis I specification</title>
+    <author fullname='Xiph.Org Foundation'><organization>Xiph.Org Foundation</organization></author>
+    <date day="4" month="July" year="2020" />
+  </front>
+</reference>
+
 <reference anchor="MCF" target="http://mukoli.free.fr/mcf/mcf.html">
   <front>
     <title>Media Container Format</title>