ietf-wg-cellar · robUx4 · Nov 1, 2020 · Jul 11, 2021
diff --git a/ebml_matroska.xml b/ebml_matroska.xml
@@ -312,9 +312,9 @@ If set to 0, the reference pseudo-cache system is not used.</documentation>
 see (#defaultdecodedfieldduration) for more information</documentation>
     <extension type="libmatroska" cppname="TrackDefaultDecodedFieldDuration"/>
   </element>
-  <element name="TrackTimestampScale" path="\Segment\Tracks\TrackEntry\TrackTimestampScale" id="0x23314F" type="float" maxver="3" range="&gt; 0x0p+0" default="0x1p+0" minOccurs="1" maxOccurs="1">
-    <documentation lang="en" purpose="definition">DEPRECATED, DO NOT USE. The scale to apply on this track to work at normal speed in relation with other tracks
-(mostly used to adjust video speed when the audio length differs).</documentation>
+  <element name="TrackTimestampScale" path="\Segment\Tracks\TrackEntry\TrackTimestampScale" id="0x23314F" type="float" range="&gt; 0x0p+0" default="0x1p+0" minOccurs="1" maxOccurs="1">
+    <documentation lang="en" purpose="definition">The scale to apply on this track to work at normal speed in relation with other tracks.
+Mostly used to adjust video speed when the audio length differs or to have more accurate timestamps on each track.</documentation>
     <extension type="libmatroska" cppname="TrackTimecodeScale"/>
   </element>
   <element name="TrackOffset" path="\Segment\Tracks\TrackEntry\TrackOffset" id="0x537F" type="integer" minver="0" maxver="0" default="0" maxOccurs="1">

diff --git a/notes.md b/notes.md
@@ -314,63 +314,93 @@ Some general notes for a program:
 
 ## TrackTimestampScale
 
-The `TrackTimestampScale Element` is used align tracks that would otherwise be played at
-different speeds. An example of this would be if you have a film that was originally recorded
-at 24fps video. When playing this back through a PAL broadcasting system, it is standard to
-speed up the film to 25fps to match the 25fps display speed of the PAL broadcasting standard.
-However, when broadcasting the video through NTSC, it is typical to leave the film at its
-original speed. If you wanted to make a single file where there was one video stream,
-and an audio stream used from the PAL broadcast, as well as an audio stream used from the NTSC
-broadcast, you would have the problem that the PAL audio stream would be 1/24th faster than
-the NTSC audio stream, quickly leading to problems. It is possible to stretch out the PAL
-audio track and re-encode it at a slower speed, however when dealing with lossy audio codecs,
-this often results in a loss of audio quality and/or larger file sizes.
-
-This is the type of problem that `TrackTimestampScale` was designed to fix. Using it,
-the video can be played back at a speed that will synch with either the NTSC or the PAL
-audio stream, depending on which is being used for playback.
-To continue the above example:
-
-    Track 1: Video
-    Track 2: NTSC Audio
-    Track 3: PAL Audio
-
-Because the NTSC track is at the original speed, it will used as the default value of 1.0 for
-its `TrackTimestampScale`. The video will also be aligned to the NTSC track with the default value of 1.0.
-
-The `TrackTimestampScale` value to use for the PAL track would be calculated by
-determining how much faster the PAL track is than the NTSC track. In this case,
-because we know the video for the NTSC audio is being played back at 24fps and the video
-for the PAL audio is being played back at 25fps, the calculation would be:
-
-25/24 is almost 1.04166666666666666667
-
-When writing a file that uses a non-default `TrackTimestampScale`, the values of the `Block`'s
-timestamp are whatever they would be when normally storing the track with a default value for
-the `TrackTimestampScale`. However, the data is interleaved a little differently.
-Data **SHOULD** be interleaved by its Raw Timestamp, see (#raw-timestamp), in the order handed back
-from the encoder. The `Raw Timestamp` of a `Block` from a track using `TrackTimestampScale`
-is calculated using:
-
-`(Block's Timestamp + Cluster's Timestamp) * TimestampScale * TrackTimestampScale `
-
-So, a Block from the PAL track above that had a Scaled Timestamp, see (#timestamp-types), of 100
-seconds would have a `Raw Timestamp` of 104.66666667 seconds, and so would be stored in that
-part of the file.
-
-When playing back a track using the `TrackTimestampScale`, if the track is being played by itself,
-there is no need to scale it. From the above example, when playing the Video with the NTSC Audio,
-neither are scaled. However, when playing back the Video with the PAL Audio, the timestamps
-from the PAL Audio track are scaled using the `TrackTimestampScale`, resulting in the video
-playing back in synch with the audio.
-
-It would be possible for a `Matroska Player` to also adjust the audio's samplerate at the
-same time as adjusting the timestamps if you wanted to play the two audio streams synchronously.
-It would also be possible to adjust the video to match the audio's speed. However,
-for playback, the selected track(s) timestamps **SHOULD** be adjusted if they need to be scaled.
-
-While the above example deals specifically with audio tracks, this element can be used
-to align video, audio, subtitles, or any other type of track contained in a Matroska file.
+The `TrackTimestampScale Element` was originally designed to allow adjusting the Track tick
+amount in nanoseconds without having to remux the whole file.
+This was an odd and unused feature because the further you get in the file,
+the further the audio and video tracks would drift apart.
+In the end the matching audio and video Blocks would be in different Clusters.
+This is why the `TrackTimestampScale Element` is rarely used and often not handled
+at all in `Matroska Readers`.
+
+It **MAY** however be used to have more accurate timestamps in the Blocks.
+
+For example, consider an audio track at 44100 Hz. Each sample lasts
+
+    1,000,000,000 / 44100 = 22675.73696145125 ns
+
+This is not an integer number that can be stored in Blocks.
+But it is possible to get a better approximation than when `TrackTimestampScale` is "1.0".
+
+For example with `TimestampScale` of "1", we could set `TrackTimestampScale` to "22675.73696145125".
+The timestamp in a Block is then transformed into nanoseconds using this formula:
+
+    signed timestamp * TimestampScale * TrackTimestampScale
+    signed timestamp * 22675.73696145125
+
+The range of a Block is from "-32768" to "+32767" Track Ticks,
+which is "-743038548.7528346" to "743015873.0158731" nanoseconds or about "-0.743" to "0.743" seconds.
+This is not enough for most use cases, as too many Clusters would be necessary.
+Fortunately, audio samples are usually grouped together.
+For example in [@?Vorbis] they are grouped by 64 to 8192 samples.
+With one Track Tick per group of 64 samples, the maximum range in a Cluster is at least
+
+    (0.743 + 0.743) * 64 = 95.1 seconds
+
+The `TrackTimestampScale` would be 22675.73696145125 * 64 = 1451247.16553288.
+
+Even with a high sampling frequency of 352800 Hz with a codec that packs 40 samples per frame (Dolby TrueHD),
+we still get a large maximum range in each Cluster:
+
+    65535 * (1,000,000,000 / 352,800) * 40 ns = 7.43 s
+
+The `TimestampScale` can still be the default value of "1,000,000",
+as long as the `TrackTimestampScale` matches the duration of one or more samples:
+
+    TimestampScale * TrackTimestampScale = 1,000,000,000 / 44100 ns
+    TrackTimestampScale = 1,000,000,000 / 44100 / TimestampScale
+    TrackTimestampScale = 1,000,000,000 / 44100 / 1,000,000
+    TrackTimestampScale = 1,000 / 44100
+    TrackTimestampScale = 0.02267573696145125
+
+Storing the timestamp of audio sample number 152340 is slightly different than in (#timestampscale-rounding).
+
+The real timestamp of that sample in nanoseconds is
+
+    152340 * 1,000,000,000 / 44100 = 3454421768.707483 ns
+
+We can store 152340. The `Matroska Reader` will then apply the formula:
+
+    signed timestamp * TimestampScale * TrackTimestampScale
+    152340 * 1,000,000 * 0.02267573696145125 ns
+    3,454,421,768.707483 ns
+
+Which is exactly the proper timestamp for that sample.
+There is however some rounding involved, as we can't store 152340 in a Block/SimpleBlock, which has a range of 65535 Track Ticks.
+The `Cluster\Timestamp` needs to be involved.
+
+With a `TimestampScale` of "1,000,000" we could set the `Cluster\Timestamp` to "3454", which is 3,454,000,000 ns.
+The Block/SimpleBlock has to store the equivalent of "421,768.707483" ns. With a Track Tick of "22,675.73696145125" ns,
+that represents "18.6" ticks, which is stored as either "18" or "19" in the Block/SimpleBlock.
+
+The `Matroska Reader` will then read the timestamp as
+
+    3454 * 1,000,000 + 19 * 1,000,000 * 0.02267573696145125
+    3,454,430,839.0023 ns
+
+That's a difference of 9070.294785 ns, which is less than half the duration of a sample (22675.73696145125 / 2 = 11337.868480725625 ns).
+So any rounding will still end up on the proper sample.
+
+The worst case scenario for rounding margin is if the value that should be stored in the Block/SimpleBlock is exactly between two integers,
+for example "18.5". The worst case rounding error is "0.5" Track Ticks:
+
+    0.5 * 1,000,000 * 0.02267573696145125
+    0.5 * 1,000,000 * (1,000,000,000 / 44100 / 1,000,000)
+    0.5 * (1,000,000,000 / 44100)
+    0.5 sample duration in nanoseconds
+
+To avoid this indeterminate state, the value stored in a Block/SimpleBlock should be the nearest integer,
+and a value exactly halfway between two integers (fractional part of 0.5) **MUST** be rounded to the lower integer.
+
 
 # Encryption
 

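The arithmetic in the new `notes.md` text can be checked with a minimal Python sketch. The helper names (`track_timestamp_scale`, `round_half_down`, `block_value`, `read_timestamp_ns`) are hypothetical and not part of any Matroska library. The sketch assumes exact rational arithmetic via `fractions.Fraction`, whereas a real file stores `TrackTimestampScale` as an EBML float, so an actual muxer would see small floating-point deviations on top of this.

```python
import math
from fractions import Fraction

NS_PER_SECOND = 1_000_000_000


def track_timestamp_scale(sample_rate, timestamp_scale=1_000_000, samples_per_tick=1):
    """Scale for which one Track Tick lasts `samples_per_tick` audio samples.

    Hypothetical helper: kept as an exact Fraction for illustration only.
    """
    return Fraction(NS_PER_SECOND * samples_per_tick, sample_rate * timestamp_scale)


def round_half_down(value):
    """Nearest integer; a value exactly halfway goes to the lower integer."""
    return math.ceil(value - Fraction(1, 2))


def block_value(sample_index, sample_rate, cluster_timestamp, timestamp_scale=1_000_000):
    """Signed value to store in a Block/SimpleBlock for the given sample."""
    real_ns = Fraction(sample_index * NS_PER_SECOND, sample_rate)
    tick_ns = timestamp_scale * track_timestamp_scale(sample_rate, timestamp_scale)
    ticks = (real_ns - cluster_timestamp * timestamp_scale) / tick_ns
    value = round_half_down(ticks)
    if not -32768 <= value <= 32767:
        raise ValueError("timestamp does not fit in this Cluster")
    return value


def read_timestamp_ns(cluster_timestamp, value, sample_rate, timestamp_scale=1_000_000):
    """Timestamp in nanoseconds as a reader would reconstruct it."""
    scale = track_timestamp_scale(sample_rate, timestamp_scale)
    return cluster_timestamp * timestamp_scale + value * timestamp_scale * scale


stored = block_value(152340, 44100, cluster_timestamp=3454)
decoded = read_timestamp_ns(3454, stored, 44100)
error_ns = abs(decoded - Fraction(152340 * NS_PER_SECOND, 44100))
print(stored, float(decoded), float(error_ns))
```

With the values from the text (sample 152340 at 44100 Hz, `Cluster\Timestamp` of 3454), the stored value is 19 and the decoding error stays below half a sample duration, which is the property the rounding rule is meant to guarantee.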
diff --git a/rfc_backmatter_matroska.md b/rfc_backmatter_matroska.md
@@ -62,6 +62,14 @@
   </front>
 </reference>
 
+<reference anchor="Vorbis" target="https://www.xiph.org/vorbis/doc/Vorbis_I_spec.html">
+  <front>
+    <title>Vorbis I specification</title>
+    <author fullname='Xiph.Org Foundation'><organization>Xiph.Org Foundation</organization></author>
+    <date day="4" month="July" year="2020" />
+  </front>
+</reference>
+
 <reference anchor="MCF" target="http://mukoli.free.fr/mcf/mcf.html">
   <front>
     <title>Media Container Format</title>