Implemented convenience parse methods for track tags.

Also fixed up the Kaitai mapping and documentation.
Deep-Symmetry · Sep 3, 2024 · 6901109
1 parent 5ab6d49
Showing 3 changed files with 118 additions and 20 deletions.
diff --git a/doc/modules/ROOT/pages/exports.adoc b/doc/modules/ROOT/pages/exports.adoc
@@ -28,7 +28,7 @@ Digger `FileFetcher`] to request this file, use that path as the
 `filePath` argument, and use a `mountPath` value of `/B/` if you want
 to read it from the SD slot, or `/C/` to obtain it from the USB slot).
 
-NOTE: Newer players also support an additonal database with the filename `exportExt.pdb` in the same location, which holds a different and smaller set of table types in it.
+NOTE: Newer players also support an additional database with the filename `exportExt.pdb` in the same location, which holds a different and smaller set of table types in it.
 
 The file is a relational database format designed to be efficiently
 used by very low power devices (there were deployments on 16 bit
@@ -257,7 +257,7 @@ described in <<file-header>>.
 The exact meaning of _unknown~1~_ is unclear. Mr. Flesniak said
 “sequence number (0→1: 8→13, 1→2: 22, 2→3: 27)” but I don’t know how
 to interpret that. Even less is known about _unknown~2~_ . But
-__num_rows_small__ at byte `18` within the page (abbrviated _n~rs~_ in
+__num_rows_small__ at byte `18` within the page (abbreviated _n~rs~_ in
 the byte field diagram above) holds the number of rows that are
 present in the page, unless __num_rows_large__ (below) holds a value
 that is larger than it (but not equal to `1fff`). This seems like a
@@ -273,7 +273,7 @@ Flesniak said “a bitmask (first track: 32)”, and he described _u~4~_
 as “often 0, sometimes larger, especially for pages with a high number
 of rows (e.g. 12 for 101 rows)”.
 
-Byte{nbsp}``1b`` is called __page_flags__ (abbrviated _p~f~_ in the
+Byte{nbsp}``1b`` is called __page_flags__ (abbreviated _p~f~_ in the
 diagram). According to Mr. Flesniak, “strange” (non-data) pages will
 have the value `44` or `64`, and other pages have had the values `24`
 or `34`. Crate Digger considers a page to be a data page if
@@ -288,7 +288,7 @@ stores the number of bytes that are in use in the page heap.
 Bytes{nbsp}``20``-`21`, _u~5~_ , are of unclear purpose. Mr. Flesniak
 labeled them “(0→1: 2).”
 
-Bytes{nbsp}``22``-`23`, __num_rows_large__ (abbrviated _num~rl~_ in
+Bytes{nbsp}``22``-`23`, __num_rows_large__ (abbreviated _num~rl~_ in
 the diagram) hold the number of entries in the row index at the end of
 the page when that value is too large to fit into __num_rows_small__
 (as mentioned above), and that situation seems to be indicated when
@@ -914,7 +914,7 @@ is 126 bytes.
 
 NOTE: DeviceSQL strings do not have terminator bytes, so attempting to
 read more bytes than present can lead to garbage characters being
-present or crashing the parser for the more complex unicode strings.
+present or crashing the parser for the more complex Unicode strings.
 <<isrc-strings, ISRC Strings>> are the only exception.
 
 [[long-strings]]
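
The row-count rule described in the `exports.adoc` hunks above (use __num_rows_small__ unless __num_rows_large__ is larger and is not `1fff`) can be sketched as a small Java helper. This is only an illustration of the rule as the documentation words it; the class and method names are hypothetical and are not part of Crate Digger's API.

```java
// Hypothetical helper illustrating the documented rule; not Crate Digger code.
final class PageRowCount {
    /** Value of num_rows_large that the documentation says must be ignored. */
    static final int IGNORED_LARGE_VALUE = 0x1fff;

    /**
     * Picks the number of rows in a page from the two header fields.
     * num_rows_large wins only when it exceeds num_rows_small and is not 0x1fff.
     */
    static int rowCount(int numRowsSmall, int numRowsLarge) {
        if (numRowsLarge > numRowsSmall && numRowsLarge != IGNORED_LARGE_VALUE) {
            return numRowsLarge;
        }
        return numRowsSmall;
    }

    public static void main(String[] args) {
        System.out.println(rowCount(12, 0));       // prints 12
        System.out.println(rowCount(255, 300));    // prints 300
        System.out.println(rowCount(12, 0x1fff));  // prints 12
    }
}
```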

diff --git a/src/main/java/org/deepsymmetry/cratedigger/DatabaseExt.java b/src/main/java/org/deepsymmetry/cratedigger/DatabaseExt.java
@@ -42,35 +42,123 @@ public DatabaseExt(File sourceFile) throws IOException {
         databaseUtil = new DatabaseUtil(sourceFile, true);
         final Map<Long, RekordboxPdb.TagRow> mutableTagIndex = new HashMap<>();
         final SortedMap<String, SortedSet<Long>> mutableTagNameIndex = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
+        final Map<Long, RekordboxPdb.TagRow> mutableCategoryIndex = new HashMap<>();
+        final SortedMap<String, SortedSet<Long>> mutableCategoryNameIndex = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
 
         databaseUtil.indexRows(RekordboxPdb.PageTypeExt.TAGS, row -> {
-            // We found a tag; index it by its ID.
+            // We found a tag or category; index it by its ID.
             RekordboxPdb.TagRow tagRow = (RekordboxPdb.TagRow)row;
             final long id = tagRow.id();
-            mutableTagIndex.put(id, tagRow);
-            // ALso index the tags by name.
+            if (tagRow.isCategory()) {
+                mutableCategoryIndex.put(id, tagRow);
+            } else {
+                mutableTagIndex.put(id, tagRow);
+            }
+            // Also index the tags and categories by name.
             final String title = Database.getText(tagRow.name());
-            databaseUtil.addToSecondaryIndex(mutableTagNameIndex, title, tagRow.id());
+            if (tagRow.isCategory()) {
+                databaseUtil.addToSecondaryIndex(mutableCategoryNameIndex, title, tagRow.id());
+
+            } else {
+                databaseUtil.addToSecondaryIndex(mutableTagNameIndex, title, tagRow.id());
+            }
         });
         tagIndex = Collections.unmodifiableMap(mutableTagIndex);
-        logger.info("Indexed {} Tags.", tagIndex.size());
+        tagCategoryIndex = Collections.unmodifiableMap(mutableCategoryIndex);
+        logger.info("Indexed {} Tag names in {} categories.", tagIndex.size(), tagCategoryIndex.size());
         tagNameIndex = databaseUtil.freezeSecondaryIndex(mutableTagNameIndex);
+        tagCategoryNameIndex = databaseUtil.freezeSecondaryIndex(mutableCategoryNameIndex);
+
+        // Build the list of category names in the order in which they should be displayed.
+        String[] mutableTagCategoryNameOrder = new String[tagCategoryIndex.size()];
+        for (RekordboxPdb.TagRow row : tagCategoryIndex.values()) {
+            mutableTagCategoryNameOrder[(int) row.categoryPos()] = Database.getText(row.name());
+        }
+        tagCategoryNameOrder = List.of(mutableTagCategoryNameOrder);
+
+        // For each category build the list of tag names in that category, in the order they should be displayed.
+        final Map<Long,ArrayList<RekordboxPdb.TagRow>> mutableCategoryContents = new HashMap<>();
+        for (RekordboxPdb.TagRow row : tagIndex.values()) {
+            mutableCategoryContents.computeIfAbsent(row.category(), k -> new ArrayList<>()).add(row);
+        }
+        final Map<Long,List<String>> mutableTagCategoryTagNameOrder = new HashMap<>();
+        for (Long categoryId : mutableCategoryContents.keySet()) {
+            final List<RekordboxPdb.TagRow> category = mutableCategoryContents.get(categoryId);
+            final String[] mutableNames = new String[category.size()];
+            for (RekordboxPdb.TagRow row : category) {
+                mutableNames[(int) row.categoryPos()] = Database.getText(row.name());
+            }
+            mutableTagCategoryTagNameOrder.put(categoryId, List.of(mutableNames));
+        }
+        tagCategoryTagNameOrder = Collections.unmodifiableMap(mutableTagCategoryTagNameOrder);
+
+        // Gather and index the track tag and tag category information.
+        final Map<Long,Set<Long>> mutableTagTrackIndex = new HashMap<>();
+        final Map<Long, Set<Long>> mutableTrackTagIndex = new HashMap<>();
+        databaseUtil.indexRows(RekordboxPdb.PageTypeExt.TAG_TRACKS, row -> {
+            RekordboxPdb.TagTrackRow tagTrackRow = (RekordboxPdb.TagTrackRow)row;
+            mutableTagTrackIndex.computeIfAbsent(tagTrackRow.tagId(), k -> new HashSet<>()).add(tagTrackRow.trackId());
+            mutableTrackTagIndex.computeIfAbsent(tagTrackRow.trackId(), k -> new HashSet<>()).add(tagTrackRow.tagId());
+        });
+
+        mutableTagTrackIndex.replaceAll((k, v) -> Collections.unmodifiableSet(mutableTagTrackIndex.get(k)));
+        tagTrackIndex = Collections.unmodifiableMap(mutableTagTrackIndex);
 
-        // TODO: Gather and index the track tag information.
+        mutableTrackTagIndex.replaceAll((k, v) -> Collections.unmodifiableSet(mutableTrackTagIndex.get(k)));
+        trackTagIndex = Collections.unmodifiableMap(mutableTrackTagIndex);
+
+        logger.info("Indexed {} tags on {} tagged tracks.", tagTrackIndex.size(), trackTagIndex.size());
     }
 
     /**
-     * A map from tag ID to the actual tag object.
+     * A map from tag ID to the actual tag object (does not include rows that are categories).
      */
     @API(status = API.Status.EXPERIMENTAL)
     public final Map<Long, RekordboxPdb.TagRow> tagIndex;
 
     /**
-     * A sorted map from tag names to the IDs of tags with that name.
+     * A map from tag ID to the actual category object (includes only rows that are categories).
+     */
+    @API(status = API.Status.EXPERIMENTAL)
+    public final Map<Long, RekordboxPdb.TagRow> tagCategoryIndex;
+
+    /**
+     * A sorted map from tag names to the IDs of tags with that name (does not include categories).
      */
     @API(status = API.Status.EXPERIMENTAL)
     public final SortedMap<String, SortedSet<Long>> tagNameIndex;
 
+    /**
+     * A sorted map from category names to the IDs of tag categories with that name (only includes categories).
+     */
+    @API(status = API.Status.EXPERIMENTAL)
+    public final SortedMap<String, SortedSet<Long>> tagCategoryNameIndex;
+
+    /**
+     * The list of category names in the order that they are supposed to be presented to the user.
+     */
+    @API(status = API.Status.EXPERIMENTAL)
+    public final List<String> tagCategoryNameOrder;
+
+    /**
+     * A map from category ID to the list of tag names that belong to that category,
+     * in the order that they are supposed to be presented to the user.
+     */
+    @API(status = API.Status.EXPERIMENTAL)
+    public final Map<Long,List<String>> tagCategoryTagNameOrder;
+
+    /**
+     * A map from tag ID to the IDs of all tracks that have been assigned that tag.
+     */
+    @API(status = API.Status.EXPERIMENTAL)
+    final Map<Long,Set<Long>> tagTrackIndex;
+
+    /**
+     * A map from track ID to the IDs of all tags that have been assigned to that track.
+     */
+    @API(status = API.Status.EXPERIMENTAL)
+    final Map<Long, Set<Long>> trackTagIndex;
+
 
     /**
      * Close the file underlying the parsed database. This needs to be called if you want to be able
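
The indexing approach in the `DatabaseExt` constructor above (group tags by category, place each name at its zero-based `categoryPos`, and record every tag/track pair in both directions) can be tried out without a database file. The sketch below is a stand-alone approximation: `SimpleTag` and `TagIndexSketch` are hypothetical stand-ins rather than Crate Digger classes, and it assumes positions within a category are dense (0 to n-1), which the diff relies on but does not verify.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for a non-category RekordboxPdb.TagRow, so the sketch runs on its own.
final class SimpleTag {
    final long id;
    final long category;   // ID of the category this tag belongs to
    final int categoryPos; // zero-based position within that category
    final String name;

    SimpleTag(long id, long category, int categoryPos, String name) {
        this.id = id;
        this.category = category;
        this.categoryPos = categoryPos;
        this.name = name;
    }
}

final class TagIndexSketch {
    /** Groups tags by category ID and orders each group's names by categoryPos. */
    static Map<Long, List<String>> namesByCategory(List<SimpleTag> tags) {
        final Map<Long, List<SimpleTag>> contents = new HashMap<>();
        for (SimpleTag tag : tags) {
            contents.computeIfAbsent(tag.category, k -> new ArrayList<>()).add(tag);
        }
        final Map<Long, List<String>> ordered = new HashMap<>();
        for (Map.Entry<Long, List<SimpleTag>> entry : contents.entrySet()) {
            final String[] names = new String[entry.getValue().size()];
            for (SimpleTag tag : entry.getValue()) {
                names[tag.categoryPos] = tag.name; // assumes positions are 0..n-1
            }
            ordered.put(entry.getKey(), Collections.unmodifiableList(Arrays.asList(names)));
        }
        return Collections.unmodifiableMap(ordered);
    }

    /** Records one tag/track assignment in both lookup directions. */
    static void addPair(Map<Long, Set<Long>> tagToTracks, Map<Long, Set<Long>> trackToTags,
                        long trackId, long tagId) {
        tagToTracks.computeIfAbsent(tagId, k -> new HashSet<>()).add(trackId);
        trackToTags.computeIfAbsent(trackId, k -> new HashSet<>()).add(tagId);
    }

    public static void main(String[] args) {
        final List<SimpleTag> tags = Arrays.asList(
                new SimpleTag(10, 1, 1, "Chill"),
                new SimpleTag(11, 1, 0, "Energetic"),
                new SimpleTag(20, 2, 0, "Vocal"));
        System.out.println(namesByCategory(tags).get(1L)); // prints [Energetic, Chill]

        final Map<Long, Set<Long>> tagToTracks = new HashMap<>();
        final Map<Long, Set<Long>> trackToTags = new HashMap<>();
        addPair(tagToTracks, trackToTags, 100, 10);
        addPair(tagToTracks, trackToTags, 100, 20);
        addPair(tagToTracks, trackToTags, 101, 10);
        System.out.println(trackToTags.get(100L).contains(20L)); // prints true
    }
}
```

One difference from the diff is deliberate: the sketch wraps the array with `Arrays.asList`, which tolerates a `null` slot, whereas `List.of` throws a `NullPointerException` if any position is left unfilled. Whether real exports ever contain such gaps is not stated in the source.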

diff --git a/src/main/kaitai/rekordbox_pdb.ksy b/src/main/kaitai/rekordbox_pdb.ksy
@@ -378,7 +378,7 @@ types:
         doc: |
           The actual content of the row in an exportExt.pdb file, as long as it is present.
         -webide-parse-mode: eager
-    -webide-representation: '{body.name.body.text}{body.title.body.text} ({body.id})'
+    -webide-representation: '{body.name.body.text}{body.title.body.text}{body_ext.name.body.text} ({body.id}{body_ext.id})'
 
   album_row:
     doc: |
@@ -921,22 +921,22 @@ types:
       - id: category
         type: u4
         doc: |
-          The index of the tag category this tag belongs to.
+          The ID of the tag category this tag belongs to.
           If this row represents a tag category, this field is zero.
       - id: category_pos
         type: u4
         doc: |
-          The position of this tag in its category.
-          If this row represents a tag category, this field equals (id - 1).
+          The zero-based position of this tag in its category.
+          If this row represents a tag category, the zero-based position of the category itself in the category list.
       - id: id
         type: u4
         doc: |
           The ID of this tag or tag category.
           Referenced by tag_track_row if this row is a tag.
-      - id: is_category
+      - id: raw_is_category
         type: u4
         doc: |
-          Whether this row stores a tag category name instead of a tag.
+          Non-zero when this row stores a tag category instead of a tag.
       - type: u2
         doc: |
           Seems to always be 0x03, 0x1f.
@@ -951,6 +951,12 @@ types:
       - type: u1
         doc: |
           This seems to always be 0x03.
+    instances:
+      is_category:
+        value: raw_is_category != 0
+        doc: |
+          Indicates whether this row stores a tag category instead of a tag.
+        -webide-parse-mode: eager
 
   tag_track_row:
     doc: |
@@ -961,11 +967,15 @@ types:
           Seems to always be zero.
       - id: track_id
         type: u4
+        doc: |
+          The ID of the track that has a tag assigned to it.
       - id: tag_id
         type: u4
+        doc: |
+          The ID of the tag that has been assigned to a track.
       - type: u4
         doc: |
-          Seems to always be 3.
+          Seems to always be 0x03.
 
   device_sql_string:
     doc: |
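
To make the `tag_row` change above concrete, here is a rough Java sketch that reads the four consecutive `u4` fields shown in the hunk (`category`, `category_pos`, `id`, `raw_is_category`) and derives `is_category` the same way the new Kaitai value instance does (`raw_is_category != 0`). It is not the generated Kaitai parser. The class name is hypothetical, little-endian byte order is an assumption, and because the diff does not show what precedes `category` in the row, the buffer is assumed to be positioned at that field already.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical reader for four tag_row fields; not the Kaitai-generated class.
final class TagRowFieldsSketch {
    final long category;      // ID of the owning category, zero for a category row
    final long categoryPos;   // zero-based position in the category (or in the category list)
    final long id;            // ID of this tag or tag category
    final long rawIsCategory; // non-zero when the row is a tag category

    /** Reads the fields starting at the buffer's current position (assumed little-endian). */
    TagRowFieldsSketch(ByteBuffer buffer) {
        // Work on a duplicate so the caller's position and byte order are left alone.
        final ByteBuffer le = buffer.duplicate().order(ByteOrder.LITTLE_ENDIAN);
        category = Integer.toUnsignedLong(le.getInt());
        categoryPos = Integer.toUnsignedLong(le.getInt());
        id = Integer.toUnsignedLong(le.getInt());
        rawIsCategory = Integer.toUnsignedLong(le.getInt());
    }

    /** Mirrors the Kaitai value instance: raw_is_category != 0. */
    boolean isCategory() {
        return rawIsCategory != 0;
    }

    public static void main(String[] args) {
        // category = 0, category_pos = 2, id = 3, raw_is_category = 1
        final byte[] row = {0, 0, 0, 0, 2, 0, 0, 0, 3, 0, 0, 0, 1, 0, 0, 0};
        final TagRowFieldsSketch fields = new TagRowFieldsSketch(ByteBuffer.wrap(row));
        System.out.println(fields.id);           // prints 3
        System.out.println(fields.isCategory()); // prints true
    }
}
```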