Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lastgenre: New config option keep_existing #4982

Open
wants to merge 47 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
47 commits
Select commit Hold shift + click to select a range
512576c
Quickfix lastgenre always overwriting multi-genre
JOJ0 May 3, 2023
ca33515
Fix track-level genre handling in lastgenre plugin
JOJ0 Aug 30, 2023
699f9d1
Handle dups of existing genres in lastgenre plugin
JOJ0 Jun 17, 2023
79a4856
Refactor lastgenre keep_allowed to list comprehension
JOJ0 Nov 16, 2023
5d3f5bd
Use separator as configured instead of hardcoding
JOJ0 Nov 16, 2023
f2e20dd
Use provided deduplicate function for keep_allowed
JOJ0 Nov 16, 2023
eea0e3b
Add lastgenre keep_allowed options (-k/-K)
JOJ0 Sep 14, 2024
f21ff6c
Docs for lastgenre keep_allowed/force
JOJ0 Sep 15, 2024
2e6f448
Implement --force and --keep-allowed behaviours
JOJ0 Sep 17, 2024
14a0f7e
Refactor keep/new genres combination
JOJ0 Oct 31, 2024
cfc4c98
Quickfix constant msgpack poetry issue
JOJ0 Nov 1, 2024
b6a2e38
lastgenre: Add comments over groups of methods
JOJ0 Nov 1, 2024
0a023e0
Experiment with test_lastgenre
JOJ0 Dec 15, 2024
5a06cf4
Fix default for _dedup_genre whitelist arg
JOJ0 Jan 2, 2025
79c81c2
_combine_and_label return None not empty str
JOJ0 Jan 2, 2025
5fdf5ec
Fallback to next stage when fetch_ returns None
JOJ0 Jan 2, 2025
e1e8ad4
Refactor _get_genre test to parametrized pytest
JOJ0 Jan 1, 2025
795e8fe
Add lastgenre testcase with unicode \0 separator
JOJ0 Jan 2, 2025
2da5f89
Temporary lastgenre debug messages for @arsaboo
JOJ0 Jan 3, 2025
d8ddeb9
Clarify lastgenre _is_allowed docstring
JOJ0 Jan 3, 2025
dfb0b9c
Rewrite docs for keep_allowed/existing change
JOJ0 Jan 3, 2025
f1d12a5
Rename lastgenre option, refactor, new default
JOJ0 Jan 3, 2025
2dd33f4
lastgenre test _get_genre for renamed keep_existing
JOJ0 Jan 3, 2025
b6a664a
Fix lastgenre "count" issue
JOJ0 Jan 4, 2025
1e4b132
Handle genres as list, count/format/str helper
JOJ0 Jan 5, 2025
eb1aba8
Refactor lastgenre mocked fetchers return list
JOJ0 Jan 5, 2025
8760779
Fix/add lastgenre fallback tests
JOJ0 Jan 5, 2025
5d41f6b
lastgenre _is_allowed detailed logging
JOJ0 Jan 5, 2025
a571287
Refactor again _combine_and_label_genres
JOJ0 Jan 6, 2025
c80e5f9
Fix lastgenre limit to count test
JOJ0 Jan 6, 2025
be2e010
Fix _dedup_genre, ensure lower case
JOJ0 Jan 7, 2025
e794339
_resolve_genre as list tests, add test_to_delimited_string
JOJ0 Jan 6, 2025
e0a206d
Final lastgenre plugin linting
JOJ0 Jan 8, 2025
4d447a7
Final lastgenre docs changes
JOJ0 Jan 8, 2025
8552a2f
Add a temporary log around whitelist setup
JOJ0 Jan 8, 2025
6c7213d
Temporary debug messages in _sort_by_depth
JOJ0 Jan 9, 2025
d75d7f1
Fix previous temp log messages
JOJ0 Jan 9, 2025
9a10eb5
Fix most popular track genre fetching (VA albums)
JOJ0 Jan 9, 2025
8055657
Remove all lastgenre temporary debug logging
JOJ0 Jan 9, 2025
f1e2c70
Polish 'fetched last.fm tags' debug message
JOJ0 Jan 9, 2025
94fbb8e
Changelog for #4982
JOJ0 Jan 10, 2025
7894776
Apply type hint suggestions from review
JOJ0 Jan 11, 2025
19b3131
Integrate _format_tag in _to_delimited_...
JOJ0 Jan 11, 2025
41ec574
Apply temp logging leftover review suggestions
JOJ0 Jan 11, 2025
9fe3772
Prevent album genre inherit only when source:track
JOJ0 Nov 3, 2023
a39b9e6
Refactor and rename _is_valid() helper
JOJ0 Jan 11, 2025
e5ccab9
Reconsider _is_valid refactored version
JOJ0 Jan 11, 2025
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
224 changes: 173 additions & 51 deletions beetsplug/lastgenre/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -103,15 +103,27 @@ def __init__(self):
"canonical": False,
"source": "album",
"force": True,
"keep_existing": True,
"auto": True,
"separator": ", ",
"prefer_specific": False,
"title_case": True,
}
)

self.config_validation()
self.setup()

def config_validation(self) -> None:
"""Quits plugin when invalid configurations are detected."""
keep_existing = self.config["keep_existing"].get()
force = self.config["force"].get()

if keep_existing and not force:
raise ui.UserError(
"Invalid lastgenre plugin configuration (enable force with "
"keep_existing!)"
)

def setup(self):
"""Setup plugin from config options"""
if self.config["auto"]:
Expand Down Expand Up @@ -165,6 +177,20 @@ def sources(self):
elif source == "artist":
return ("artist",)

# More canonicalization and general helpers.

def _to_delimited_genre_string(self, tags: list[str]) -> str:
"""Reduce tags list to configured count, format and return as delimited
string."""
separator = self.config["separator"].as_str()
max_count = self.config["count"].get(int)

genres = tags[:max_count]
if self.config["title_case"]:
genres = [g.title() for g in genres]

return separator.join(genres)

def _get_depth(self, tag):
"""Find the depth of a tag in the genres tree."""
depth = None
Expand All @@ -184,9 +210,7 @@ def _sort_by_depth(self, tags):
return [p[1] for p in depth_tag_pairs]

def _resolve_genres(self, tags):
"""Given a list of strings, return a genre by joining them into a
single string and (optionally) canonicalizing each.
"""
"""Given a list of genre strings, filters, sorts and canonicalizes."""
if not tags:
return None

Expand All @@ -201,7 +225,7 @@ def _resolve_genres(self, tags):
parents = [
x
for x in find_parents(tag, self.c14n_branches)
if self._is_allowed(x)
if self._is_valid(x)
]
else:
parents = [find_parents(tag, self.c14n_branches)[-1]]
Expand All @@ -224,13 +248,9 @@ def _resolve_genres(self, tags):

# c14n only adds allowed genres but we may have had forbidden genres in
# the original tags list
tags = [self._format_tag(x) for x in tags if self._is_allowed(x)]
return [x for x in tags if self._is_valid(x)]

return (
self.config["separator"]
.as_str()
.join(tags[: self.config["count"].get(int)])
)
return tags

def _format_tag(self, tag):
if self.config["title_case"]:
Expand All @@ -242,19 +262,21 @@ def fetch_genre(self, lastfm_obj):
can be found. Ex. 'Electronic, House, Dance'
"""
min_weight = self.config["min_weight"].get(int)
return self._resolve_genres(self._tags_for(lastfm_obj, min_weight))
return self._tags_for(lastfm_obj, min_weight)

def _is_allowed(self, genre):
"""Determine whether the genre is present in the whitelist,
returning a boolean.
def _is_valid(self, genre) -> bool:
"""Check if the genre is valid.
Depending on the whitelist property, valid means a genre is in the
whitelist or any genre is allowed.
"""
if genre is None:
return False
if not self.whitelist or genre in self.whitelist:
return True
return False
return (
genre is None
or not self.whitelist
or genre.lower() in self.whitelist
)

# Cached entity lookups.
# Cached last.fm entity lookups.

def _last_lookup(self, entity, method, *args):
"""Get a genre based on the named entity using the callable `method`
Expand All @@ -270,7 +292,7 @@ def _last_lookup(self, entity, method, *args):

key = "{}.{}".format(entity, "-".join(str(a) for a in args))
if key in self._genre_cache:
return self._genre_cache[key]
result = self._genre_cache[key]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can simplify this entire conditional a fair bit

        if key not in self._genre_cache:
            args = [a.replace("\u2010", "-") for a in args]
            genre = self.fetch_genre(method(*args))
            self._genre_cache[key] = genre

        return self._genre_cache[key]

Note that the REPLACE constant has just a single item, so you can replace this entire logic with a single replacement.

else:
args_replaced = []
for arg in args:
Expand All @@ -280,7 +302,8 @@ def _last_lookup(self, entity, method, *args):

genre = self.fetch_genre(method(*args_replaced))
self._genre_cache[key] = genre
return genre
result = genre
return result

def fetch_album_genre(self, obj):
"""Return the album genre for this Item or Album."""
Expand All @@ -302,42 +325,119 @@ def fetch_track_genre(self, obj):
"track", LASTFM.get_track, obj.artist, obj.title
)

# Main processing: _get_genre() and helpers.

def _get_existing_genres(self, obj, separator):
"""Return a list of genres for this Item or Album."""
if isinstance(obj, library.Item):
item_genre = obj.get("genre", with_album=False).split(separator)
else:
item_genre = obj.get("genre").split(separator)

if any(item_genre):
return item_genre
return []

def _dedup_genres(self, genres, whitelist_only=False):
"""Return a list of deduplicated genres. Depending on the
whitelist_only option, gives filtered or unfiltered results.
Makes sure genres are handled all lower case."""
if whitelist_only:
return deduplicate(
[g.lower() for g in genres if self._is_valid(g)]
)
return deduplicate([g.lower() for g in genres])

def _combine_and_label_genres(
self, new_genres: list, keep_genres: list, log_label: str
) -> tuple:
"""Combines genres and returns them with a logging label.
Parameters:
new_genres (list): The new genre result to process.
keep_genres (list): Existing genres to combine with new ones
log_label (str): A label (like "track", "album") we possibly
combine with a prefix. For example resulting in something like
"keep + track" or just "track".
Returns:
tuple: A tuple containing the combined genre string and the
'logging label'.
"""
self._log.debug(f"fetched last.fm tags: {new_genres}")
combined = deduplicate(keep_genres + new_genres)
Copy link
Member

@snejus snejus Jan 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can remove deduplicate function from this module and use unique_list from beets.util.__init__

from beets.util import unique_list

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is deduplication performed here? I can see that this is also done in the next step, _resolve_genres implementation (line 243):

            tags = tags_all

        tags = deduplicate(tags)

        # Sort the tags by specificity.

You probably want to keep just one of these.

resolved = self._resolve_genres(combined)
reduced = self._to_delimited_genre_string(resolved)
Comment on lines +369 to +370
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add static types to both of these functions? Once you do you will see an issue here: if _resolve_genres receives an empty list, it returns None. If you feed None into _to_delimited_genre_string, it will throw TypeError since None cannot be indexed.

This needs fixing.


if new_genres and keep_genres:
return reduced, f"keep + {log_label}"
if new_genres:
return reduced, log_label
return None, log_label
Comment on lines +351 to +376
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(1) As far as I can see,

  • keep is prepended to the label only when both new_genres and keep_genres are not empty
  • neither of these variables changes in this function

This means that finding out whether to prepend keep to the label should happen somewhere outside this function. Thus it can simplified

Suggested change
def _combine_and_label_genres(
self, new_genres: list, keep_genres: list, log_label: str
) -> tuple:
"""Combines genres and returns them with a logging label.
Parameters:
new_genres (list): The new genre result to process.
keep_genres (list): Existing genres to combine with new ones
log_label (str): A label (like "track", "album") we possibly
combine with a prefix. For example resulting in something like
"keep + track" or just "track".
Returns:
tuple: A tuple containing the combined genre string and the
'logging label'.
"""
self._log.debug(f"fetched last.fm tags: {new_genres}")
combined = deduplicate(keep_genres + new_genres)
resolved = self._resolve_genres(combined)
reduced = self._to_delimited_genre_string(resolved)
if new_genres and keep_genres:
return reduced, f"keep + {log_label}"
if new_genres:
return reduced, log_label
return None, log_label
def _combine_genres(self, old: list[str], new: list[str]) -> str | None:
"""Combine old and new genres."""
self._log.debug(f"fetched last.fm tags: {new}")
combined = unique_list(old + new)
resolved = self._resolve_genres(combined)
return self._to_delimited_genre_string(resolved) or None

Note that I added types and removed the docstring which is redundant given appropriately named variables and their types in the function definition.


def _get_genre(self, obj):
"""Get the genre string for an Album or Item object based on
self.sources. Return a `(genre, source)` pair. The
prioritization order is:
"""Get the final genre string for an Album or Item object
`self.sources` specifies allowed genre sources, prioritized as follows:
- track (for Items only)
- album
- artist
- original
- fallback
- None
Parameters:
obj: Either an Album or Item object.
Returns:
tuple: A `(genre, label)` pair, where `label` is a string used for
logging that describes the result. For example, "keep + artist"
indicates that existing genres were combined with new last.fm
genres, while "artist" means only new last.fm genres are
included.
"""

# Shortcut to existing genre if not forcing.
if not self.config["force"] and self._is_allowed(obj.genre):
return obj.genre, "keep"
separator = self.config["separator"].get()
keep_genres = []

genres = self._get_existing_genres(obj, separator)
if genres and not self.config["force"]:
# Without force we don't touch pre-populated tags and return early
# with the original contents. We format back to string tough.
keep_genres = self._dedup_genres(genres)
return separator.join(keep_genres), "keep"

if self.config["force"]:
# Simply forcing doesn't keep any.
keep_genres = []
# keep_existing remembers, according to the whitelist setting,
# any or just allowed genres.
if self.config["keep_existing"] and self.config["whitelist"]:
keep_genres = self._dedup_genres(genres, whitelist_only=True)
elif self.config["keep_existing"]:
keep_genres = self._dedup_genres(genres)
Comment on lines +415 to +418
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does _dedup_genres need to be called on these genres here? They will be sent to _combine_genres -> _resolve_genres later on in this function which will achieve the same thing, I think?


# Track genre (for Items only).
if isinstance(obj, library.Item):
if "track" in self.sources:
result = self.fetch_track_genre(obj)
if result:
return result, "track"
if isinstance(obj, library.Item) and "track" in self.sources:
if new_genres := self.fetch_track_genre(obj):
return self._combine_and_label_genres(
new_genres, keep_genres, "track"
)
Comment on lines +421 to +425
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(2) Given my comment above marked as (1), we need to somehow check whether both new_genres and keep_genres are not empty.

I'd suggest to call self._combine(new_genres, keep_genres) after this entire conditional logic.

        if (
            isinstance(obj, library.Item)
            and "track" in self.sources
            and (new_genres := self.fetch_track_genre(obj))
        ):
            label = "track"
        elif "album" in self.sources and (
            new_genres := self.fetch_album_genre(obj)
        ):
            label = "album"
        elif "artist" in self.sources:
            new_genres = ...
            if new_genres:
                label = "artist"

        if new_genres:
            if keep_genres:
                label = f"keep + {label}"
            return self._combine_genres(new_genres, keep_genres), label


# Album genre.
if "album" in self.sources:
result = self.fetch_album_genre(obj)
if result:
return result, "album"
if new_genres := self.fetch_album_genre(obj):
return self._combine_and_label_genres(
new_genres, keep_genres, "album"
)

# Artist (or album artist) genre.
if "artist" in self.sources:
result = None
new_genres = None
if isinstance(obj, library.Item):
result = self.fetch_artist_genre(obj)
new_genres = self.fetch_artist_genre(obj)
elif obj.albumartist != config["va_name"].as_str():
result = self.fetch_album_artist_genre(obj)
new_genres = self.fetch_album_artist_genre(obj)
else:
# For "Various Artists", pick the most popular track genre.
item_genres = []
Expand All @@ -348,26 +448,34 @@ def _get_genre(self, obj):
if not item_genre:
item_genre = self.fetch_artist_genre(item)
if item_genre:
item_genres.append(item_genre)
item_genres += item_genre
if item_genres:
result, _ = plurality(item_genres)
most_popular, rank = plurality(item_genres)
new_genres = [most_popular]
self._log.debug(
'Most popular track genre "{}" ({}) for VA album.',
most_popular,
rank,
)

if result:
return result, "artist"
if new_genres:
return self._combine_and_label_genres(
new_genres, keep_genres, "artist"
)

# Filter the existing genre.
# Didn't find anything, leave original
if obj.genre:
result = self._resolve_genres([obj.genre])
if result:
return result, "original"
return obj.genre, "original fallback"

# Fallback string.
fallback = self.config["fallback"].get()
if fallback:
# No original, return fallback string
if fallback := self.config["fallback"].get():
return fallback, "fallback"

# No fallback configured
return None, None

# Beets plugin hooks and CLI.

def commands(self):
lastgenre_cmd = ui.Subcommand("lastgenre", help="fetch genres")
lastgenre_cmd.parser.add_option(
Expand All @@ -377,6 +485,20 @@ def commands(self):
action="store_true",
help="re-download genre when already present",
)
lastgenre_cmd.parser.add_option(
"-k",
"--keep-existing",
dest="keep_existing",
action="store_true",
help="keep already present genres",
)
lastgenre_cmd.parser.add_option(
"-K",
"--keep-none",
dest="keep_existing",
action="store_false",
help="don't keep already present genres",
)
lastgenre_cmd.parser.add_option(
"-s",
"--source",
Expand Down
4 changes: 4 additions & 0 deletions docs/changelog.rst
Original file line number Diff line number Diff line change
Expand Up @@ -38,6 +38,10 @@ Bug fixes:
request their own last.fm genre. Also log messages regarding what's been
tagged are now more polished.
:bug:`5582`
* :doc:`plugins/lastgenre`: The new configuration option, ``keep_existing``,
provides more fine-grained control over how pre-populated genre tags are
handled. The ``force`` option now behaves in a more conventional manner.
:bug:`4982`

For packagers:

Expand Down
33 changes: 30 additions & 3 deletions docs/plugins/lastgenre.rst
Original file line number Diff line number Diff line change
Expand Up @@ -111,6 +111,27 @@ Last.fm returns both of those tags, lastgenre is going to use the most
popular, which is often the most generic (in this case ``folk``). By setting
``prefer_specific`` to true, lastgenre would use ``americana`` instead.

Handling pre-populated tags
^^^^^^^^^^^^^^^^^^^^^^^^^^^

The ``force``, ``keep_existing`` and ``whitelist`` options control how
pre-existing genres are handled.

By default, the plugin *combines* newly fetched last.fm genres with whitelisted
pre-existing ones (``force: yes`` and ``keep_existing: yes``).

To write new genres to empty tags only and keep pre-populated tags untouched,
set ``force: no``.

To *overwrite* any content of pre-populated tags, set ``force: yes`` and
``keep_existing: no``.

To *combine* newly fetched last.fm genres with any pre-existing ones, set
``force: yes``, ``keep_existing: yes`` and ``whitelist: False``.
Copy link
Member

@snejus snejus Jan 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find this documentation too confusing.

Have you considered using a single option, say mode, instead? I'd imagine something like

mode: # keep | combine | overwrite


Combining ``force: no`` and ``keep_existing: yes`` is invalid (since ``force:
no`` means `not touching` existing tags anyway).

Configuration
-------------

Expand All @@ -128,9 +149,15 @@ configuration file. The available options are:
- **fallback**: A string if to use a fallback genre when no genre is found.
You can use the empty string ``''`` to reset the genre.
Default: None.
- **force**: By default, beets will always fetch new genres, even if the files
already have one. To instead leave genres in place in when they pass the
whitelist, set the ``force`` option to ``no``.
- **force**: By default, lastgenre will fetch new genres for empty as well as
pre-populated tags. Enable the ``keep_existing`` option to combine existing
and new genres. (see `Handling pre-populated tags`_).
Default: ``no``.
- **keep_existing**: By default, genres remain in pre-populated tags. Depending
on whether or not ``whitelist`` is enabled, existing genres get "a cleanup".
Enabling ``keep_existing`` is only valid in combination with an active
``force`` option. To ensure only fresh last.fm genres, disable this option.
(see `Handling pre-populated tags`_)
Default: ``yes``.
- **min_weight**: Minimum popularity factor below which genres are discarded.
Default: 10.
Expand Down
Loading
Loading