Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix lrclib lyrics #5406

Open
wants to merge 6 commits into
base: lyrics-refactor-tests
Choose a base branch
from
Open

Fix lrclib lyrics #5406

wants to merge 6 commits into from

Conversation

snejus
Copy link
Member

@snejus snejus commented Sep 4, 2024

Fixes #5102

LRCLib lyrics backend fixes

Bug Fixes

  • Fixed fetching lyrics from lrclib source. If lyrics for a specific album, artist, and title combination are not found, the plugin now searches for the artist and title and picks the most relevant result, scoring them by
    1. Duration similarity to the target item
    2. Availability of synced lyrics
  • Updated the default sources configuration to prioritize lrclib over other sources for faster and more reliable results.

Code Improvements

  • Added type annotations to fetch method in all backends.
  • Introduced LRCLyrics and LRCLibItem classes to encapsulate lyrics data and improve code structure.
  • Enhanced error handling and logging enchancements to the LRCLib backend. These will be added to the rest of the backends in a separate PR.

Tests

  • Added new tests to cover the updated functionality and error handling scenarios.

To Do

  • Documentation. (If you've added a new command-line flag, for example, find the appropriate page under docs/ to describe it.)
  • Changelog. (Add an entry to docs/changelog.rst to the bottom of one of the lists near the top of the document.)
  • Tests. (Very much encouraged but not strictly required.)

@snejus snejus self-assigned this Sep 4, 2024
@snejus snejus linked an issue Sep 4, 2024 that may be closed by this pull request
@snejus snejus requested a review from bal-e September 4, 2024 04:27
@snejus snejus force-pushed the fix-lrclib-lyrics branch 2 times, most recently from d4bed72 to 829192d Compare September 4, 2024 04:40
@snejus snejus requested a review from Serene-Arc September 4, 2024 04:47
@snejus snejus force-pushed the fix-lrclib-lyrics branch 5 times, most recently from cb8929f to c2807f0 Compare September 4, 2024 10:26
@snejus
Copy link
Member Author

snejus commented Sep 4, 2024

The build on win32 is failing to install reflink because it's only supported until Python 3.7.

I will address this in a separate PR and rebase this one accordingly once the fix is merged.

Note: this issue popped up now because I added a new requests-mock dependency which invalidated cached dependencies.

Copy link
Member

@bal-e bal-e left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a much better implementation now, well done. I especially like that you've managed to remove the test resources -- I wasn't expecting 2000 lines of lyrics in there.

beetsplug/lyrics.py Outdated Show resolved Hide resolved
beetsplug/lyrics.py Outdated Show resolved Hide resolved
snejus added a commit that referenced this pull request Sep 8, 2024
…5407)

See my comment under #5406 for context

> The build on win32 is failing to install reflink because it's [only
supported until Python
3.7](https://gitlab.com/rubdos/pyreflink/-/blob/master/setup.py?ref_type=heads).
>
> I will address this in a separate PR and rebase this one accordingly
once the fix is merged.
>
> Note: this issue popped up now because I added a new requests-mock
dependency which invalidated cached dependencies.
@snejus snejus requested a review from bal-e September 11, 2024 09:28
@snejus snejus force-pushed the fix-lrclib-lyrics branch 3 times, most recently from dc02c94 to 622ed3c Compare September 11, 2024 11:21
@snejus snejus requested a review from JOJ0 September 11, 2024 11:40
beetsplug/lyrics.py Outdated Show resolved Hide resolved
beetsplug/lyrics.py Outdated Show resolved Hide resolved
beetsplug/lyrics.py Outdated Show resolved Hide resolved
@tranxuanthang
Copy link

Hey, LRCLIB author here 👋. It would be great if you could make the /search API a fallback for the /get API (instead of replacing it entirely) when no result is found. The /get API is more performant on LRCLIB's side, whereas the /search API is much slower.

Also, the /get API has been updated recently, and the album_name parameter is no longer a hard requirement.

If you have any ideas for further improvements to LRCLIB's API, feel free to let me know!

@snejus
Copy link
Member Author

snejus commented Sep 26, 2024

Hey, LRCLIB author here 👋. It would be great if you could make the /search API a fallback for the /get API (instead of replacing it entirely) when no result is found. The /get API is more performant on LRCLIB's side, whereas the /search API is much slower.

Also, the /get API has been updated recently, and the album_name parameter is no longer a hard requirement.

If you have any ideas for further improvements to LRCLIB's API, feel free to let me know!

Hi @tranxuanthang, thanks for popping in! Absolutely, that's no problem at all. This should make most lyrics queries even speedier on our side.

Now that we're on this topic, I think it may be a good idea to also add caching: for example, if we're getting lyrics for two separate files

  1. Artist - Title (Some Remix)
  2. Artist - Title

Ideally we should only ask for Artist - Title lyrics once when Artist - Title (Some Remix) is not found.

@tranxuanthang thanks for a reliable and performant API!

@edgars-supe
Copy link
Contributor

@snejus Will this PR also address the issue that I attempted to fix in #5089? Looks like quite a refactor, so I can't quite tell (I'm not much of a Pythoneer).

@snejus
Copy link
Member Author

snejus commented Dec 7, 2024

Snap, I wish I'd seen your PR earlier!

It does, however I'm not sure when will it get reviewed and merged in, so let's get your PR merged in first!

Also note that the lyrics tests / configuration issue you experienced while working on that PR has been fixed here: #5452

@snejus snejus force-pushed the lyrics-refactor-tests branch from 4970dda to 09863b8 Compare December 7, 2024 16:56
Adjust the base URL to perform a '/search' instead of attempting to
'/get' specific lyrics where we're unlikely to find lyrics for the
specific combination of album, artist, track names and the duration (see
https://lrclib.net/docs).

Since we receive an array of matching lyrics candidates, rank them by
their duration similarity to the item's duration, and whether they
contain synced lyrics.
@snejus snejus force-pushed the fix-lrclib-lyrics branch from 79b2af5 to b879e9c Compare December 7, 2024 16:57
@edgars-supe
Copy link
Contributor

Okay, sounds like a plan to me!

This is probably more suitable for your other PR (#5474), but I'll mention this here. In the comments of my PR, I noted that there's opportunity for a bigger change regarding synced lyrics and fallbacks. If I recall correctly, with how things currently are, it's first come, first serve with regards to choosing lyrics from the enabled backends. In other words, if a plain-only backend fetches lyrics before we try a backend that supports synced lyrics, the plain ones will be used. That's not great if we prefer synced lyrics.

The idea for solving this that I had was:

  • if synced lyrics are preferred, always query synced-supporting backends first;
  • query the backends in order of preference (the order they're defined in sources in the config, but synced ones first if synced: yes);
  • if a synced source returns synced lyrics, use those;
  • if a synced source returns plain lyrics, check other synced sources;
  • if there are no synced lyrics, just use plain ones.

Is anything like this in your plans? Thoughts on it in general? Seeing how you're working a lot on this plugin, I think you'd be well suited to tackle this.

@snejus
Copy link
Member Author

snejus commented Dec 7, 2024

Okay, sounds like a plan to me!

This is probably more suitable for your other PR (#5474), but I'll mention this here. In the comments of my PR, I noted that there's opportunity for a bigger change regarding synced lyrics and fallbacks. If I recall correctly, with how things currently are, it's first come, first serve with regards to choosing lyrics from the enabled backends. In other words, if a plain-only backend fetches lyrics before we try a backend that supports synced lyrics, the plain ones will be used. That's not great if we prefer synced lyrics.

The idea for solving this that I had was:

  • if synced lyrics are preferred, always query synced-supporting backends first;
  • query the backends in order of preference (the order they're defined in sources in the config, but synced ones first if synced: yes);
  • if a synced source returns synced lyrics, use those;
  • if a synced source returns plain lyrics, check other synced sources;
  • if there are no synced lyrics, just use plain ones.

Is anything like this in your plans? Thoughts on it in general? Seeing how you're working a lot on this plugin, I think you'd be well suited to tackle this.

I like the concept of this! Though we need to be realistic here: lrclib is the only provider that supports synced lyrics 😅

Well at least this simplifies engineering a solution, since we never need to jump between the backends.

Now if we zoom into LRCLib, you will find that the results are currently scored this way:

    @cached_property
    def dist(self) -> tuple[bool, float]:
        """Distance/score of the given lyrics item.

        Return a tuple with the following values:
        1. Absolute difference between lyrics and target duration
        2. Boolean telling whether synced lyrics are available.

        Best lyrics match is the one that has the closest duration to
        ``target_duration`` and has synced lyrics available.
        """
        return self.duration_dist, not self.synced

It prioritizes lyrics that have duration closer to the music file. Presence of synced lyrics comes secondary.

Though now that I think of this, I think we may be able to swap these two around:

return not self.synced, self.duration_dist

And thus prioritize results that have synced lyrics available. I guess the danger is what if we receive some wrong synced lyrics where the duration is vastly different?.

Results that have mismatching durations actually get ignored early on as the algorithm is fairly strict about what's allowed: it tolerates ±5% difference:

    DURATION_DIFF_TOLERANCE = 0.05

    @cached_property
    def is_valid(self) -> bool:
        """Return whether the lyrics item is valid.
        Lyrics duration must be within the tolerance defined by
        :attr:`DURATION_DIFF_TOLERANCE`.
        """
        return (
            abs(self.duration - self.target_duration)
            <= self.target_duration * self.DURATION_DIFF_TOLERANCE
        )

This means that when we calculate dist above, the result duration is guaranteed to be
acceptable - so we may be able to get away with prioritising results that have synced lyrics from that point on 😉

@snejus
Copy link
Member Author

snejus commented Dec 8, 2024

From what I've seen so far the change above works very well:

image

I'm also seeing that this fixes some of the incorrect lyrics that the LRCLib /get endpoint returned.

@edgars-supe
Copy link
Contributor

I like the concept of this! Though we need to be realistic here: lrclib is the only provider that supports synced lyrics 😅

So far. 🙃

Regarding the algorithm, I think if you first disqualify matches that are too short/long and then rank by synced-ness, that makes sense. That's all good.

The issue I'm thinking of is more for when there are multiple synced lyric providers. Imagine there's LCRLib and bilRCL, both can provide synced lyrics. As it is now, if I understand it correctly, if LRCLib is first in the sources list and only has plain lyrics for a song, the plugin will apply the plain lyrics, despite the fact that bilRCL might have synced lyrics. What I'm suggesting is to have the approach you have made here for LCRLib and expand it for all backends.

  • If you want synced lyrics, and the first source that supports synced lyrics has them, use them (in all cases - if they match the duration and other parameters).
  • If the first source only has plain lyrics, check the next source that supports lyrics.
  • If no synced source has synced lyrics, use the plain lyrics from the first source.
  • Sources are checked in order of preference as specified by the order of the sources config field.

@snejus
Copy link
Member Author

snejus commented Dec 8, 2024

The issue I'm thinking of is more for when there are multiple synced lyric providers. Imagine there's LCRLib and bilRCL, both can provide synced lyrics. As it is now, if I understand it correctly, if LRCLib is first in the sources list and only has plain lyrics for a song, the plugin will apply the plain lyrics, despite the fact that bilRCL might have synced lyrics.

I see! You're correct, the plugin now picks any kind of lyrics returned by the first source that returns something.

What I'm suggesting is to have the approach you have made here for LCRLib and expand it for all backends.

This makes sense - I am myself after synced lyrics if possible, so if there was another source that provides them, I would indeed try to engineer a way to retrieve them.

  • If you want synced lyrics, and the first source that supports synced lyrics has them, use them (in all cases - if they match the duration and other parameters).
  • If the first source only has plain lyrics, check the next source that supports lyrics.
  • If no synced source has synced lyrics, use the plain lyrics from the first source.
  • Sources are checked in order of preference as specified by the order of the sources config field.

This looks like a good approach. This of course depends on knowing in advance which sources provide synced lyrics, so that we can try them first.

  • If no synced source has synced lyrics, use the plain lyrics from the first source.

I'd say we may even want to have a separate source priority list for non-synced lyrics: for example, I found that I prefer lyrics from LRCLib only if they are synced. Otherwise, I'd rather get them from Genius where they've been reviewed by the community.

Alternatively, if users want synced lyrics, we firstly query backends that provide them (respecting their order in the sources configuration), and then query the rest if synced weren't found. This way, if I have the following configuration:

lyrics:
  sources: [genius, lrclib, tekstowo, bilrcl]
  synced: yes

The plugin tries LRCLib and bilRCL first and returns the first valid synced lyrics, if found. If not, it tries Genius, then (cached) plain lyrics from LRCLib, then Tekstowo, and finally bilRCL.

@edgars-supe
Copy link
Contributor

I think your suggestions make perfect sense! If you were to implement it like that, I wouldn't be mad at all. Though this probably warrants a bigger discussion with more maintainers, I would imagine. At any rate, I'll leave this idea with you, since you're working a lot on this plugin and you're way better suited for the task, if you choose to accept it, than I am.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

lyrics: config in tests not reset before each test beets can't fetch lyrics from lrclib.net
4 participants