Lyrics: Refactor Genius, Google backends, and consolidate common functionality #5474

Open
wants to merge 23 commits into
base: fix-lrclib-lyrics

Commits on Nov 22, 2024

  1. Apply dist_thresh to Genius and Google backends

    This commit introduces a distance threshold mechanism for the Genius and
    Google backends.
    
    - Create a new `SearchBackend` base class with a `check_match` method
      that checks whether a search result matches the requested artist and
      title.
    - Start using the previously undocumented `dist_thresh` configuration
      option and mention it in the docs. It controls the maximum allowable
      distance for matching artist and title names.
    
    These changes aim to improve the accuracy of lyrics matching, especially
    when there are slight variations in artist or title names, see #4791.
    snejus committed Nov 22, 2024
    Commit 70db4ee
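
A minimal sketch of the thresholding idea described in this commit. Only the names `SearchBackend`, `check_match` and `dist_thresh` come from the commit message; the method signature, the `string_distance` helper and the use of `difflib` as the distance measure are assumptions made for illustration, not the plugin's actual code.

```python
from difflib import SequenceMatcher


def string_distance(a: str, b: str) -> float:
    """Return a distance in [0, 1]; 0 means the strings are identical.

    Hypothetical helper: difflib stands in for whatever distance
    function the plugin really uses.
    """
    return 1.0 - SequenceMatcher(None, a.lower(), b.lower()).ratio()


class SearchBackend:
    """Illustrative base class; the real signature may differ."""

    def __init__(self, dist_thresh: float) -> None:
        # Maximum allowable distance for artist and title names.
        self.dist_thresh = dist_thresh

    def check_match(
        self, target_artist: str, target_title: str, artist: str, title: str
    ) -> bool:
        """Accept a result only if both artist and title are close enough."""
        max_dist = max(
            string_distance(target_artist, artist),
            string_distance(target_title, title),
        )
        return max_dist <= self.dist_thresh
```

With a threshold of 0.2, for example, a result titled "Lady Madona" would still be accepted for "Lady Madonna", while an unrelated artist and title would be rejected.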
  2. Commit 5af8f0d
  3. Centralize requests setup with requests.Session

    Improve request performance with `requests.Session`, which uses
    connection pooling for repeated requests to the same host.
    
    Additionally, this centralizes request configuration, making sure that
    we use the same timeout and send the beets user agent for all requests.
    snejus committed Nov 22, 2024
    Commit ad53e8d
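
One way to centralize the setup described in this commit is a small `requests.Session` subclass. The sketch below is an assumption about the general shape, not the code in this PR: the class name `TimeoutSession`, the default timeout value and the way the user agent is passed in are all hypothetical.

```python
import requests


class TimeoutSession(requests.Session):
    """Session that applies a shared timeout and User-Agent to every request."""

    def __init__(self, user_agent: str, timeout: float = 10.0) -> None:
        super().__init__()
        self.timeout = timeout
        # Session-level headers are merged into every outgoing request.
        self.headers.update({"User-Agent": user_agent})

    def request(self, method, url, **kwargs):
        # Use the shared timeout unless the caller passes one explicitly.
        kwargs.setdefault("timeout", self.timeout)
        return super().request(method, url, **kwargs)
```

Because `get`, `post` and the other helpers all go through `request`, every call made on one shared instance reuses the same connection pool, timeout and headers.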
  4. Commit 16eada1
  5. Commit 9a26025
  6. Commit 53b5b19
  7. Do not try to strip cruft from the parsed lyrics text.

    Having removed it, I found that only the Genius lyrics changed: they had
    an extra new line. Thus I defined a function 'collapse_newlines' which
    now gets called for the Genius lyrics.
    snejus committed Nov 22, 2024
    Commit aa3a1c5
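
The commit message names `collapse_newlines` but does not say exactly what it does. A plausible sketch, assuming it squeezes runs of three or more newlines down to a single blank line:

```python
import re


def collapse_newlines(text: str) -> str:
    """Replace runs of three or more newlines with exactly two.

    Assumed behaviour: paragraph breaks (one blank line) are kept,
    anything longer is treated as an extra new line and removed.
    """
    return re.sub(r"\n{3,}", "\n\n", text)
```

Single line breaks and ordinary paragraph breaks pass through unchanged, so the function is safe to apply to lyrics that are already clean.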
  8. Use a single slug implementation

    Tidy up 'Google.is_page_candidate' method and remove 'Google.sluggify'
    method which was a duplicate of 'slug'.
    
    Since 'GeniusFetchTest' only tested whether the artist name is cleaned
    up (the rest of the functionality is patched), remove it and move its
    test cases to the 'test_slug' test.
    snejus committed Nov 22, 2024
    Commit d3aeed2
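
For readers unfamiliar with the term, a slug function turns free text such as an artist name into a URL-friendly token. The sketch below shows the general technique using only the standard library; it is an assumption about the idea, and the plugin's own `slug` may rely on a different transliteration library and different rules.

```python
import re
import unicodedata


def slug(text: str) -> str:
    """Lowercase, strip accents and join alphanumeric runs with hyphens."""
    # NFKD splits accented characters into base letter + combining mark;
    # encoding to ASCII with "ignore" then drops the combining marks.
    ascii_text = (
        unicodedata.normalize("NFKD", text)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Every run of non-alphanumeric characters becomes a single hyphen.
    return re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")
```

Keeping a single implementation like this means that artist-name cleanup can be covered by one parametrized test, which is what moving the 'GeniusFetchTest' cases into 'test_slug' achieves.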
  9. Commit f7df3fb
  10. Commit 1b9aa3b
  11. Commit 4a33cc3
  12. Remove extract_text_between

    snejus committed Nov 22, 2024
    Commit cf2aede
  13. Genius: refactor and simplify

    snejus committed Nov 22, 2024
    Commit 9ce662b
  14. Commit 61e4142
  15. Google: Refactor and improve

    * Type the response data that the Google Custom Search API returns.
    * Exclude some 'letras.mus.br' pages that do not contain lyrics.
    * Exclude results from Musixmatch as we cannot access their pages.
    * Improve parsing of the URL title:
      - Handle long URL titles that get truncated (end with ellipsis) for
        long searches
      - Remove domains starting with 'www'
      - Parse the title AND the artist. Previously this would only parse the
        title, and fetch lyrics even when the artist did not match.
    * Remove now redundant credits cleanup and checks for valid lyrics.
    snejus committed Nov 22, 2024
    Commit 99f3c67
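
The URL title cleanup listed in this commit could look roughly like the sketch below. It only covers two of the bullets (truncated titles ending with an ellipsis and domains starting with 'www'); the function name `clean_url_title`, the regular expressions and the separator characters are assumptions, and splitting the result into artist and title is left out because that depends on each site's title format.

```python
import re


def clean_url_title(url_title: str) -> str:
    """Remove search-result noise from a page title (hypothetical helper)."""
    # Long titles get truncated by the search API and end with an ellipsis.
    title = re.sub(r"\s*(\.\.\.|…)\s*$", "", url_title)
    # Drop domain names such as 'www.example.com' embedded in the title.
    title = re.sub(r"\bwww\.[\w.-]+", "", title)
    # Tidy up whitespace and dangling separators left behind.
    return re.sub(r"\s+", " ", title).strip(" -|")
```

After this cleanup, the remaining text can be compared against both the expected artist and the expected title, so a page is only fetched when both match.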
  16. Create Html class for cleaning up the html text

    Additionally, improve HTML pre-processing:
    
    * Ensure a new line between blocks of lyrics text from letras.mus.br.
    * Parse a missing last block of lyrics text from lacoccinelle.net.
    * Parse a missing last block of lyrics text from paroles.net.
    * Fix encoding issues with AZLyrics by setting response encoding to
      None, allowing `requests` to handle it.
    snejus committed Nov 22, 2024
    Commit 7d35c4c
  17. Commit 2ce3857
  18. Google: make sure we do not return the captcha text

    If we get caught by Cloudflare, it forwards our request somewhere else
    and returns a validation text response. To make sure that this text is
    not mistaken for lyrics, we can disable redirects for the Google
    backend, check the response code and raise if there's a redirect
    attempt. This source will then be skipped and the backend continues with
    the next one.
    snejus committed Nov 22, 2024
    Commit a36eaee
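
The redirect check described in this commit can be sketched with `requests` as follows. The function name `fetch_text`, the `CaptchaError` exception and the timeout value are hypothetical; only the approach (disable redirects, inspect the response, raise on a redirect attempt) is taken from the commit message.

```python
import requests


class CaptchaError(requests.exceptions.HTTPError):
    """Hypothetical error raised when a source tries to redirect us."""


def fetch_text(session: requests.Session, url: str) -> str:
    """Fetch a page, refusing to follow redirects to a validation page."""
    response = session.get(url, allow_redirects=False, timeout=10)
    # is_redirect is true for a redirect status code with a Location header.
    if response.is_redirect:
        raise CaptchaError(f"Redirect attempt for {url}", response=response)
    response.raise_for_status()
    return response.text
```

A caller iterating over search results can catch the exception, skip the offending source and move on to the next candidate URL.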
  19. Remove dependency existence checks

    I think we can make our lives easier by removing these checks, assuming
    that users follow the instructions in the docs.
    snejus committed Nov 22, 2024
    Commit c7d6750
  20. Tidy up handling of backends

    snejus committed Nov 22, 2024
    Commit db84aae
  21. Append source to the lyrics

    snejus committed Nov 22, 2024
    Commit 71895b2
  22. Xfail Songlyrics source

    snejus committed Nov 22, 2024
    Commit bfe4589
  23. Commit ebf136f