Lyrics: Refactor Genius, Google backends, and consolidate common functionality #5474
base: fix-lrclib-lyrics
Commits on Nov 22, 2024
Apply dist_thresh to Genius and Google backends
This commit introduces a distance threshold mechanism for the Genius and Google backends.
- Create a new `SearchBackend` base class with a `check_match` method that performs the check.
- Start using the previously undocumented `dist_thresh` configuration option for good, and mention it in the docs. It controls the maximum allowable distance when matching artist and title names.
These changes aim to improve the accuracy of lyrics matching, especially when there are slight variations in artist or title names; see #4791.
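A minimal sketch of how such a threshold check could work. It is not the code from this commit: the `string_distance` helper, the `difflib`-based metric, the `check_match` signature and the default of `0.1` are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def string_distance(a: str, b: str) -> float:
    """Return a distance in [0, 1]; 0 means identical, ignoring case.

    Hypothetical stand-in metric, not the one beets uses.
    """
    return 1 - SequenceMatcher(None, a.lower(), b.lower()).ratio()


class SearchBackend:
    """Base class for backends that must validate search results."""

    def __init__(self, dist_thresh: float = 0.1):
        # Illustrative default: maximum allowed distance for a match.
        self.dist_thresh = dist_thresh

    def check_match(
        self, target_artist: str, target_title: str, artist: str, title: str
    ) -> bool:
        """Accept a result only if both artist and title are close enough."""
        worst = max(
            string_distance(target_artist, artist),
            string_distance(target_title, title),
        )
        return worst <= self.dist_thresh


backend = SearchBackend(dist_thresh=0.1)
same = backend.check_match("The Beatles", "Hey Jude", "the beatles", "Hey Jude")
other = backend.check_match("The Beatles", "Hey Jude", "Queen", "Bohemian Rhapsody")
```

Taking the larger of the two distances means that a result with the right title but the wrong artist is rejected as well.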
Centralize requests setup with requests.Session
Improve request performance with `requests.Session`, which uses connection pooling for repeated requests to the same host. This also centralizes request configuration, so that every request uses the same timeout and sends the beets user agent.
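One way to centralize the timeout and user agent is to subclass `requests.Session`. The sketch below assumes the third-party `requests` package is installed; the class name `TimeoutSession`, the 10-second timeout and the user agent string are placeholders, not values taken from this PR.

```python
import requests

# Placeholder values for illustration only.
USER_AGENT = "beets/0.0.0 +https://beets.io/"
DEFAULT_TIMEOUT = 10


class TimeoutSession(requests.Session):
    """Session that applies a default timeout and user agent to every request."""

    def __init__(self, timeout: float = DEFAULT_TIMEOUT):
        super().__init__()
        self.timeout = timeout
        self.headers["User-Agent"] = USER_AGENT

    def request(self, method, url, **kwargs):
        # Callers may still pass an explicit timeout to override the default.
        kwargs.setdefault("timeout", self.timeout)
        return super().request(method, url, **kwargs)


session = TimeoutSession()
# session.get("https://example.com") would now reuse pooled connections,
# time out after 10 seconds and send the user agent above.
```

Because `get`, `post` and the other helpers all go through `Session.request`, overriding that one method is enough to cover every call made with the session.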
Do not try to strip cruft from the parsed lyrics text.
Having removed it, I found that only the Genius lyrics changed: they had an extra new line. Thus I defined a function 'collapse_newlines', which now gets called for the Genius lyrics.
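The commit message does not show the body of 'collapse_newlines'. A plausible sketch, assuming its job is to reduce runs of three or more line breaks to a single blank line:

```python
import re


def collapse_newlines(text: str) -> str:
    """Replace runs of 3+ newlines with exactly two (one blank line).

    Assumed behaviour; the real function may differ.
    """
    return re.sub(r"\n{3,}", "\n\n", text)


cleaned = collapse_newlines("Verse one\n\n\nVerse two\nLine two")
```

Single line breaks and existing blank lines between verses are left untouched.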
Use a single slug implementation
Tidy up 'Google.is_page_candidate' method and remove 'Google.sluggify' method which was a duplicate of 'slug'. Since 'GeniusFetchTest' only tested whether the artist name is cleaned up (the rest of the functionality is patched), remove it and move its test cases to the 'test_slug' test.
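For readers unfamiliar with the idea, a slug function typically lowercases the text, strips accents and replaces everything else with hyphens. The standard-library sketch below illustrates that idea only; it is an assumption and not the 'slug' implementation used in the plugin.

```python
import re
import unicodedata


def slug(text: str) -> str:
    """Turn text into a lowercase, ASCII-only, hyphen-separated string."""
    # Decompose accented characters, then drop the non-ASCII combining marks.
    ascii_text = (
        unicodedata.normalize("NFKD", text)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Collapse every run of non-alphanumeric characters into one hyphen.
    return re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")


artist_slug = slug("Beyoncé")
title_slug = slug("AC/DC  Back in Black!")
```

With a single implementation, comparing a slugged page title against a slugged song title always uses the same normalization rules.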
* Type the response data that the Google Custom Search API returns.
* Exclude some 'letras.mus.br' pages that do not contain lyrics.
* Exclude results from Musixmatch as we cannot access their pages.
* Improve parsing of the URL title:
  - Handle long URL titles that get truncated (end with ellipsis) for long searches
  - Remove domains starting with 'www'
  - Parse the title AND the artist. Previously this would only parse the title, and fetch lyrics even when the artist did not match.
* Remove now redundant credits cleanup and checks for valid lyrics.
Create Html class for cleaning up the html text
Additionally, improve HTML pre-processing:
* Ensure a new line between blocks of lyrics text from letras.mus.br.
* Parse a missing last block of lyrics text from lacoccinelle.net.
* Parse a missing last block of lyrics text from paroles.net.
* Fix encoding issues with AZLyrics by setting response encoding to None, allowing `requests` to handle it.
Google: make sure we do not return the captcha text
If we get caught by Cloudflare, it forwards our request somewhere else and returns a validation text response. To make sure that this text is not mistaken for lyrics, we can disable redirects for the Google backend, check the response code and raise if there's a redirect attempt. This source will then be skipped and the backend continues with the next one.
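The check described above could look roughly like the sketch below. The names `RedirectError` and `raise_on_redirect` are hypothetical, and a stub object stands in for a real response so that the example runs without network access. With `requests`, redirects are disabled by passing `allow_redirects=False`.

```python
from types import SimpleNamespace

# HTTP status codes that signal a redirect.
REDIRECT_CODES = {301, 302, 303, 307, 308}


class RedirectError(Exception):
    """Raised when a lyrics page tries to redirect us elsewhere."""


def raise_on_redirect(response):
    """Return the response unchanged, or raise if it is a redirect.

    Works with any object exposing `status_code` and `headers`, e.g. the
    result of requests.get(url, allow_redirects=False).
    """
    if response.status_code in REDIRECT_CODES:
        location = response.headers.get("Location", "<unknown>")
        raise RedirectError(f"Refusing redirect to {location}")
    return response


# Stub responses standing in for real HTTP responses.
ok = SimpleNamespace(status_code=200, headers={})
captcha = SimpleNamespace(
    status_code=302, headers={"Location": "https://example.com/captcha"}
)

sources_skipped = []
for source in (ok, captcha):
    try:
        raise_on_redirect(source)
    except RedirectError:
        # Skip this source and carry on with the next one.
        sources_skipped.append(source)
```

Raising a dedicated exception lets the caller tell a blocked source apart from other failures and move on to the next candidate.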
Remove dependency existence checks
I think we can make our lives easier by removing these checks, assuming that users follow the instructions in the docs.