-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add musicbrainz id option to importer #1808
Conversation
* Add '-m', '--musicbrainzid' option to the import command, allowing the user to specify a single MusicBrainz ID to be used for the candidate lookup instead of trying to find suitable candidates. * Modify lookup_candidates() of ImportTask and SingletonImportTask to use the musicbrainz id if present.
Fantastic. Thanks for getting started on this. On where the feature should "live": What you've done here is definitely the most straightforward change we can make, which is a good thing. And since the intended effect of the The other option that seems interesting to me is a parameter on the Multiple MBIDs seem like a natural extension too! It would also apply if, for example, you know you have one of a few different albums but aren't quite sure which. |
* Modify the "--musicbrainzid" argument to the importer so multiple IDs can be specified by the user instead of a single one. * Revise autotag.match.tag_album and autotag.match.tag_item signature to expect a list of IDs (search_ids) instead of a single one (search_id), and add logic for handling and returning multiple matches for those IDs. * Update calls to those functions in other parts of the code.
* Add tests for the "--musicbrainzid" argument (one/several ids for matching an album/singleton; direct test on task.lookup_candidates() for album/singleton).
Thanks for the input, as usual! Slow days for me, which means this pull requests continues to be an unpolished work in progress, but I have made an attempt to add the option for specifying multiple MusicBrainz IDs instead of a single one, plus some basic tests. I resorted to modifying the signatures of A note as well on the tests: at the moment they query MusicBrainz several times with very similar queries (and one of them is failing on Travis due to the order of the arguments it seems), as at this point they are mainly intended to be an aid during development. I do intend to clean them up before merging so they mock the calls to MusicBrainz using a solution inspired on the |
@@ -787,7 +787,7 @@ def choose_item(self, task): | |||
search_id = manual_id(True) | |||
if search_id: | |||
candidates, rec = autotag.tag_item(task.item, | |||
search_id=search_id) | |||
search_ids=[search_id]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A related side effect (provided the current implementation of multiple IDs stays more or less the same) is that it would take almost no effort (mainly a matter of splitting search_id
) to allow the user to specify several IDs during the import when selecting the enter Id
prompt choice. Would that be a good idea?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm for it!
Edit: But, I should add, I don't think it's critical—we could come back around to this later, of course.
* Replace the entities used on ImportMusicBrainzIdTest mocking the calls to musicbrainzngs.get_release_by_id and musicbrainzngs.get_recording_by_id instead of querying MusicBrainz. * Other cleanup and docstring fixes.
Fixed the tests so they mock the queries to MusicBrainz, and tidied them up in general. Travis is still failing on a test with the singletons that I can't reproduce on my environment (yet). |
* Fix an issue that caused the candidates for a singleton not to be returned ordered by distance from autotag.match.tag_item(), when searching multiple MusicBrainz ids (ie. several "--musicbrainzid" arguments). The candidates are now explicitely reordered before being returned and before the recommendation is computed. * Fix test_importer.mocked_get_recording_by_id so that the artist is nested properly (and as a result, taken into account into the distance calculations).
This sounds perfect to me. It's a generalization of the old functionality to match the new feature. Wonderful! |
Also:
That is certainly weird—I also note that this seems to be failing in the py27 environment but not the pypy one. That certainly shouldn't happen… I also tried running the tests on your branch locally and couldn't trigger the failure. Seriously weird—any chance this is some crazy, random fluke? |
@@ -370,7 +370,7 @@ def _add_candidate(items, results, info): | |||
|
|||
|
|||
def tag_album(items, search_artist=None, search_album=None, | |||
search_id=None): | |||
search_ids=[]): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should probably describe search_ids
in the docstring. (Although I note we didn't document search_id
before—bad @sampsyo!)
Great! I tend to be a bit hesitant when modifying function signatures and the likes, but I'm glad it is looking good.
I have just pushed a commit after further investigating the failing test: it seems that I introduced a bug that caused the candidates for a singleton not to be guaranteed to be returned ordered by their distance, and would probably have gone unnoticed if it were not for the weird fail - my apologies. I'm guessing the lucky part was that the candidates dict "order" got reversed on that particular environment, or something similar? |
Aha! That actually makes sense. I think it's due to Python's newish random hash function, which causes dictionaries to iterate in different orders on every run. http://stackoverflow.com/questions/27522626/hash-function-in-python-3-3-returns-different-results-between-sessions |
* Style cleanup and fixes for the "--musicbrainzid" import argument. * Allow the input of several IDs (separated by spaces) on the "enter Id" importer prompt. * Add basic documentation.
I'm glad the mystery has been sorted out, I was rather puzzled myself for a while! Thanks as well for the review and the notes, I have hopefully addressed the issues on the latest commit and also added some basic documentation. As for the ability of specifying multiple IDs when using the interactive import Next step will be an attempt to make A question I came across while revising documentation and other issues: I have personally never used the discogs backend and unfortunately could not locate any discogs-specific unit tests, so could you let me know if you envisage any conflicts when using discogs instead of musicbrainz, or if there are some manual tests that need to be performed? There seems to be some potential interference between these changes and the features provided by the |
* Store the user-supplied MusicBrainz IDs (via the "--musicbrainzid" importer argument) on ImporTask.task.musicbrainz_ids during the lookup_candidates() pipeline stage. * Update test cases to reflect the changes.
4eedd2b contains the aforementioned attempt to make the feature a parameter of the A more correct way of making these changes useful to the eventual "MusicBrainzID <-> task" mapper feature/mechanism would probably involve creating another pipeline stage, or similar changes to the As usual, I'd love to hear your thoughts, and make any further changes as desired! |
This should be OK. Since the ID lookups are hidden behind the |
|
||
# Restrict the initial lookup to IDs specified by the user via the -m | ||
# option. Currently all the IDs are passed onto the tasks directly. | ||
task.musicbrainz_ids = session.config['musicbrainz_ids'].as_str_seq() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great for now. As you note in your other comment, we can move this around later if we want a more sophisticated way of populating this.
* Rename the importer argument and related variables to make it more generic, as the feature should be independent of the backend used and not restricted to MusicBrainz. * Update documentation and docstrings accordingly. * Add changelog entry.
Sounds like a more than fair nitpick, and indeed derived from my initial "MusicBrainz-centric" approach. I went with the I also took the chance to modify the documentation and docstrings, as well as rename the variables used, in order to make it clearer that the feature should be backend-agnostic. For the docs, it basically ended up being a I have also added an entry to the changelog, in the hopes of having the pull request ready for the final review! |
@@ -132,6 +132,11 @@ Optional command flags: | |||
option. If set, beets will just print a list of files that it would | |||
otherwise import. | |||
|
|||
* If you already have a metadata backend ID that matches the items to be | |||
imported, you can instruct beets to restrict the search to that ID instead of | |||
searching for other candidates by using the ``--search_id SEARCH_ID`` option. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Itsy-bitsy typo: --search-id
, not --search_id
.
Fantastic! This all looks perfect. I found one incredibly tiny typo, but this is otherwise 100% ready. Feel free to hit the big green button. 🎉 This is a surprisingly frequent request, so I'm thrilled to have this implemented! |
Thanks for double-checking, as usual, and my apologies for the careless typo! I'm also glad it looks like the feature will be put to good use 👍 I was planning on merging when the travis build was completed, but seems like it failed due some network problems / truncated download of sorts - I'm not sure if there is a way to relaunch the job and get rid of the little red "x" next to the commit. Strange luck with the builds on this pull requests, it seems! |
Bad luck indeed. I have the power to restart a build -- I'll see if I can extend that to GitHub collaborators too. |
I didn't figure that out, but rerunning the build did work! ✨ Merging now. |
Add musicbrainz id option to importer
Work in progress for solving the recently-revived issue #170, as I have also been bitten by the long time it takes to import extremely large albums and matching against all candidates - hopefully this can be a way of alleviating the amount of time needed in some scenarios.
I'm mainly opening the pull request at this early stage in order to discuss the level where the use of the ID should be handled. Currently it is done inside
Task.lookup_candidates()
, but it feels a bit out of place to useconfig['import']['musicbrainz_id']
there, and might convolute a bit readability and isolation. I'm wondering if this would be better moved to other place, such as:lookup_candidates()
pipeline stage?Task
constructor from theImportTaskFactory
?I have also toyed with the idea of allowing several IDs to be specified (probably by using the
-m
option several times) in order to have a "batch mode" of sorts, but it might end up causing the opposite effect (ie. performing the matching against all the user-specified ids for every album). However, if we can assume that the user is highly confident that the items are reasonably clean, perhaps we can come up with some "light and quick" way of discarding all but one ID for the current item (for example, simply by counting the files, or using the file names somehow, etc). Might be a long shot, but I'd love to hear some opinions.Acknowledged TODOs: revise docstrings, revise UI strings and names, add tests, write documentation.