Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add musicbrainz id option to importer #1808

Merged
merged 10 commits into from
Jan 22, 2016
Merged

Conversation

diego-plan9
Copy link
Member

Work in progress for solving the recently-revived issue #170, as I have also been bitten by the long time it takes to import extremely large albums and matching against all candidates - hopefully this can be a way of alleviating the amount of time needed in some scenarios.

I'm mainly opening the pull request at this early stage in order to discuss the level where the use of the ID should be handled. Currently it is done inside Task.lookup_candidates(), but it feels a bit out of place to use config['import']['musicbrainz_id'] there, and might convolute a bit readability and isolation. I'm wondering if this would be better moved to other place, such as:

  • the lookup_candidates() pipeline stage?
  • pass it as a parameter to the Task constructor from the ImportTaskFactory?
  • other options?

I have also toyed with the idea of allowing several IDs to be specified (probably by using the -m option several times) in order to have a "batch mode" of sorts, but it might end up causing the opposite effect (ie. performing the matching against all the user-specified ids for every album). However, if we can assume that the user is highly confident that the items are reasonably clean, perhaps we can come up with some "light and quick" way of discarding all but one ID for the current item (for example, simply by counting the files, or using the file names somehow, etc). Might be a long shot, but I'd love to hear some opinions.

Acknowledged TODOs: revise docstrings, revise UI strings and names, add tests, write documentation.

* Add '-m', '--musicbrainzid' option to the import command, allowing the user
to specify a single MusicBrainz ID to be used for the candidate lookup instead
of trying to find suitable candidates.
* Modify lookup_candidates() of ImportTask and SingletonImportTask to use the
musicbrainz id if present.
@sampsyo
Copy link
Member

sampsyo commented Jan 11, 2016

Fantastic. Thanks for getting started on this.

On where the feature should "live": What you've done here is definitely the most straightforward change we can make, which is a good thing. And since the intended effect of the -m option is to globally instruct the importer to use a given MBID, it's not too crazy as a concept, either. So I wouldn't be sad if this was what we did in the end.

The other option that seems interesting to me is a parameter on the Task. This feels like it might afford a bit more flexibility: I don't exactly know what it would look like, but you can imagine designing a feature that would assign a different MBID to each task based on some external guess. Maybe it's worth looking into—we can see how messy it would get.

Multiple MBIDs seem like a natural extension too! It would also apply if, for example, you know you have one of a few different albums but aren't quite sure which.

* Modify the "--musicbrainzid" argument to the importer so multiple IDs can be
specified by the user instead of a single one.
* Revise autotag.match.tag_album and autotag.match.tag_item signature to expect
a list of IDs (search_ids) instead of a single one (search_id), and add logic
for handling and returning multiple matches for those IDs.
* Update calls to those functions in other parts of the code.
* Add tests for the "--musicbrainzid" argument (one/several ids for matching
an album/singleton; direct test on task.lookup_candidates() for
album/singleton).
@diego-plan9
Copy link
Member Author

Thanks for the input, as usual! Slow days for me, which means this pull requests continues to be an unpolished work in progress, but I have made an attempt to add the option for specifying multiple MusicBrainz IDs instead of a single one, plus some basic tests.

I resorted to modifying the signatures of match.tag_album() and match.tag_item() so they expect a list of IDs instead of a single id (search_id -> search_ids), adjusting the internal logic accordingly so it loops through the IDs preserving the matches found. This is a bit more intrusive that I'd like, but the alternatives did not look too appealing to me (keep the signature and add some code for handling single items vs. lists inside the function, or wrap the calls into a light "disambiguation" function of sorts) - I hope it's an acceptable implementation. As a result of modifying the signature, some calls in other places of the code had to be adjusted accordingly - I'll do another round to make sure no unexpected problems occur.

A note as well on the tests: at the moment they query MusicBrainz several times with very similar queries (and one of them is failing on Travis due to the order of the arguments it seems), as at this point they are mainly intended to be an aid during development. I do intend to clean them up before merging so they mock the calls to MusicBrainz using a solution inspired on the AutotagStub or the test_mb mechanisms.

@@ -787,7 +787,7 @@ def choose_item(self, task):
search_id = manual_id(True)
if search_id:
candidates, rec = autotag.tag_item(task.item,
search_id=search_id)
search_ids=[search_id])
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A related side effect (provided the current implementation of multiple IDs stays more or less the same) is that it would take almost no effort (mainly a matter of splitting search_id) to allow the user to specify several IDs during the import when selecting the enter Id prompt choice. Would that be a good idea?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm for it!

Edit: But, I should add, I don't think it's critical—we could come back around to this later, of course.

* Replace the entities used on ImportMusicBrainzIdTest mocking the calls to
musicbrainzngs.get_release_by_id and musicbrainzngs.get_recording_by_id instead
of querying MusicBrainz.
* Other cleanup and docstring fixes.
@diego-plan9
Copy link
Member Author

Fixed the tests so they mock the queries to MusicBrainz, and tidied them up in general. Travis is still failing on a test with the singletons that I can't reproduce on my environment (yet).

* Fix an issue that caused the candidates for a singleton not to be
returned ordered by distance from autotag.match.tag_item(), when
searching multiple MusicBrainz ids (ie. several "--musicbrainzid"
arguments). The candidates are now explicitely reordered before being
returned and before the recommendation is computed.
* Fix test_importer.mocked_get_recording_by_id so that the artist is
nested properly (and as a result, taken into account into the distance
calculations).
@sampsyo
Copy link
Member

sampsyo commented Jan 20, 2016

I resorted to modifying the signatures of match.tag_album() and match.tag_item() so they expect a list of IDs instead of a single id (search_id -> search_ids),

This sounds perfect to me. It's a generalization of the old functionality to match the new feature. Wonderful!

@sampsyo
Copy link
Member

sampsyo commented Jan 20, 2016

Also:

Travis is still failing on a test with the singletons that I can't reproduce on my environment (yet).

That is certainly weird—I also note that this seems to be failing in the py27 environment but not the pypy one. That certainly shouldn't happen…

I also tried running the tests on your branch locally and couldn't trigger the failure. Seriously weird—any chance this is some crazy, random fluke?

@@ -370,7 +370,7 @@ def _add_candidate(items, results, info):


def tag_album(items, search_artist=None, search_album=None,
search_id=None):
search_ids=[]):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably describe search_ids in the docstring. (Although I note we didn't document search_id before—bad @sampsyo!)

@diego-plan9
Copy link
Member Author

This sounds perfect to me. It's a generalization of the old functionality to match the new feature. Wonderful!

Great! I tend to be a bit hesitant when modifying function signatures and the likes, but I'm glad it is looking good.

That is certainly weird—I also note that this seems to be failing in the py27 environment but not the pypy one. That certainly shouldn't happen…

I also tried running the tests on your branch locally and couldn't trigger the failure. Seriously weird—any chance this is some crazy, random fluke?

I have just pushed a commit after further investigating the failing test: it seems that I introduced a bug that caused the candidates for a singleton not to be guaranteed to be returned ordered by their distance, and would probably have gone unnoticed if it were not for the weird fail - my apologies. I'm guessing the lucky part was that the candidates dict "order" got reversed on that particular environment, or something similar?

@sampsyo
Copy link
Member

sampsyo commented Jan 20, 2016

Aha! That actually makes sense. I think it's due to Python's newish random hash function, which causes dictionaries to iterate in different orders on every run. http://stackoverflow.com/questions/27522626/hash-function-in-python-3-3-returns-different-results-between-sessions

* Style cleanup and fixes for the "--musicbrainzid" import argument.
* Allow the input of several IDs (separated by spaces) on the "enter Id"
importer prompt.
* Add basic documentation.
@diego-plan9
Copy link
Member Author

I'm glad the mystery has been sorted out, I was rather puzzled myself for a while!

Thanks as well for the review and the notes, I have hopefully addressed the issues on the latest commit and also added some basic documentation. As for the ability of specifying multiple IDs when using the interactive import enter Id prompt discussed on #1808 (comment), I have opted for including it along with a mention on tagger.rst (that was missing a comment on the enter Id choice altogether).

Next step will be an attempt to make mb_ids a parameter of the Task, as mentioned on the initial comments - if it ends up being too messy, we can safely discard it.

A question I came across while revising documentation and other issues: I have personally never used the discogs backend and unfortunately could not locate any discogs-specific unit tests, so could you let me know if you envisage any conflicts when using discogs instead of musicbrainz, or if there are some manual tests that need to be performed? There seems to be some potential interference between these changes and the features provided by the discogs plugin.

* Store the user-supplied MusicBrainz IDs (via the "--musicbrainzid"
importer argument) on ImporTask.task.musicbrainz_ids during the
lookup_candidates() pipeline stage.
* Update test cases to reflect the changes.
@diego-plan9
Copy link
Member Author

4eedd2b contains the aforementioned attempt to make the feature a parameter of the Task. In an effort for keeping the changes to a minimal, I took some shortcuts: in particular, sacrificed a bit of explicitness with the actual Task.musicbrainz_ids attribute (no "setter"-like method, or constructor parameter, etc), and also placed the initialization of the attribute directly on the lookup_candidates pipeline stage.

A more correct way of making these changes useful to the eventual "MusicBrainzID <-> task" mapper feature/mechanism would probably involve creating another pipeline stage, or similar changes to the Factory or Session. However, as it is not really clear to me yet how the feature would work (nor if it would be a really useful addition at this point), I feel these changes would end up being placeholders and candidates for being replaced entirely once the feature is discussed and implemented.

As usual, I'd love to hear your thoughts, and make any further changes as desired!

@sampsyo
Copy link
Member

sampsyo commented Jan 21, 2016

I have personally never used the discogs backend and unfortunately could not locate any discogs-specific unit tests, so could you let me know if you envisage any conflicts when using discogs instead of musicbrainz

This should be OK. Since the ID lookups are hidden behind the *_for_id function in beets.autotag.hooks, the Discogs backend shouldn't be affected. (Further evidence: this change didn't require any updates in beets.autotag.mb.)


# Restrict the initial lookup to IDs specified by the user via the -m
# option. Currently all the IDs are passed onto the tasks directly.
task.musicbrainz_ids = session.config['musicbrainz_ids'].as_str_seq()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great for now. As you note in your other comment, we can move this around later if we want a more sophisticated way of populating this.

* Rename the importer argument and related variables to make it more
generic, as the feature should be independent of the backend used and
not restricted to MusicBrainz.
* Update documentation and docstrings accordingly.
* Add changelog entry.
@diego-plan9
Copy link
Member Author

This is a tiny nitpick, but these technically don't need to be MBIDs. Other backends can also interpret these IDs (for now, just the Discogs plugin).

Alas, I don't have a great idea for another option name since -i and -I are both taken. Perhaps -f for --force-id? Or maybe -d as a second-choice contraction of --id? Or -S for --search-id?

Sounds like a more than fair nitpick, and indeed derived from my initial "MusicBrainz-centric" approach. I went with the -S option in order to try to adhere the "first letter of the long option" policy to the extent possible - -f sounded tempting as well, but I preferred to reserve the letter for eventual arguments that might do other, more intrusive "forcing"s.

I also took the chance to modify the documentation and docstrings, as well as rename the variables used, in order to make it clearer that the feature should be backend-agnostic. For the docs, it basically ended up being a s/MusicBrainz/metadata backend/ modification: perhaps there is a better term?

I have also added an entry to the changelog, in the hopes of having the pull request ready for the final review!

@@ -132,6 +132,11 @@ Optional command flags:
option. If set, beets will just print a list of files that it would
otherwise import.

* If you already have a metadata backend ID that matches the items to be
imported, you can instruct beets to restrict the search to that ID instead of
searching for other candidates by using the ``--search_id SEARCH_ID`` option.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Itsy-bitsy typo: --search-id, not --search_id.

@sampsyo
Copy link
Member

sampsyo commented Jan 22, 2016

Fantastic! This all looks perfect. I found one incredibly tiny typo, but this is otherwise 100% ready. Feel free to hit the big green button. 🎉

This is a surprisingly frequent request, so I'm thrilled to have this implemented!

@diego-plan9
Copy link
Member Author

Thanks for double-checking, as usual, and my apologies for the careless typo! I'm also glad it looks like the feature will be put to good use 👍

I was planning on merging when the travis build was completed, but seems like it failed due some network problems / truncated download of sorts - I'm not sure if there is a way to relaunch the job and get rid of the little red "x" next to the commit. Strange luck with the builds on this pull requests, it seems!

@sampsyo
Copy link
Member

sampsyo commented Jan 22, 2016

Bad luck indeed. I have the power to restart a build -- I'll see if I can extend that to GitHub collaborators too.

@sampsyo
Copy link
Member

sampsyo commented Jan 22, 2016

I didn't figure that out, but rerunning the build did work! ✨ Merging now.

sampsyo added a commit that referenced this pull request Jan 22, 2016
Add musicbrainz id option to importer
@sampsyo sampsyo merged commit cb447f7 into beetbox:master Jan 22, 2016
@diego-plan9 diego-plan9 deleted the mbid branch October 17, 2016 15:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants