Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extra fields from MusicBrainz (missing fields) #1547

Open
bearcatsandor opened this issue Jul 22, 2015 · 33 comments
Open

Extra fields from MusicBrainz (missing fields) #1547

bearcatsandor opened this issue Jul 22, 2015 · 33 comments
Labels
feature features we would like to implement

Comments

@bearcatsandor
Copy link

Yanno, for all that beets prides itself in being the tagger for OCD music lovers, i'm a little surprised at the info it doesn't pull from musicbrainz.

It doesn't pull performer, composer, lyricist, conductor, engineer or producer tags. In short, why not just pull ALL the tags that it can from musicbrainz (see here https://picard.musicbrainz.org/docs/mappings/ ) and apply it to the file and database for searching. Why wouldn't i want to find all the various tracks that Jon Coltrane plays on even when the album title or group doesn't bare his name?

I'm just learning Python now, so should i take a shot at this? Should it be a plugin, or should i not bother because it's on the roadmap?

@sampsyo
Copy link
Member

sampsyo commented Jul 22, 2015

I'm for it! Specifically, this won't be too onerous if we take this tactic:

  • Refactor the AlbumInfo and TrackInfo classes to just be "bags of tags" rather than having specific fields that need to be filled (and explicitly added to the code every time we think of a new field).
  • Refactor the autotag machinery for applying metadata to just copy data from the Info object to the database object blindly (again to avoid needing to add per-field plumbing). This would be agnostic to whether the field is fixed or flexible.
  • Now the mb.py backend is free to iterate as much as possible and add all the metadata it likes without disturbing the rest of the codebase.

See also #506.

@sampsyo sampsyo added the feature features we would like to implement label Jul 22, 2015
@Schweinepriester

This comment has been minimized.

1 similar comment
@ekjaker

This comment has been minimized.

@anshuman73
Copy link
Contributor

@sampsyo,
How exactly would you like me to proceed ?
I believe reading in the task description that we need to change the entire way of storing the metadata received from MusicBrainz.

@sampsyo
Copy link
Member

sampsyo commented Dec 1, 2016

Hey again, @anshuman73! You're right; this one is a bit more involved than last time.

Let's start this way. Try prototyping replacements for the AlbumInfo and TrackInfo classes in beets/autotag/hooks.py. At first, these classes should work the same way as before, but they should get rid of the prescribed list of fields. In fact, it should work to set any field on it, much like a Python dictionary. This should work:

>>> info = TrackInfo(some_field=12)
>>> info.some_other_field = 50

Then, once that's working, we'll want to revisit the apply_metadata function in autotag/__init__.py to take advantage of this new structure. We should be able to simplify it down to just copy over all the fields from AlbumInfo to Album, and the same for TrackInfo to Item, with no complicated logic.

Sound good? Let me know if you have any questions.

@anshuman73
Copy link
Contributor

@sampsyo
So we want the classes to have all the class variables they already have and be able to add more class variable dynamically as and when required ?
Also I should remove the compulsory fields ( for eg, album, album_id, artist, etc) and let the an instance of the Class be declared with any 1 of the fields, right ? (sorry if I'm repeating what you said, just wanted to confirm I've understood it right)
and then move onto autotag/__init__.py and change the structure

@sampsyo
Copy link
Member

sampsyo commented Dec 3, 2016

Yep, you totally got it!

As you said, the first step is to get rid of all the built-in fields and just let any field work on those classes. The easiest way might be to make those classes inherit from dict and then define the magic methods __getattr__ and __setattr__, rather like this SO answer: http://stackoverflow.com/a/32107024

@anshuman73
Copy link
Contributor

Ahh this SO link is truly magic !
you want the classes to be able to add new fields in the typical dictionary way too ? For Eg,
(Trackinfo_instance['new _field'] = 'a')?

@sampsyo
Copy link
Member

sampsyo commented Dec 4, 2016

Yes, exactly. And for backwards compatibility, info.field should work the same as info['field'].

@dosoe
Copy link
Contributor

dosoe commented Aug 2, 2017

Is anyone willing to do this? Else I can give it a try

@dosoe
Copy link
Contributor

dosoe commented Aug 2, 2017

Ok first question: I have a fork of beets but it already has a pending pull request, so if I make a new branch it will have the commits of this pull request (#2563). How can I get a new fork or a clean branch that only contains the 'real' beets?

@dosoe
Copy link
Contributor

dosoe commented Aug 2, 2017

If I just take my fork and make a new branch, the branch will start as a copy of my master branch, which is not exactly beets (20 more commits).

@fortes
Copy link
Member

fortes commented Aug 2, 2017

@dosoe If I'm understanding correctly, you want to update your fork to have the latest from the master repository. This document should help: https://help.github.com/articles/syncing-a-fork/

You should probably create a second branch in order to do this work, so once you have the upstream remote configured, you could probably do something like:

git checkout upstream/master
git checkout -b mb-extra-fields

Then you can push that mb-extra-fields branch up to origin and create a separate pull request w/ that.

@dosoe
Copy link
Contributor

dosoe commented Aug 2, 2017

Ok I finally got it going, thanks.

@dosoe
Copy link
Contributor

dosoe commented Aug 2, 2017

Now how is this supposed to work? From the upper description I get that we want to start it by changing the __init__ from the TrackInfo and AlbumInfo classes of beets/autotag/hooks.py:
Right now it is

def __init__(self, title, (many tags)):
    self.title = title
    self.track_id = track_id
    .
    .
    .

What do we want to do? Do we keep it like this and add a add_tag function on each class?
Do we transform it into something like:

def add_tag(self, tag_name)
    (ads an empty tag)
def __init__(self, st=some_tag):
    self.add_tag(self, st)
    self.st = some_tag
    .
    .
    .

?
I am not a programmer, so expect to get stupid questions...

@dosoe
Copy link
Contributor

dosoe commented Aug 2, 2017

What holds us back from just making a copy-pasting this code: http://stackoverflow.com/a/32107024 to replace TrackInfo and AlbumInfo? There is this decode function (that ensures everything is utf-8) to rewrite but else it does what we want it to do, right? Then we have to play around with apply_item_metadata in beets/autotag/__init__.py to just loop through all the keys of a TrackInfo object and same for apply_metadata with AlbumInfo? And in beet/autotag/mb.py the only thing to change is when we call beets.autotag.hooks.TrackInfo at the start of track_info and the corresponding place for album_info. Did I get it right or am I making it more confusing?

@dosoe
Copy link
Contributor

dosoe commented Aug 2, 2017

and in beet/autotag/__init__.py we have to replace all the if track_info.medium_index is not None: by if track_info.get(medium_index): .
Is there anything else?

@dosoe
Copy link
Contributor

dosoe commented Aug 2, 2017

Here is a first try (that doesn't work) as a work base: #2650

@dosoe
Copy link
Contributor

dosoe commented Aug 2, 2017

I would suggest to continue the discussion there.

@dosoe
Copy link
Contributor

dosoe commented Aug 4, 2017

Hi! On #2650 I started working on it, but there is a recurrent problem: I am trying to transform the Object inherited class TrackInfo in a dict related class. However, the problem there is that dict objects are not hashable, which makes it impossible to make a deepcopy of a list of dict and to make somthing like list(set(tracks) - set(mapping.values())) were tracks is a list of TrackInfo objects. Would anyone have an idea how to get around these problems?

@sampsyo
Copy link
Member

sampsyo commented Aug 4, 2017

Good question. I think the right solution is probably to make the TrackInfo objects hashable to restore the old functionality. The easiest way to do that is probably just to define a __hash__ function that uses the default implementation:

def __hash__(self):
    return object.__hash__(self)

That would preserve identity-based hashing instead of contents-based hashing as described in this relevant-seeming SO answer.

Perhaps the cleaner thing to do, however, would be to avoid making TrackInfo inherit directly from dict. Instead, just keep a dict of values as a field on the TrackInfo object. That way, we'd sidestep the issue and avoid any unpleasantness of sharing the implementation from dict.

@dosoe
Copy link
Contributor

dosoe commented Aug 4, 2017

I did it because you looked for something like https://stackoverflow.com/questions/2352181/how-to-use-a-dot-to-access-members-of-dictionary/32107024#32107024 , so I just copy-pasted it. That's why we end up with a dict inheritance.

@dosoe
Copy link
Contributor

dosoe commented Aug 7, 2017

I tried to make TrackInfo inherit from object and failed miserably. Trying the first solution you propose however leads to problems with deepcopy

@gabrielhdt
Copy link

Hello,
I implemented what you suggested above in #2654, TrackInfo and AlbumInfo inherit from ItemInfo (a new class) which inherits from dict. The hash function uses the hash of the track id from musicbrainz, since it must be unique, it seems to be a good candidate.

I still have to make apply_metadata generic.

gabrielhdt pushed a commit to gabrielhdt/beets that referenced this issue Aug 13, 2017
Updated calls to TrackInfo and AlbmInfo using keywords only arguments,
implemented reduce methods for TrackInfo and AlbumInfo for
copy.deepcopy (use also track_id and album_id) beetbox#1547
@dosoe
Copy link
Contributor

dosoe commented Aug 14, 2017

Is the mb_trackid a good choice? Does every track have one or do you add tracks by hand or take metadata from let's say discogs without putting a mb_trackid on it?

@dosoe
Copy link
Contributor

dosoe commented Aug 14, 2017

I have some tracks I can't identify with MB so I'm thinking about setting the metadata by hand, but these tracks won't have a mb_trackid.

@gabrielhdt
Copy link

Good point. I assumed every track is identified with MB. Shouldn't you add the release on MusicBrainz, and then identify from MB, in that case? I think the software should entrust everything regarding data to MB, and just dump data from it, with as less reprocessing as possible. Else, a fallback id can be used, depending on the source.

@dosoe
Copy link
Contributor

dosoe commented Aug 16, 2017

It's what I'm doing, but for some of my music I can't trace back which release it is, and I don't expect everyone to do it. I think unsing mb_id with a fallback is a good idea.

@dosoe
Copy link
Contributor

dosoe commented May 10, 2020

  • Refactor the AlbumInfo and TrackInfo classes to just be "bags of tags" rather than having specific fields that need to be filled (and explicitly added to the code every time we think of a new field).

  • Refactor the autotag machinery for applying metadata to just copy data from the Info object to the database object blindly (again to avoid needing to add per-field plumbing). This would be agnostic to whether the field is fixed or flexible.

The first part has been dealt with, do you have an idea on how you want the next point to be like? I'm afraid it's not gonna be straightforward, we will need to add plumbing (for example the 'recording engineer' is called 'recording' in the MB API), but at least for instruments and engineers it should be rather straightforward to implement.

@sampsyo
Copy link
Member

sampsyo commented May 11, 2020

That second bullet, fortunately, is just about refactoring the apply_metadata and apply_item_metadata functions, which you have already gotten started in #3587. Thanks!!

@dosoe
Copy link
Contributor

dosoe commented May 12, 2020

I'm trying right now to implement automatic fetching of all artists associated with a recording, but I'm running into a problem. Let's say we retrieve all artists and create appropriate tags (like Soprano vocals) flexibly for each recording. What happens if there is a change? Let's say it turns out the soprano singer was in fact a mezzo-soprano singer and the corresponding change is made on MusicBrainz (like for example this edit: https://musicbrainz.org/edit/69614459 ). We would then end up with two tags, Soprano vocals and Mezzo-soprano vocals with both the same content. How do we remove the Soprano vocals tag now that it's not needed anymore? Else, over time, tracks would become more and more cluttered with obsolete tags. How do we arrange that 'obsolete' tags disappear? Before flexible tags, it was easy, as each tag was modified to None if beets didn't get the corresponding data (even if sometimes there is clutter when album-related tags stay on a track that is reimported as a singleton, but that's more a problem of the file than a problem of the library).
So my question is: how do we clean up obsolete tags, now that they are flexible?
Could there be an option when storing the info in the library to give a list of tags that are up-to-date so that when storing all other tags are removed? I'm afraid that would be complicated to implement, especially as it would need to keep track of the multiple plugins that add additional tags (discogs, keyfinder, parentwork to name just a few). What's your opinion on this?

@sampsyo
Copy link
Member

sampsyo commented May 13, 2020

Yeah, that seems fairly tricky! Fortunately, it is possible to remove flexible attributes—just doing del item['field'] or del item.field should do it.

Of course, figuring out the right policy for when to delete fields is the hard part. In the hypothetical plugin that gets a bunch of extra artists for a track, it could be a policy that, every time the plugin runs, it deletes all of those fields that it can populate and re-fills them.

@dosoe
Copy link
Contributor

dosoe commented May 13, 2020

Well, if you want to do it in a plugin, it's easy (actually, did we need all this work for plugins? Don't flexible attributes already work this way?). If you want to have flexible attributes in beets core, then we should find out how, where and when to clean up tags in the core. One idea could be to impose tags added by plugins to be somehow recognizable, to single them out. That would force us to overhaul all plugins to adapt their tag names, of course, and would mean trouble in the way of backwards-compatibility.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature features we would like to implement
Projects
None yet
Development

No branches or pull requests

8 participants