Extra fields from MusicBrainz (missing fields) #1547

bearcatsandor · 2015-07-22T03:13:51Z

Yanno, for all that beets prides itself in being the tagger for OCD music lovers, i'm a little surprised at the info it doesn't pull from musicbrainz.

It doesn't pull performer, composer, lyricist, conductor, engineer or producer tags. In short, why not just pull ALL the tags that it can from musicbrainz (see here https://picard.musicbrainz.org/docs/mappings/ ) and apply it to the file and database for searching. Why wouldn't i want to find all the various tracks that Jon Coltrane plays on even when the album title or group doesn't bare his name?

I'm just learning Python now, so should i take a shot at this? Should it be a plugin, or should i not bother because it's on the roadmap?

sampsyo · 2015-07-22T15:41:10Z

I'm for it! Specifically, this won't be too onerous if we take this tactic:

Refactor the AlbumInfo and TrackInfo classes to just be "bags of tags" rather than having specific fields that need to be filled (and explicitly added to the code every time we think of a new field).
Refactor the autotag machinery for applying metadata to just copy data from the Info object to the database object blindly (again to avoid needing to add per-field plumbing). This would be agnostic to whether the field is fixed or flexible.
Now the mb.py backend is free to iterate as much as possible and add all the metadata it likes without disturbing the rest of the codebase.

See also #506.

anshuman73 · 2016-12-01T07:20:01Z

@sampsyo,
How exactly would you like me to proceed ?
I believe reading in the task description that we need to change the entire way of storing the metadata received from MusicBrainz.

sampsyo · 2016-12-01T19:55:14Z

Hey again, @anshuman73! You're right; this one is a bit more involved than last time.

Let's start this way. Try prototyping replacements for the AlbumInfo and TrackInfo classes in beets/autotag/hooks.py. At first, these classes should work the same way as before, but they should get rid of the prescribed list of fields. In fact, it should work to set any field on it, much like a Python dictionary. This should work:

>>> info = TrackInfo(some_field=12)
>>> info.some_other_field = 50

Then, once that's working, we'll want to revisit the apply_metadata function in autotag/__init__.py to take advantage of this new structure. We should be able to simplify it down to just copy over all the fields from AlbumInfo to Album, and the same for TrackInfo to Item, with no complicated logic.

Sound good? Let me know if you have any questions.

anshuman73 · 2016-12-02T18:57:27Z

@sampsyo
So we want the classes to have all the class variables they already have and be able to add more class variable dynamically as and when required ?
Also I should remove the compulsory fields ( for eg, album, album_id, artist, etc) and let the an instance of the Class be declared with any 1 of the fields, right ? (sorry if I'm repeating what you said, just wanted to confirm I've understood it right)
and then move onto autotag/__init__.py and change the structure

sampsyo · 2016-12-03T16:09:03Z

Yep, you totally got it!

As you said, the first step is to get rid of all the built-in fields and just let any field work on those classes. The easiest way might be to make those classes inherit from dict and then define the magic methods __getattr__ and __setattr__, rather like this SO answer: http://stackoverflow.com/a/32107024

anshuman73 · 2016-12-04T02:42:16Z

Ahh this SO link is truly magic !
you want the classes to be able to add new fields in the typical dictionary way too ? For Eg,
(Trackinfo_instance['new _field'] = 'a')?

sampsyo · 2016-12-04T04:45:45Z

Yes, exactly. And for backwards compatibility, info.field should work the same as info['field'].

dosoe · 2017-08-02T14:57:37Z

Is anyone willing to do this? Else I can give it a try

dosoe · 2017-08-02T15:02:54Z

Ok first question: I have a fork of beets but it already has a pending pull request, so if I make a new branch it will have the commits of this pull request (#2563). How can I get a new fork or a clean branch that only contains the 'real' beets?

dosoe · 2017-08-02T15:10:33Z

If I just take my fork and make a new branch, the branch will start as a copy of my master branch, which is not exactly beets (20 more commits).

fortes · 2017-08-02T15:12:23Z

@dosoe If I'm understanding correctly, you want to update your fork to have the latest from the master repository. This document should help: https://help.github.com/articles/syncing-a-fork/

You should probably create a second branch in order to do this work, so once you have the upstream remote configured, you could probably do something like:

git checkout upstream/master
git checkout -b mb-extra-fields

Then you can push that mb-extra-fields branch up to origin and create a separate pull request w/ that.

dosoe · 2017-08-02T15:19:55Z

Ok I finally got it going, thanks.

dosoe · 2017-08-02T15:33:57Z

Now how is this supposed to work? From the upper description I get that we want to start it by changing the __init__ from the TrackInfo and AlbumInfo classes of beets/autotag/hooks.py:
Right now it is

def __init__(self, title, (many tags)):
    self.title = title
    self.track_id = track_id
    .
    .
    .

What do we want to do? Do we keep it like this and add a add_tag function on each class?
Do we transform it into something like:

def add_tag(self, tag_name)
    (ads an empty tag)
def __init__(self, st=some_tag):
    self.add_tag(self, st)
    self.st = some_tag
    .
    .
    .

?
I am not a programmer, so expect to get stupid questions...

dosoe · 2017-08-02T16:59:29Z

What holds us back from just making a copy-pasting this code: http://stackoverflow.com/a/32107024 to replace TrackInfo and AlbumInfo? There is this decode function (that ensures everything is utf-8) to rewrite but else it does what we want it to do, right? Then we have to play around with apply_item_metadata in beets/autotag/__init__.py to just loop through all the keys of a TrackInfo object and same for apply_metadata with AlbumInfo? And in beet/autotag/mb.py the only thing to change is when we call beets.autotag.hooks.TrackInfo at the start of track_info and the corresponding place for album_info. Did I get it right or am I making it more confusing?

dosoe · 2017-08-02T17:05:59Z

and in beet/autotag/__init__.py we have to replace all the if track_info.medium_index is not None: by if track_info.get(medium_index): .
Is there anything else?

dosoe · 2017-08-02T17:24:48Z

Here is a first try (that doesn't work) as a work base: #2650

dosoe · 2017-08-02T17:25:09Z

I would suggest to continue the discussion there.

dosoe · 2017-08-04T09:44:25Z

Hi! On #2650 I started working on it, but there is a recurrent problem: I am trying to transform the Object inherited class TrackInfo in a dict related class. However, the problem there is that dict objects are not hashable, which makes it impossible to make a deepcopy of a list of dict and to make somthing like list(set(tracks) - set(mapping.values())) were tracks is a list of TrackInfo objects. Would anyone have an idea how to get around these problems?

sampsyo · 2017-08-04T13:21:12Z

Good question. I think the right solution is probably to make the TrackInfo objects hashable to restore the old functionality. The easiest way to do that is probably just to define a __hash__ function that uses the default implementation:

def __hash__(self):
    return object.__hash__(self)

That would preserve identity-based hashing instead of contents-based hashing as described in this relevant-seeming SO answer.

Perhaps the cleaner thing to do, however, would be to avoid making TrackInfo inherit directly from dict. Instead, just keep a dict of values as a field on the TrackInfo object. That way, we'd sidestep the issue and avoid any unpleasantness of sharing the implementation from dict.

dosoe · 2017-08-04T13:38:23Z

I did it because you looked for something like https://stackoverflow.com/questions/2352181/how-to-use-a-dot-to-access-members-of-dictionary/32107024#32107024 , so I just copy-pasted it. That's why we end up with a dict inheritance.

dosoe · 2017-08-07T14:15:10Z

I tried to make TrackInfo inherit from object and failed miserably. Trying the first solution you propose however leads to problems with deepcopy

gabrielhdt · 2017-08-13T11:06:30Z

Hello,
I implemented what you suggested above in #2654, TrackInfo and AlbumInfo inherit from ItemInfo (a new class) which inherits from dict. The hash function uses the hash of the track id from musicbrainz, since it must be unique, it seems to be a good candidate.

I still have to make apply_metadata generic.

Updated calls to TrackInfo and AlbmInfo using keywords only arguments, implemented reduce methods for TrackInfo and AlbumInfo for copy.deepcopy (use also track_id and album_id) beetbox#1547

dosoe · 2017-08-14T10:18:02Z

Is the mb_trackid a good choice? Does every track have one or do you add tracks by hand or take metadata from let's say discogs without putting a mb_trackid on it?

dosoe · 2017-08-14T10:19:52Z

I have some tracks I can't identify with MB so I'm thinking about setting the metadata by hand, but these tracks won't have a mb_trackid.

gabrielhdt · 2017-08-14T10:58:17Z

Good point. I assumed every track is identified with MB. Shouldn't you add the release on MusicBrainz, and then identify from MB, in that case? I think the software should entrust everything regarding data to MB, and just dump data from it, with as less reprocessing as possible. Else, a fallback id can be used, depending on the source.

dosoe · 2017-08-16T22:24:08Z

It's what I'm doing, but for some of my music I can't trace back which release it is, and I don't expect everyone to do it. I think unsing mb_id with a fallback is a good idea.

dosoe · 2020-05-10T12:13:55Z

Refactor the AlbumInfo and TrackInfo classes to just be "bags of tags" rather than having specific fields that need to be filled (and explicitly added to the code every time we think of a new field).

Refactor the autotag machinery for applying metadata to just copy data from the Info object to the database object blindly (again to avoid needing to add per-field plumbing). This would be agnostic to whether the field is fixed or flexible.

The first part has been dealt with, do you have an idea on how you want the next point to be like? I'm afraid it's not gonna be straightforward, we will need to add plumbing (for example the 'recording engineer' is called 'recording' in the MB API), but at least for instruments and engineers it should be rather straightforward to implement.

sampsyo · 2020-05-11T11:30:32Z

That second bullet, fortunately, is just about refactoring the apply_metadata and apply_item_metadata functions, which you have already gotten started in #3587. Thanks!!

dosoe · 2020-05-12T11:39:56Z

I'm trying right now to implement automatic fetching of all artists associated with a recording, but I'm running into a problem. Let's say we retrieve all artists and create appropriate tags (like Soprano vocals) flexibly for each recording. What happens if there is a change? Let's say it turns out the soprano singer was in fact a mezzo-soprano singer and the corresponding change is made on MusicBrainz (like for example this edit: https://musicbrainz.org/edit/69614459 ). We would then end up with two tags, Soprano vocals and Mezzo-soprano vocals with both the same content. How do we remove the Soprano vocals tag now that it's not needed anymore? Else, over time, tracks would become more and more cluttered with obsolete tags. How do we arrange that 'obsolete' tags disappear? Before flexible tags, it was easy, as each tag was modified to None if beets didn't get the corresponding data (even if sometimes there is clutter when album-related tags stay on a track that is reimported as a singleton, but that's more a problem of the file than a problem of the library).
So my question is: how do we clean up obsolete tags, now that they are flexible?
Could there be an option when storing the info in the library to give a list of tags that are up-to-date so that when storing all other tags are removed? I'm afraid that would be complicated to implement, especially as it would need to keep track of the multiple plugins that add additional tags (discogs, keyfinder, parentwork to name just a few). What's your opinion on this?

sampsyo · 2020-05-13T12:31:39Z

Yeah, that seems fairly tricky! Fortunately, it is possible to remove flexible attributes—just doing del item['field'] or del item.field should do it.

Of course, figuring out the right policy for when to delete fields is the hard part. In the hypothetical plugin that gets a bunch of extra artists for a track, it could be a policy that, every time the plugin runs, it deletes all of those fields that it can populate and re-fills them.

dosoe · 2020-05-13T14:57:05Z

Well, if you want to do it in a plugin, it's easy (actually, did we need all this work for plugins? Don't flexible attributes already work this way?). If you want to have flexible attributes in beets core, then we should find out how, where and when to clean up tags in the core. One idea could be to impose tags added by plugins to be somehow recognizable, to single them out. That would force us to overhaul all plugins to adapt their tag names, of course, and would mean trouble in the way of backwards-compatibility.

sampsyo added the feature features we would like to implement label Jul 22, 2015

sampsyo mentioned this issue Sep 24, 2015

Song Credits Information MusicBrainz #1622

Closed

This comment has been minimized.

Sign in to view

sampsyo mentioned this issue Oct 30, 2016

Beets not writing vorbis comments to FLAC files when modifying #2250

Closed

sampsyo mentioned this issue Nov 29, 2016

Fixed #2200 and a small typo #2294

Merged

anshuman73 mentioned this issue Dec 1, 2016

Parse secondary album types from MusicBrainz #2200

Closed

ibmibmibm mentioned this issue Dec 20, 2016

Add Composer, Lyricist and Arranger tags #2333

Merged

sampsyo mentioned this issue Aug 2, 2017

Create arranger_sort, lyricist_sort, performer and performer_sort tags #2563

Closed

dosoe mentioned this issue Aug 2, 2017

Make Tags more flexible to implement #2650

Closed

gabrielhdt mentioned this issue Aug 13, 2017

new generic TrackInfo and AlbumInfo #2654

Closed

dosoe mentioned this issue Mar 8, 2018

Add parentwork plugin #2580

Closed

dosoe mentioned this issue Apr 18, 2020

Get performers and instruments #3556

Closed

dosoe mentioned this issue Apr 25, 2020

First try adding new albuminfo and trackinfo class #3568

Merged

dosoe mentioned this issue May 14, 2020

Fetch artists, performers, engineers etc flexibly #3589

Closed

snejus mentioned this issue Dec 5, 2024

Release: Fix changelog formatting #5529

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Extra fields from MusicBrainz (missing fields) #1547

Extra fields from MusicBrainz (missing fields) #1547

bearcatsandor commented Jul 22, 2015

sampsyo commented Jul 22, 2015

This comment has been minimized.

This comment has been minimized.

anshuman73 commented Dec 1, 2016

sampsyo commented Dec 1, 2016

anshuman73 commented Dec 2, 2016

sampsyo commented Dec 3, 2016

anshuman73 commented Dec 4, 2016

sampsyo commented Dec 4, 2016

dosoe commented Aug 2, 2017

dosoe commented Aug 2, 2017

dosoe commented Aug 2, 2017

fortes commented Aug 2, 2017

dosoe commented Aug 2, 2017 •

edited

Loading

dosoe commented Aug 2, 2017

dosoe commented Aug 2, 2017

dosoe commented Aug 2, 2017

dosoe commented Aug 2, 2017

dosoe commented Aug 2, 2017

dosoe commented Aug 4, 2017

sampsyo commented Aug 4, 2017

dosoe commented Aug 4, 2017

dosoe commented Aug 7, 2017

gabrielhdt commented Aug 13, 2017

dosoe commented Aug 14, 2017

dosoe commented Aug 14, 2017

gabrielhdt commented Aug 14, 2017

dosoe commented Aug 16, 2017

dosoe commented May 10, 2020

sampsyo commented May 11, 2020

dosoe commented May 12, 2020 •

edited

Loading

sampsyo commented May 13, 2020

dosoe commented May 13, 2020 •

edited

Loading

Extra fields from MusicBrainz (missing fields) #1547

Extra fields from MusicBrainz (missing fields) #1547

Comments

bearcatsandor commented Jul 22, 2015

sampsyo commented Jul 22, 2015

This comment has been minimized.

This comment has been minimized.

anshuman73 commented Dec 1, 2016

sampsyo commented Dec 1, 2016

anshuman73 commented Dec 2, 2016

sampsyo commented Dec 3, 2016

anshuman73 commented Dec 4, 2016

sampsyo commented Dec 4, 2016

dosoe commented Aug 2, 2017

dosoe commented Aug 2, 2017

dosoe commented Aug 2, 2017

fortes commented Aug 2, 2017

dosoe commented Aug 2, 2017 • edited Loading

dosoe commented Aug 2, 2017

dosoe commented Aug 2, 2017

dosoe commented Aug 2, 2017

dosoe commented Aug 2, 2017

dosoe commented Aug 2, 2017

dosoe commented Aug 4, 2017

sampsyo commented Aug 4, 2017

dosoe commented Aug 4, 2017

dosoe commented Aug 7, 2017

gabrielhdt commented Aug 13, 2017

dosoe commented Aug 14, 2017

dosoe commented Aug 14, 2017

gabrielhdt commented Aug 14, 2017

dosoe commented Aug 16, 2017

dosoe commented May 10, 2020

sampsyo commented May 11, 2020

dosoe commented May 12, 2020 • edited Loading

sampsyo commented May 13, 2020

dosoe commented May 13, 2020 • edited Loading

dosoe commented Aug 2, 2017 •

edited

Loading

dosoe commented May 12, 2020 •

edited

Loading

dosoe commented May 13, 2020 •

edited

Loading