Fetch work title from MusicBrainz #2452

Open
dosoe opened this issue Feb 23, 2017 · 34 comments
Labels
feature features we would like to implement

Comments

@dosoe
Contributor

dosoe commented Feb 23, 2017

Hi!
I'm new to beets. I have a big collection of classical music that is in the MB database.
For classical music a straightforward way of ordering it is to order it first by composer, then by work and then by performer.
MusicBrainz has a relation "recording of" that relates a recording to a work. A work can have a relation "part of" that refers either to another work or to a catalogue. So the works are organised as a tree with the work-work relation "part of" or "part" as a link.
Could it be possible with beets to "climb up" the tree, so that I can create a folder with the name of the "parent work" that contains the recordings of parts of the work?

As an example, if we look at this recording of the Matthäus-Passion of Bach: https://musicbrainz.org/release/51d08afb-0d81-4617-8c33-979806910ddf
There, every recording has a tag "recording of" followed by a work. This work is in turn part of the work "Matthäus-Passion, BWV 244: Teil ?", which is part of the work "Matthäus-Passion, BWV 244", which is part of the catalogue "Bach-Werke-Verzeichnis" with the number BWV 244.
Could it be possible to go up the ladder to the uppermost work "Matthäus-Passion, BWV 244" and optionally (not every composer has a complete catalogue, and not every work on MB is linked appropriately to the corresponding catalogue) to the catalogue number?
This would give:

Music/Bach, Johann Sebastian/Matthäus-Passion, BWV 244/Performer/St. Matthew Passion

(I choose sort names for artists, otherwise I get stuck with Russian, Japanese and other names that I can't read for Suzuki, Prokofiev etc.)
And here the next problem appears: who do we choose as the performer? An obvious choice would be the recording artist, but this has two problems. In many cases, the recording artist is just the composer because no one updated it. In other cases, as in this release, the recording artist changes from track to track, because the choir, for example, doesn't sing in every track. A solution might be to list all performers linked to the recordings in this album that are linked to the work. This seems to be the best solution to me. However, it doesn't work if we have, for example, a "best of" compilation of performances of a given work, but that is pretty rare (even if I have an example in my collection). Maybe there is a better solution.
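
The "climb up the tree" idea can be sketched without any MusicBrainz access. This is only a minimal illustration: the `PART_OF` mapping and the movement title are hypothetical stand-ins for the "part of" relations described above.

```python
# Hypothetical "part of" links (child title -> parent title), mirroring the
# Matthäus-Passion example. Real data would come from MusicBrainz.
PART_OF = {
    "Some movement (hypothetical)": "Matthäus-Passion, BWV 244: Teil ?",
    "Matthäus-Passion, BWV 244: Teil ?": "Matthäus-Passion, BWV 244",
}


def top_level_work(title, part_of):
    """Follow "part of" links until a work with no parent is reached."""
    seen = set()
    while title in part_of and title not in seen:
        seen.add(title)  # guard against accidental cycles in the data
        title = part_of[title]
    return title


print(top_level_work("Some movement (hypothetical)", PART_OF))
```

A work that has no entry in the mapping is returned unchanged, which matches the idea that a standalone work is its own top-level work.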

Is there a way to implement this into beets?

Dorian

@dosoe
Contributor Author

dosoe commented Feb 23, 2017

Maybe, if a given performer has made several recordings of the same work, add the date of the recording.

@dosoe
Contributor Author

dosoe commented Feb 23, 2017

And if there is no performer or recording artist, use the release artist.

@sampsyo sampsyo added the needinfo We need more details or follow-up from the filer before this can be tagged "bug" or "feature." label Feb 24, 2017
@sampsyo
Member

sampsyo commented Feb 24, 2017

Ok, sounds intriguing! How would you propose to expose the "parent work" information? Would we just record the title, or other stuff too? Do you have ideas for the names of the fields that should hold this information?

@dosoe
Contributor Author

dosoe commented Feb 25, 2017

Hi! Thanks for your reply. I would say the title and the disambiguation (for example for arrangements like https://musicbrainz.org/work/51bb8773-8492-3773-ab88-73a89c922c3d ). The composer information would be saved on a different tag.
I don't know how beets is made, but I would imagine adding 2 tags, for "work" and "composer" and then have some routine that can call from the musicbrainz database the "parent works" and the "performers". Or maybe even add an additional tag for "parent work". This kind of data organisation is relevant especially for classical music, for more modern stuff as far as I can tell the music is more rarely organised in several movements of a bigger piece.

@sampsyo
Member

sampsyo commented Feb 25, 2017

OK, so just to summarize, you'd like to be able to access these fields in path templates, right?

  • $work: This would be the title of the directly associated work.
  • $composer: I think we already have this one, as of Add Composer, Lyricist and Arranger tags #2333.
  • $parentwork: The title of the parent work?
  • $parentwork_composer: Would this be relevant too?
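
To make the proposed fields concrete, here is a rough sketch of how they could combine into a path. beets has its own template engine, so `string.Template` is used purely as a stand-in; the field values are taken from the example path earlier in the thread, and `$artist` standing in for the performer is an assumption.

```python
from string import Template

# Stand-in for a beets path template; only the "$field" substitution idea
# is illustrated here, not the real beets template syntax or functions.
path_template = Template("$composer/$parentwork/$artist/$title")

# Example values copied from the path shown earlier in this thread.
fields = {
    "composer": "Bach, Johann Sebastian",
    "parentwork": "Matthäus-Passion, BWV 244",
    "artist": "Performer",
    "title": "St. Matthew Passion",
}

path = path_template.substitute(fields)
print(path)
```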

@dosoe
Contributor Author

dosoe commented Feb 25, 2017

Ideally, $parentwork_composer would be relevant. In practice, I expect it to always be the same as $composer, but for the sake of completeness, if it doesn't involve a lot of work, I would take it as well.

@dosoe
Contributor Author

dosoe commented Feb 25, 2017

Another question: is there a way to use sort names for artists (and therefore composers) with beets?

@sampsyo
Member

sampsyo commented Feb 25, 2017

OK, thanks. But just to be clear, $composer already does what you want, right?

If so, I'll make this ticket into a request to get the work title field. Then, as a second stage, we can consider doing the "parent work" thing to get copies of the relevant fields reflecting the parent work.

Yes, we do fetch artist_sort and albumartist_sort from MusicBrainz. If you're ever wondering about this kind of a thing, you can type beet fields to get a complete list.

@dosoe
Contributor Author

dosoe commented Feb 25, 2017

I never tried it out, but from what I can read on the conversation, yes. It even uses the "arranger" tag, which can be helpful as well. As I said, I'm new to beets, I'm just importing my collection right now.

@sampsyo sampsyo changed the title Use works for ordering classical music Fetch work title from MusicBrainz Feb 25, 2017
@sampsyo sampsyo added feature features we would like to implement and removed needinfo We need more details or follow-up from the filer before this can be tagged "bug" or "feature." labels Feb 25, 2017
@sampsyo
Member

sampsyo commented Feb 25, 2017

OK, cool. Marked this as a feature request for that first stage.

@dosoe
Contributor Author

dosoe commented Feb 25, 2017

Thanks!

@dosoe
Contributor Author

dosoe commented Apr 14, 2017

Hi! I have seen that you added "composer" as a field, could it be possible to also make a field "composer_sort" (and "arranger_sort" by the way) or does this field contain it already?

@sampsyo
Member

sampsyo commented Apr 15, 2017

Hi, @dosoe—that sounds like a separate feature request. Maybe it deserves a separate GitHub thread?

@dosoe
Contributor Author

dosoe commented May 15, 2017

Hi!
After having fun with 'composer_sort', 'arranger_sort' and 'lyricist_sort' (to be submitted), I'm trying to get 'work' and 'parent_work' settled.
For this I tried something out: I added this to the track_info function in beets/autotag/mb.py:

    for work_relation in recording.get('work-relation-list', ()):
        if work_relation['type'] != 'performance':
            continue
        work.append(work_relation['work']['title'])

gives the title of the work

            for parent_work_relation_1 in work_relation['work'].get('work-relation-list',()):
                if parent_work_relation_1['type'] != 'parts':
                    continue
                parent_work_1.append(parent_work_relation_1['work']['title'])

gives the title of the work the initial one is part of

Now, however, when I try to do the same to get the parent work of parent_work_1, I get nothing:
parent_work_relation_1['work'].get('work-relation-list',())
gives an empty list, even though parent_work_1 actually is part of a work (tested with the work
https://musicbrainz.org/work/2fb76aa1-b37f-3e05-a185-f7e607efaf80 and a recording of that work that I have in my collection).

What seems to be happening is that beets goes to the page of the recording and extracts all the information it can from there. In this case (https://musicbrainz.org/recording/546c4659-96c0-46ee-9b31-cfe3e78a1c48 for testing) that is the work, parent_work_1 and other tags such as the composer.
So would there be a way to go to the page of the work instead of the page of the recording, using something similar to the track_url and album_url functions that are already defined (I don't know how and where they are used)? It is easy to fetch the id of a work (work_relation['work']['id'] with everything defined as above) and from there to build its URL:

    def work_url(workid):
        return urljoin(BASE_URL, 'work/' + workid)
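
As a self-contained check of what such a helper would return, the following sketch fills in the missing pieces. The value of `BASE_URL` is an assumption here (the public MusicBrainz site); the constant used inside beets may differ.

```python
from urllib.parse import urljoin

# Assumption: the base URL of the public MusicBrainz site.
BASE_URL = "https://musicbrainz.org/"


def work_url(workid):
    """Build the web page URL of a work from its MusicBrainz id."""
    return urljoin(BASE_URL, "work/" + workid)


print(work_url("2fb76aa1-b37f-3e05-a185-f7e607efaf80"))
```

Note that this only produces the URL of the human-readable page; it does not fetch any data.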

Now I don't know where and how you use the album_url and track_url functions to get to the actual recording infos but you probably do.

Right now I'm going up the ladder of parent works by hand. Once I get this working, I would rather use a 'while' loop to climb to the top of the ladder.

Other question: how can I submit a pull request (for adding the 'arranger_sort' and 'lyricist_sort' tags) while continuing to work on this 'work' stuff? I also have some issues with the 'arranger' and 'lyricist' tags introduced in #2333.

@sampsyo
Member

sampsyo commented May 16, 2017

Hmm… to summarize, it sounds like you're interested in how to query the MusicBrainz web service for a specific work ID? For that, you go through the client library and use, for example, get_work_by_id. In general, you might want to read a little bit about the MusicBrainz Web service. For example, here's the URL for the recording you mentioned with its work relations included:
https://musicbrainz.org/ws/2/recording/546c4659-96c0-46ee-9b31-cfe3e78a1c48?inc=work-rels
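
For illustration, a URL like the one above can be assembled from an entity type, an id and a list of includes. This is only a sketch of the URL layout; `ws_url` and `WS_ROOT` are hypothetical names, and in practice the client library builds these requests.

```python
from urllib.parse import urlencode

# Root of version 2 of the MusicBrainz web service, as seen in the URL above.
WS_ROOT = "https://musicbrainz.org/ws/2"


def ws_url(entity, mbid, includes=()):
    """Build a web service lookup URL with an optional "inc" parameter."""
    url = f"{WS_ROOT}/{entity}/{mbid}"
    if includes:
        # Several includes are space-separated, which urlencode turns into "+".
        url += "?" + urlencode({"inc": " ".join(includes)})
    return url


print(ws_url("recording", "546c4659-96c0-46ee-9b31-cfe3e78a1c48", ["work-rels"]))
```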

About the other question: to submit a new PR, the thing to do is to put your work in a branch and push it to your fork. Then you can open several PRs at once; one for each branch.

@dosoe
Contributor Author

dosoe commented May 16, 2017

I will read about the MB web service; it sounds like it could answer some of my questions (but not today).
I wonder if it would be even better to make a work_info function like the track_info function in beets/autotag/mb.py. This could also fetch the composer, the lyricist and, more generally, the tags that MB associates with works.

About the other question: So the idea is that I have more than one fork of beets on my repo, right?

@sampsyo
Member

sampsyo commented May 16, 2017

Sure, a work_info function would be OK—but it would need to look different from the track_info function. The latter produces a complete TrackInfo object, and there would not be a corresponding WorkInfo object (because there is no such thing as a "work" in the beets database). It would need to build up information to put on the TrackInfo object.

Furthermore, it would be something of a problem if we needed to issue a series of new MusicBrainz API requests to get the work data. Is it possible to pull all the information out of the work-rels included data? If not, we may need to make this data an optional feature to avoid making metadata fetching take much longer than it does currently.

No, there's no need to fork the repo twice—you can just create different branches within one git repository. (The GitHub help pages can be useful for this.)

@dosoe
Contributor Author

dosoe commented May 16, 2017

But can't we make a WorkInfo object and just not put it into the library? Just use it as a temporary variable.

@sampsyo
Member

sampsyo commented May 16, 2017

Sure! But I'd argue that you'd probably be better off with just a plain dict instead.

@dosoe
Contributor Author

dosoe commented May 16, 2017

Yes, I would be very satisfied with that.

@dosoe
Contributor Author

dosoe commented May 25, 2017

Ok, thanks to get_work_by_id I managed to fetch the work title, the work disambiguation, the parent work title, the parent work disambiguation, the parent work composer name and the parent work composer sort name. However, I already have an open pull request (#2563), so if I just push this to my repo it will be added as a commit to that one. Additionally, I only implemented the fetching part (in beets/autotag/mb.py) and not everything around it.
However, I can show you what the code looks like.
Just insert it into the track_info function:

lyricist = []
composer = []
composer_sort = []
work = []
work_disambig = []
parent_work = []
parent_work_disambig = []
parent_composer = []
parent_composer_sort = []
for work_relation in recording.get('work-relation-list', ()):
    if work_relation['type'] != 'performance':
        continue
    work_id = work_relation['work']['id']
    work_info = musicbrainzngs.get_work_by_id(
        work_id, includes=["work-rels", "artist-rels"])
    work.append(work_info['work']['title'])
    # Not every work has a disambiguation.
    try:
        work_disambig.append(work_info['work']['disambiguation'])
        parent_disambig_tmp = work_info['work']['disambiguation']
    except KeyError:
        work_disambig.append('')
        parent_disambig_tmp = ''
    partof = True
    parent_work_tmp = work_info['work']['title']
    # Climb the "part of" relations until a work without a parent is reached.
    while partof:
        partof = False
        for work_father in work_info['work'].get('work-relation-list', ()):
            if work_father['type'] == 'parts':
                try:
                    if work_father['direction'] == 'backward':
                        father_id = work_father['work']['id']
                        partof = True
                        work_info = musicbrainzngs.get_work_by_id(
                            father_id, includes=["work-rels", "artist-rels"])
                        parent_work_tmp = work_info['work']['title']
                        try:
                            parent_disambig_tmp = work_info['work']['disambiguation']
                        except KeyError:
                            parent_disambig_tmp = ''
                        # Assumes a single parent: continue from the new work.
                        break
                except KeyError:
                    # 'direction' may be missing on some relations.
                    pass
    # work_info now refers to the topmost work found.
    for artist in work_info['work'].get('artist-relation-list', ()):
        if artist['type'] == 'composer':
            parent_composer.append(artist['artist']['name'])
            parent_composer_sort.append(artist['artist']['sort-name'])
    parent_work.append(parent_work_tmp)
    parent_work_disambig.append(parent_disambig_tmp)

instead of

lyricist = []
composer = []
composer_sort = []
for work_relation in recording.get('work-relation-list', ()):
    if work_relation['type'] != 'performance':
        continue

I guess there should also be 'parent_lyricist' and 'parent_lyricist_sort' tags, but that is easy and quick to do. If the work is not part of a bigger one, the parent_work is the work itself (same for all the 'parent_' tags).
What I assume here is that a work only has one parent, which might not always be the case.
There are probably style errors, but it works for me.

I will need more time to sort out how to choose the performer correctly.

This calls the musicbrainzngs.get_work_by_id function multiple times, so there might be an issue because we can only query the server once per second.
Additionally, a significant proportion of my recent test runs gave a 503 error (service unavailable).

@sampsyo
Member

sampsyo commented May 25, 2017

Hmm; that's interesting! I notice that this seems to have gotten quite a bit more complicated. It seems like it would be a worthy goal to see if this can be done in a more generic way: that is, maybe we can write one function to get all the information for the "parent work," and then a separate function that pulls out all the work-related information from any work? Then, we can just join "parent_" onto the front of all the stuff from the parent work in one fell swoop, rather than needing to duplicate logic for every field.
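
One possible shape for that split is sketched below. It is not the code that ended up in beets: `work_fields` and `with_prefix` are hypothetical names, and the dict layout is assumed to match the keys used in the snippet posted above ('title', 'disambiguation', 'artist-relation-list').

```python
def work_fields(work):
    """Collect title, disambiguation and composer names from one work dict."""
    composers = [
        rel["artist"]
        for rel in work.get("artist-relation-list", ())
        if rel["type"] == "composer"
    ]
    return {
        "work": work["title"],
        "work_disambig": work.get("disambiguation", ""),
        "composer": [artist["name"] for artist in composers],
        "composer_sort": [artist["sort-name"] for artist in composers],
    }


def with_prefix(fields, prefix):
    """Return a copy of the fields with the prefix added to every key."""
    return {prefix + key: value for key, value in fields.items()}


# Hypothetical response fragment for a parent work.
sample_parent = {
    "title": "Matthäus-Passion, BWV 244",
    "artist-relation-list": [
        {"type": "composer",
         "artist": {"name": "Johann Sebastian Bach",
                    "sort-name": "Bach, Johann Sebastian"}},
    ],
}

parent_fields = with_prefix(work_fields(sample_parent), "parent_")
print(parent_fields)
```

With this layout the same extraction function serves both the directly linked work and its parent, so adding a field such as the lyricist would only need one change.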

@dosoe
Contributor Author

dosoe commented May 25, 2017 via email

@dosoe
Contributor Author

dosoe commented May 25, 2017 via email

@dosoe
Contributor Author

dosoe commented May 25, 2017

So basically what is happening is the following:
I get the work id from the recording relationships.
Then I get the work relationships by using get_work_by_id.
In the work relationships I look for a work with 'type': 'parts' and 'direction': 'backward'.
If there is none, then this is the parent work; if there is one, I take its id.
Then I repeat with the id I just got.

Once we have the parent work, we take its name, composer, composer_sort, lyricist, etc. The disambiguation needs a try/except because some works don't have a disambiguation. In the same way, only works with a parent have a relation with a 'direction' tag in their work relationships.
We should also watch out for duplicates, so maybe append the tag to the list only if it's not already there, since a recording can very well contain several works that are all part of the same parent work.

It may be coded clumsily, but it works.

Now there are two issues:
- First, there might be several parents of one work. I believe that would be an error in the MB database, but maybe there are good reasons for this to happen. So far I don't know how to deal with this.
- Second, each call of get_work_by_id is a request to MB and I can only make one per second. This would therefore substantially slow down the autotagger (I guess), so maybe it would be a good idea to make it optional or put it in a plugin (as far as I can tell, this is useful only for classical music, and if there were many classical lovers here this would already have been implemented). I have no idea how to do this.
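
Two of the concerns above, duplicates and the number of requests, can be eased with small helpers. The sketch below is an assumption-laden illustration: `fetch` stands for any function that looks up a work by id (for example a thin wrapper around get_work_by_id), and both helper names are hypothetical.

```python
def make_cached_fetcher(fetch):
    """Wrap a lookup function so that each work id is fetched only once."""
    cache = {}

    def cached(work_id):
        if work_id not in cache:
            cache[work_id] = fetch(work_id)
        return cache[work_id]

    return cached


def append_unique(values, value):
    """Append value to the list unless it is already present."""
    if value not in values:
        values.append(value)


# Usage with a fake lookup that counts how often it is called.
calls = []


def fake_fetch(work_id):
    calls.append(work_id)
    return {"work": {"title": "Work " + work_id}}


fetch = make_cached_fetcher(fake_fetch)
titles = []
for work_id in ["a", "b", "a", "a"]:
    append_unique(titles, fetch(work_id)["work"]["title"])
print(titles, len(calls))
```

Since many tracks of one release usually share the same parent work, caching by id would avoid most repeated requests, although the first lookup of each work is still subject to the rate limit.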

@sampsyo
Copy link
Member

sampsyo commented May 26, 2017

Yeah, making a plugin would be a great way to make the extra queries optional and encapsulate the new code! It's actually fairly straightforward: the beets plugin system has an "import stage" API, where you can add arbitrary code to run on music that's been imported. So an easy way to get started would be to make a plugin that just runs this same code in an import stage, making calls into the beets.autotag.mb module.

Let me know if I can help more with pointing the way!

@dosoe
Contributor Author

dosoe commented May 26, 2017

Yes, I would appreciate it if you could help me set it up.

@sampsyo
Member

sampsyo commented May 26, 2017

Sure! Here's the place to start: http://docs.beets.io/en/v1.4.3/dev/plugins.html

Feel free to post questions along the way if anything comes up.

@dosoe
Contributor Author

dosoe commented May 31, 2017

Ok, now I have a starting file, using https://beets.readthedocs.io/en/v1.4.3/dev/plugins.html#add-path-format-functions-and-fields and the keyfinder plugin as a template. However, I don't know how to add a new tag: the keyfinder plugin just has a tag mapping and writes it directly to the file, but that doesn't work for me. Additionally, I don't know how to make the plugin also run when importing and updating.
I'm attaching the code as it is so far. The part about fetching the data from MB works afaict, even if it is a little ugly (I'm not a programmer).

parentwork.txt

@dosoe
Contributor Author

dosoe commented May 31, 2017

Here is an updated version. It works (when I print the data, it is correct); I just don't get how to write the tags into the library. I made a branch for it, but I would like to commit this stuff without modifying my other pull request #2563.
parentwork.txt

@sampsyo
Member

sampsyo commented Jun 1, 2017

Cool! It looks like you're already adding the relevant information to the Item objects and calling store(), so I think that should be enough?
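
For readers without the attachment, the pattern referred to here (set the fields on the item, then call store()) looks roughly like the sketch below. `FakeItem` is a hypothetical stand-in so the example runs without beets, and joining list values with ", " is an assumption, not necessarily what the attached plugin does.

```python
def store_work_fields(item, fields, separator=", "):
    """Copy fetched fields onto an item-like object and persist them."""
    for key, value in fields.items():
        # Lists are flattened to one string; the separator is an assumption.
        item[key] = separator.join(value) if isinstance(value, list) else value
    item.store()


class FakeItem(dict):
    """Minimal stand-in for a beets Item, used only for this illustration."""

    stored = False

    def store(self):
        self.stored = True


item = FakeItem()
store_work_fields(item, {
    "parent_work": "Matthäus-Passion, BWV 244",
    "parent_composer": ["Johann Sebastian Bach"],
})
print(item, item.stored)
```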

@JDLH

JDLH commented Jun 3, 2017

I've only read this thread, not examined the code. I'm encouraged to see an effort to handle tagging by composer and work title well.

From my knowledge of MusicBrainz, three things to be careful of:

  1. Many releases will not have relationships linking their recordings to works.
  2. A single recording may link to multiple works, say if a single track of an opera recording contains what the score describes as two scenes, which are represented in MusicBrainz as two Work entities.
  3. There are sometimes 1, or 2, or 3 levels in a Work "is a part of" Work tree. A Work linked to a Recording may be the top-level work all by itself. Or it may be a part of the top-level work (e.g. a movement in a symphony). Or it may be a grandchild of a top-level work (e.g. a scene which is part of an Act which is part of an opera).

It might be helpful to find examples of each of these in MusicBrainz, and include them in your unit test cases.
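
The three cases above could be exercised with toy data before looking for real examples. This is a sketch only: the ids, titles and helper names are hypothetical, and the dict layout is assumed to follow the relation keys used earlier in this thread ('performance', 'parts', 'backward').

```python
def part_of(parent_id):
    """Build a hypothetical "part of" relation pointing to a parent work."""
    return {"type": "parts", "direction": "backward", "work": {"id": parent_id}}


# Toy works: scene -> act -> opera (three levels) plus a standalone sonata.
WORKS = {
    "opera": {"work": {"title": "Opera"}},
    "act": {"work": {"title": "Act", "work-relation-list": [part_of("opera")]}},
    "scene1": {"work": {"title": "Scene 1", "work-relation-list": [part_of("act")]}},
    "scene2": {"work": {"title": "Scene 2", "work-relation-list": [part_of("act")]}},
    "sonata": {"work": {"title": "Sonata"}},
}


def parent_of(work):
    """Return the id of the first parent work, or None at the top level."""
    for rel in work.get("work-relation-list", ()):
        if rel["type"] == "parts" and rel.get("direction") == "backward":
            return rel["work"]["id"]
    return None


def top_level_titles(recording, fetch):
    """List the distinct top-level work titles linked to a recording."""
    titles = []
    for rel in recording.get("work-relation-list", ()):
        if rel["type"] != "performance":
            continue
        work = fetch(rel["work"]["id"])["work"]
        seen = set()
        parent_id = parent_of(work)
        while parent_id is not None and parent_id not in seen:
            seen.add(parent_id)  # guard against cycles in the data
            work = fetch(parent_id)["work"]
            parent_id = parent_of(work)
        if work["title"] not in titles:
            titles.append(work["title"])
    return titles


def performance(work_id):
    """Build a hypothetical recording-to-work relation."""
    return {"type": "performance", "work": {"id": work_id}}


fetch = WORKS.__getitem__
two_scenes = {"work-relation-list": [performance("scene1"), performance("scene2")]}
print(top_level_titles(two_scenes, fetch))
```

Case 1 corresponds to a recording dict without a 'work-relation-list' key, which yields an empty list rather than an error.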

Good luck with this plug-in!

@dosoe
Contributor Author

dosoe commented Jun 3, 2017

Hi! Thanks for your input. First, keep in mind that I am writing this script to handle classical music, because that's what my music is mainly composed of and that's where the composer and the work are important entities.
Concerning point 1: I can't do anything about it, except add the works myself on MB (60 000 edits so far). With the corresponding scripts it is pretty quick to do.
Concerning point 2: This plugin doesn't output one work title but a list of all the works that are related to the recording. Then, for each of these works, it fetches the parent work. One thing could be an issue: I assume that each work has only one parent work, which I would expect to be true, but strictly speaking there is no reason for it.
Concerning point 3: This is the reason why I'm doing a while loop. I tested it with different works, some CPE Bach sonatas that have 1 or 2 levels and the Matthew Passion of JS Bach that has up to 4 levels of parent works.
This script is aimed at classical music, because there the work information is valuable. As said above, my goal is to be able to have a file tree such as: parentcomposer/parentwork/performer/recordings
A remaining problem is finding the performer. The MB artist on the individual recordings can vary between the different parts of a work (for an opera, for example), so maybe I would make a list of all performers of all the tracks containing parts of the parent work. But then I will have trouble identifying them precisely, as I know examples of releases where the same work is played twice with different performers.

@jacksondm33

Is there a reason TRACK_INCLUDES does not include 'work-rels'? This causes work relations to not be fetched by track_for_id. By adding it, importing songs (singletons) include 'mb_workid', 'composer', 'lyricist', etc. tags, while they did not before.
