
Parse Copyright Notices

David Kellner edited this page Jul 1, 2024 · 58 revisions

Description

The userscript provides a text parser in the release relationship editor where you can paste credits or load the contents of the release annotation. It tries to extract copyright and legal information from each line of the text input and assists the user in creating relationships for these.

Relationships will be added at release level by default. Additionally you can create phonographic copyright relationships at recording level by ticking the checkboxes of the desired recordings.

The parser generally assumes that a copyright holder is a label entity, unless the name of the copyright holder matches one of the release artists. If the automatic detection of an artist fails, you can hold down the SHIFT key while clicking to force names to be treated as artist names.

Then it opens the appropriate auto-complete dialog for all unknown names and asks the user to select or create the correct entity. Confirming the dialog creates the relationship and lets the parser continue with the next credit (cancelling the dialog skips the creation of a relationship for the current credit).

Once the user has selected a match for a given name, the userscript caches the MBID of this match and will not ask the user to match the same name again. If an incorrect entity has been selected at some point, holding the CTRL key (or ⌘ on a Mac) allows the user to bypass the cache and force a new search in order to overwrite the old cache entry.
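The caching behaviour described above can be pictured with a small Python sketch. It is only an illustration under assumptions: the cache layout (entity type and name as the key), the function `resolve_entity` and the `ask_user` callback standing in for the auto-complete dialog are all hypothetical and not taken from the userscript.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical cache: (entity type, credited name) -> MBID of the chosen entity.
EntityCache = Dict[Tuple[str, str], str]


def resolve_entity(
    cache: EntityCache,
    entity_type: str,
    name: str,
    ask_user: Callable[[str, str], Optional[str]],
    bypass_cache: bool = False,
) -> Optional[str]:
    """Return the MBID for a credited name, asking the user only when needed."""
    key = (entity_type, name)
    if not bypass_cache and key in cache:
        return cache[key]  # name was matched before, no dialog required
    mbid = ask_user(entity_type, name)  # stands in for the auto-complete dialog
    if mbid is None:
        return None  # dialog cancelled: skip this credit, leave the cache alone
    cache[key] = mbid  # store the new match or overwrite an incorrect one
    return mbid
```

Here `bypass_cache=True` plays the role of the CTRL/⌘ key: the stored entry is ignored and replaced by whatever the user selects next.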

Successfully parsed credit lines will be appended to the edit note. Optionally, they can also be removed from the input so that only skipped (and partially parsed) lines remain.

Supported notice formats and relationship types

The userscript performs the following steps for all copyright notices:

  1. Parser: Recognizes specific patterns in copyright notices and tries to extract the type, the name of the copyright holder and the optional year (or multiple years) from the given text input. This step neither differentiates between release and recording level relationships nor cares whether the name belongs to a label or an artist.

  2. Mapper: Decides whether the copyright holder is an artist or a label and maps the type to an internal relationship type ID. Automatically fills the relationship dialogs with this ID, the credited name and the year (skipped if there are multiple unspecific years) before it waits for the user to select the correct target entity. If the cache already contains an entry for the correct entity type and the given name, the dialog can be confirmed automatically.
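To make the division of labour between the two steps more concrete, here is a minimal Python sketch of the mapper step. The class names, field names and the `link_type` strings are placeholders invented for this example (in particular, they are not real MusicBrainz relationship type IDs), and comparing names case-insensitively is an assumption about what "matches" means.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ParsedCredit:
    """Result of the parser step (field names are illustrative)."""
    types: List[str]  # e.g. ["℗", "©"] or ["licensed to"]
    name: str         # credited name of the copyright holder
    years: List[int]  # empty if the notice has no year


@dataclass
class RelationshipDraft:
    """Values the mapper step would fill into one relationship dialog."""
    link_type: str       # placeholder key, NOT a real MusicBrainz type ID
    target_type: str     # "artist" or "label"
    credited_name: str
    year: Optional[int]  # None unless there is exactly one year


def map_credit(
    credit: ParsedCredit,
    release_artists: List[str],
    force_artist: bool = False,
) -> List[RelationshipDraft]:
    """Turn one parsed credit into one draft per relationship type."""
    # Assumption: names are compared case-insensitively;
    # force_artist stands in for the SHIFT key.
    artist_names = {artist.casefold() for artist in release_artists}
    is_artist = force_artist or credit.name.casefold() in artist_names
    target_type = "artist" if is_artist else "label"
    # The year is only filled in when it is unambiguous.
    year = credit.years[0] if len(credit.years) == 1 else None
    return [
        RelationshipDraft(f"{target_type}/{t}", target_type, credit.name, year)
        for t in credit.types
    ]
```

A credit with the types ℗ and © therefore yields two drafts for the same target, which mirrors how a combined notice leads to two relationships.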

The following formats and relationship types are supported:

  • copyright symbol ©, alternatively as (C), optionally followed by the year(s)
  • phonographic copyright symbol ℗, alternatively as (P), optionally followed by the year(s)
  • combination of both copyright symbols © & ℗, in any order, separator is optional
  • legal notices licensed to / licensed from / under exclusive license from, alternatively spelt as licence
  • distributed by, marketed by, marketed and distributed by, in any order

After these types, the parser expects the name of the copyright holder(s), which can be either labels or artists. Multiple names have to be separated by slashes, dashes or vertical bars (by default). The parser extracts the entire text until it reaches a terminator, which can be the end of the line, a comma or a full stop.
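As a rough illustration of the parser step for the two copyright symbols, the following Python sketch extracts the types, the year(s) and the names from a single notice. It is not the userscript's actual regular expression: `NOTICE`, `NAME_SEPARATOR` and `parse_notice` are names made up for this example, the accepted year separators are an assumption, and only the simple terminators (comma, full stop, end of line) are handled.

```python
import re
from typing import List, NamedTuple, Optional


class Credit(NamedTuple):
    types: List[str]  # normalized to "©" and/or "℗"
    years: List[int]
    names: List[str]


SYMBOL = r"©|\(C\)|℗|\(P\)"
TYPE_ALIASES = {"(C)": "©", "(P)": "℗"}

NOTICE = re.compile(
    rf"(?P<types>(?:{SYMBOL})(?:\s*&?\s*(?:{SYMBOL}))?)"  # one or both symbols
    r"\s*(?P<years>\d{4}(?:\s*[,/&]\s*\d{4})*)?"           # optional year(s)
    r"\s*(?P<name>[^,.\n]+)",                              # text up to a terminator
    re.IGNORECASE,
)
# Slashes, vertical bars and dashes separate multiple names.
NAME_SEPARATOR = re.compile(r"\s*[/|–-]\s*")


def parse_notice(line: str) -> Optional[Credit]:
    """Extract types, years and names from the first notice in a line."""
    match = NOTICE.search(line)
    if match is None:
        return None
    symbols = re.findall(SYMBOL, match["types"], re.IGNORECASE)
    types = [TYPE_ALIASES.get(symbol.upper(), symbol) for symbol in symbols]
    years = [int(year) for year in re.findall(r"\d{4}", match["years"] or "")]
    names = [n for n in NAME_SEPARATOR.split(match["name"].strip()) if n]
    return Credit(types, years, names)
```

For the line "℗ 2012 Shady Records/Aftermath Records/Interscope Records" this sketch returns three names, whereas "© 2011 The Weeknd XO, Inc." is cut off at the comma because company suffixes are not handled here.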

Both the credit terminator and the name separator patterns are customizable: you can specify either a regular expression or a string for each of them (a blank pattern input disables the respective feature). Each pattern input has a reset button to return to the current default value.
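One way to picture how such a pattern input could be interpreted is sketched below in Python. The `/.../flags` notation for regular expressions is an assumption made for this example, as is the helper name `compile_pattern`; the userscript may distinguish strings from regular expressions differently.

```python
import re
from typing import Optional

# Assumed notation for regular expressions in the input field: /source/flags
REGEX_LITERAL = re.compile(r"^/(.+)/([a-z]*)$")


def compile_pattern(user_input: str) -> Optional[re.Pattern]:
    """Turn a pattern input into a compiled pattern, or None if it is blank."""
    if not user_input.strip():
        return None  # blank input disables the feature
    literal = REGEX_LITERAL.match(user_input)
    if literal:
        source, flags = literal.groups()
        # Only the "i" flag is translated in this sketch.
        return re.compile(source, re.IGNORECASE if "i" in flags else 0)
    # Anything else is matched as a plain string, special characters included.
    return re.compile(re.escape(user_input))
```

With this sketch, the input `&` splits "MR.BLACK & GProject" at the ampersand only, while `/\s*&\s*/` also swallows the surrounding spaces.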

If you are using the prepopulated default pattern for the credit terminator, the following special cases are taken into account as well:

  • under is treated as terminator to handle "X under (non-)exclusive license by Y"
  • company suffixes which follow a comma or contain optional dots (please report unsupported suffixes when you encounter them):
    • LLC / LLP
    • Co. / Co. KG / Corp. / Inc. / Ltd.
    • abbreviations with alternating letters and dots, e.g. B.V. / S.A. / Α.Ε.
    • sequences with multiple of these suffixes, e.g. Co. Ltd.
  • other abbreviations which are not treated as terminator or company suffix:
    • Bros.
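The special cases listed above can be approximated with a regular expression that consumes company suffixes as part of the name before it looks for a terminator. The following Python sketch is a simplified reconstruction for illustration only, not the default pattern of the userscript; `SUFFIX`, `NAME` and `extract_holder_name` are hypothetical names.

```python
import re

# Company suffixes from the list above; the last alternative covers
# abbreviations with alternating letters and dots such as B.V. or S.A.
SUFFIX = r"(?:LLC|LLP|Co\. KG|Co\.|Corp\.|Inc\.?|Ltd\.?|(?:[^\W\d_]\.){2,})"

NAME = re.compile(
    rf"(?:,?\s*\b{SUFFIX}(?!\w)"  # company suffix, possibly after a comma
    r"|\bBros\."                   # abbreviation that does not end the name
    r"|(?!\s+under\b)[^,.])+"      # any other character, stop before " under"
)


def extract_holder_name(text: str) -> str:
    """Return the leading holder name of the text, or "" if there is none."""
    match = NAME.match(text.strip())
    return match.group(0).strip() if match else ""
```

Applied to "SSA Recording, LLP, under exclusive license to Republic Records", the sketch keeps ", LLP" and stops at the second comma. Applied to "Magic Quid Limited under exclusive licence to BMG Rights Management (UK) Limited", it stops before "under".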

If you want to know the exact details about the parser, have a look at the underlying regular expressions. You can find lots of tools which can explain them to you (e.g. https://regex101.com) or you can study this beautiful railroad diagram representation (where I have combined and annotated the expressions).

Collection of unsupported copyright notice formats

Entries for formats which previously caused issues but are supported by now have been ticked off in this list and added to the test cases.

The major problem is that the userscript has to reliably detect the end of the copyright holder's name. In the easy cases this is just a comma or a full stop, but we also need special handling for company suffixes which follow a comma and/or contain dots.

Version 2022.1.11 now detects "Inc." and "Ltd." (also without trailing dot), "LLC", "LLP", and " under " (for "X under exclusive license to Y") in addition to comma and full stop. Please let me know if you find more patterns which end the name of a copyright holder.

Version 2022.1.21 adds some customization options to be able to ignore certain special characters (terminators and split symbols) in artist and label names.

Types

  • licensed to / licenced to / under exclusive license to / under exclusive licence to

    • This one didn't work as far as adding the licensed to, it only worked for (P) & (C):

      ℗ & © «2016 Maspeth Music BV, under exclusive license to Republic Records, a division of UMG Recordings, Inc. (Eddie O Ent.)»

      • only matched "licensed to"
    • Doesn't add licensed to, but did add (P) & (C):

      © 2021 SSA Recording, LLP, under exclusive license to Republic Records, a division of UMG Recordings, Inc. ℗ 2021 SSA Recording, LLP, under exclusive license to Republic Records, a division of UMG Recordings, Inc.

    • Same:

      ℗ «2021 SSA Recording, LLP, under exclusive license to Republic Records, a division of UMG Recordings, Inc.»

  • distributed by / distributor (unsupported)

    • HD Tracks API: Worked for (P) & (C), but cut off "Inc." and only searched for "The Weeknd XO". Distributed by didn't search at all.

      "pLine": "Distributed By Republic Records.; ℗ 2011 The Weeknd XO, Inc.", "cLine": "© 2011 The Weeknd XO, Inc.",

    • HDTracks API distributor line doesn't work at all (not that I expected it to).

      "distributor": "Universal Music Group

      • currently only expected to work for copyright notices, but I think it should be doable to support this now that "distributed by" is actually supported
  • marketed and distributed by (multiple types)

    • Skips marketed by, but it did add distributed by credit with no problem:

      marketed and distributed by Sony Music Entertainment

Company suffixes

  • LLP

    • I just noticed that on SSA Recording, LLP, that only SSA Recording is going into the search field. The LLP is being chopped off.
      • a comma was interpreted as end of the name
  • Inc / Inc.

    • HD Tracks API: Worked for (P) & (C), but cut off "Inc." and only searched for "The Weeknd XO". Distributed by didn't search at all.

      "pLine": "Distributed By Republic Records.; ℗ 2011 The Weeknd XO, Inc.", "cLine": "© 2011 The Weeknd XO, Inc.",

  • A.E.

    • © Sony Music Entertainment (Greece) Α.Ε.
  • SA/NV (working if the minimal name length is increased to 3, i.e. by replacing \w{2} with \w{3} in the name separator pattern)

    • © EMI Belgium SA/NV
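The effect of the minimal name length on splitting can be tried out with a short Python sketch. The pattern below is only an approximation built for this example (the helper `name_separator` is hypothetical); it requires a given number of word characters directly before and after the separator, in the spirit of the `\w{2}` and `\w{3}` parts mentioned above.

```python
import re


def name_separator(min_length: int = 2) -> re.Pattern:
    """Separator which only splits between parts of sufficient length."""
    word = rf"\w{{{min_length}}}"  # e.g. \w{2}
    return re.compile(rf"(?<={word})\s*[/|–-]\s*(?={word})")


labels = "Shady Records/Aftermath Records/Interscope Records"
print(name_separator().split(labels))                  # three separate names
print(name_separator(2).split("EMI Belgium SA/NV"))    # split into two parts
print(name_separator(3).split("EMI Belgium SA/NV"))    # kept as a single name
print(name_separator().split("Universal Music A/S"))   # kept as a single name
```

Raising the minimum from 2 to 3 keeps "SA/NV" together at the cost of no longer splitting names whose neighbouring parts are only two characters long.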

Name terminators

  • All of "Magic Quid Limited under exclusive licence to BMG Rights Management (UK) Limited" went into search and credited as.

    ℗ & © «2019 Magic Quid Limited under exclusive licence to BMG Rights Management (UK) Limited»

    • only comma and full stop were interpreted as end of the name
  • Dashes instead of commas as terminators

    © «2020 Dollar Menu/Cosmos – a division of Cosmos Music, with the exception of track 6 & 9 2020 B3SCI International inc. under the non-exclusive license to Cosmos - a division of Cosmos Music»

  • Also problematic for artist names which contain dots:

    ℗ «2021 MR.BLACK & GProject» (works with "&" as separator pattern)

Other formats

  • Didn't work without any dates:

    ℗ & © «Rare»

  • Phonographic copyright doesn't work if preceded by release label. On UMG releases this will be most of them. Did work on the copyright.

    ℗Motown Records; 2021 UMG Recordings, Inc. © 2021 UMG Recordings, Inc.

    • Is there always a semicolon? It would be easy to skip the release label part in that case. Done: Skip based on the presence of the semicolon.
    • Answer: Yeah. Most of the time I think there is a semicolon. I saw some of them working now, so looks like you fixed it.
  • Doesn't recognize multi-label splits as separate labels. Shows as "Shady Records/Aftermath Records/Interscope Records" on search and credited as. Maybe "/" should be treated as a stop, but then there are a few labels where that's part of the name. In this case, it's 3 labels.

    ℗ «2012 Shady Records/Aftermath Records/Interscope Records»

    • I could split names at slashes if that case is more common than label names which actually contain slashes. Do you have any other examples or perceived statistics?
      • Universal Music A/S
    • Decided to split only if the resulting parts have at least two word characters to avoid splitting company suffixes like A/S, other cases should be rare and it's easier to cancel unwanted additional rel dialogs than to add those that were missed.
  • Found new common combination that doesn't work well right now for obvious reasons. It actually only searches "Warner Music Nashville LLC for the U" because of the period and then never even looks for WEA International Inc.

    ℗ & © «2020 Warner Music Nashville LLC for the U.S. and WEA International Inc. for the world outside the U.S.»

  • Adds "The copyright in this sound recording is owned by Pink Floyd Music Ltd." to search and skips marketed by, but it did add distributed by credit with no problem:

    ℗ «2016 The copyright in this sound recording is owned by Pink Floyd Music Ltd., marketed and distributed by Sony Music Entertainment»

  • Copyright holder is prefixed by "The copyright in this compilation is owned by"

    ℗ «2016 The copyright in this compilation is owned by Pink Floyd Music Ltd., marketed and distributed by Sony Music Entertainment»

    • parsing supported, but the "compilation" bit is currently ignored, maybe this should be used to skip adding recording relationships?
  • Misses "Pink Floyd (1987) Ltd." on both. Maybe the space before & after "/" messes it up.

    © 2016 Pink Floyd Music Ltd. / Pink Floyd (1987) Ltd. ℗ 2016 Pink Floyd Music Ltd. / Pink Floyd (1987) Ltd., marketed and distributed by Parlophone Records Ltd., a Warner Music Group Company

    • The dot of Ltd. is interpreted as name terminator so it does not even look for the following slash.
  • Another odd variation that's not currently searching correctly:

    ℗ Digital Remaster 2011 The copyright in this sound recording is owned by Pink Floyd Music Ltd/Pink Floyd (1987) Ltd under exclusive licence to EMI Records Ltd

  • Copyright is spelled out on this variation. I'm assuming that it's © 2016 Pink Floyd Music Ltd. BECAUSE of audio-visual & artwork.

    ℗ «2016 Pink Floyd Music Ltd. The copyright in this sound and audio-visual recording and artwork is owned by Pink Floyd Music Ltd.»

    • I would say this is a won't fix, it would add complexity and is not 100% clear (e.g. is it indeed 2016). You can always force/help the parser to interpret it by adding a (C) symbol.
  • Use of a "|" instead of a "/":

    ℗ «2006 Data Records|Ministry of Sound Recordings Ltd»

  • Region specific. Usually add both to the release since they are usually worldwide releases, i.e.:

    ℗ & © «2008 Atlantic Recording Corporation for the United States and WEA International Inc. for the world outside of the United States»
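One of the fixes noted in the list above, skipping a release label which precedes the semicolon in a ℗ line, could look roughly like the following Python sketch. The pattern and the function name `skip_release_label` are assumptions made for illustration; the userscript may implement the skip differently.

```python
import re

# A ℗ symbol, followed by text without another symbol, up to a semicolon.
LABEL_PREFIX = re.compile(r"(℗|\(P\))[^;©℗]*;\s*")


def skip_release_label(line: str) -> str:
    """Drop the release label between a ℗ symbol and the next semicolon."""
    return LABEL_PREFIX.sub(r"\1 ", line)


notice = "℗Motown Records; 2021 UMG Recordings, Inc. © 2021 UMG Recordings, Inc."
print(skip_release_label(notice))
```

Lines without a semicolon after the ℗ symbol are returned unchanged, so the regular notice formats are not affected by this preprocessing step.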