Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

First try adding new albuminfo and trackinfo class #3568

Merged
merged 30 commits into from
May 9, 2020

Conversation

dosoe
Copy link
Contributor

@dosoe dosoe commented Apr 25, 2020

New attempt at dealing with #1547 learning from errors in #2650

@dosoe
Copy link
Contributor Author

dosoe commented Apr 25, 2020

This new implementation of AlbumInfo and TrackInfo allows flexible tags. A quick test with some of my data seems to work. It has one significant difference: because I overcharged __getattr__ the function hasattr(track, tag) will always return True. To check if a tag is set, one must check if track.tag==None. This concerns especially plugins, as they sometimes add new tags, but it might be important elsewhere.

@dosoe
Copy link
Contributor Author

dosoe commented Apr 25, 2020

This behaviour of __getattr__ is what makes the tests fail, as it is needed in python2 for copy.deepcopy(x) . One solution would be to adapt __getattr__ by telling it to raise an AttributeError each time we want to get a tag that either does not exist or whose value is None. @sampsyo , could that be a concern or can I implement it? Are there tags that should exist with the value None on an item that should be kept as such, or can these tags be removed from the item?

@dosoe
Copy link
Contributor Author

dosoe commented Apr 25, 2020

It raises the question: The goal of this is to have some core attributes that are always populated and then some that are flexibly attributed. Which attributes are core attributes? From the definition of AlbumInfo, it seems that only album, album_id, artist, artist_id, tracks are core (as the others are optional) but in the tests it calls for the other values.
Do we want all tags to return None when called (and not populated) or do we want to raise an AttributeError? The latter is compatible with copy.deepcopy in python2, the former isn't.

@sampsyo
Copy link
Member

sampsyo commented Apr 25, 2020

Great! Thanks for getting this going!

Here's what I think we should do:

  • Everything should be flexible, i.e., no built-in attributes.
  • As a purely transitional matter, maybe we should provide a list of attributes that should produce None when they are missing instead of raising an AttributeError.
  • When you access any other attribute, like info.foo and there's no value for foo, it should raise an exception rather than silently returning None.
  • We should probably provide key-style access, like info['foo'] and info.get('foo'). In the future, we can make code that consumes this stuff work more like that.

Does that seem reasonable? Perhaps I'm being too aggressive here about eliminating all built-in attributes, but I think it is probably the right thing to do.

@dosoe
Copy link
Contributor Author

dosoe commented Apr 25, 2020

I think in principle, yes, but I'm running into trouble adapting beets/autotag/__init__.py as some tags (like title) are hardcoded, especially in apply_metadata. I'm looking into this.

@dosoe
Copy link
Contributor Author

dosoe commented Apr 25, 2020

Ok, that's gonna be a wild chase to find all the if item.attr: and replace them with if attr in item and check before calling each attribute if the attribute exists else we will get AttributeError all over the place. Maybe you could help me to find all instances, @sampsyo ? Or maybe there is a better way?

@dosoe
Copy link
Contributor Author

dosoe commented Apr 25, 2020

* We should probably provide key-style access, like `info['foo']` and `info.get('foo')`. In the future, we can make code that consumes this stuff work more like that.

This already works.

* When you access any other attribute, like `info.foo` and there's no value for `foo`, it should raise an exception rather than silently returning `None`.

That's already the case.
So far I'm working with everything being flexible, but then there's a lot to replace in the code, you might be able to help as you know the code better than I do.

@dosoe
Copy link
Contributor Author

dosoe commented Apr 27, 2020

That's a working prototype. Now I kept all the default values (all the tags that were previously set to None still are) as chasing all their occurrences and replacing every test has proven too difficult for me. However, that could be done one tag at a time now. Also, apply_metadata and apply_item_metadata from __init__.py have not be changed, that's also something that could be implemented later, however, I don't really know what it does so I don't know how to properly do it.

Copy link
Member

@sampsyo sampsyo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking great so far! It's awesome that the "apply" functions do not need to change at all (for now).

One thing to eventually clean up: it looks like the Map class is taken from a Stack Overflow answer? https://stackoverflow.com/a/32107024/39182

It might be a bit "overkill" for what we need. For example, I don't think we're using the *args part of the constructor? And similarly I don't think we're using the __dict__ part of the functionality? Maybe something much simpler like Confuse's AttrDict would suffice?
https://github.com/beetbox/confuse/blob/3bf9680e7d242136cc304232f902db06c4cc8e11/confuse.py#L1659-L1667

@dosoe
Copy link
Contributor Author

dosoe commented Apr 27, 2020

I do need some of the methods, because it needs to be hashable and I need a __setstate__ and a __getstate__ for deepcopy operations. As for the rest, I will try to clean up a bit the constructor and check which method is essential. What do you mean about the __dict__ part of the functionality? **Edit: ** I use the __dict__ functionality in the __getstate__ method.

@sampsyo
Copy link
Member

sampsyo commented Apr 27, 2020

I see… can you elaborate a little bit on which parts of the code require those two things (hashability and deep copying)?

For hashability, I think we might not want a smart approach that hashes based on the data… we might want every object to be "unequal to" every other unique object. I think that's the default for object but not for dict.

@dosoe
Copy link
Contributor Author

dosoe commented Apr 27, 2020

I remember deepcopy being a problem at my previous attempt, especially for python2. It seems like it's linked to pickle and needs a __getstate__ and a __setstate__ method, that's also in the commentaries of this StackOverflow answer: https://stackoverflow.com/questions/2352181/how-to-use-a-dot-to-access-members-of-dictionary/32107024#32107024 . I also found this error in one of the test logs of my previous attemps to implement flexible classes (#2650):

Traceback (most recent call last):

  File "/home/travis/build/beetbox/beets/test/test_autotag.py", line 436, in test_comp_track_artists_do_not_match

    self.assertNotEqual(self._dist(items, info), 0)

  File "/home/travis/build/beetbox/beets/test/test_autotag.py", line 340, in _dist

    return match.distance(items, info, self._mapping(items, info))

  File "/home/travis/build/beetbox/beets/beets/autotag/match.py", line 251, in distance

    dist.tracks[track] = track_distance(item, track, album_info.va)

TypeError: unhashable type: 'TrackInfo'

This shows the types have to be hashable.

@sampsyo
Copy link
Member

sampsyo commented Apr 27, 2020

OK—would you mind double-checking to see whether deep copying is still necessary anywhere? It might work to just delete those various methods and see if any tests fail…

That traceback is an example showing that the type does indeed need to be hashable, but this doesn't necessarily imply that the implementation should use the actual contents instead of the object identity. In fact, I think it would be incorrect to make two objects be considered equal in that context if their contents are equal. I think a default-ish implementation like id(self) would be appropriate?

@dosoe
Copy link
Contributor Author

dosoe commented Apr 27, 2020

grep deepcopy shows it only appears in some tests (namely test_ui and test_autotag) so we should be able to survive without. I will arrange the hash.

@sampsyo
Copy link
Member

sampsyo commented Apr 28, 2020

Yep, that's all it does—it only works on bytestrings (i.e., bytes on Python 3) to text strings (str on Python 3). The only question is whether there are fields that want to remain as bytes.

@dosoe
Copy link
Contributor Author

dosoe commented Apr 28, 2020

The only ones I could think of are acoustic fingerprints and similar pseudo-strings

@dosoe
Copy link
Contributor Author

dosoe commented May 4, 2020

Whatever it is, it's probably simpler to just decode all strings into unicode and add a list of exceptions rather than doing it the other way. Is there a reason to convert everything to unicode?

@sampsyo
Copy link
Member

sampsyo commented May 4, 2020

Yeah: the main reason to use the "positive" rather than "negative" approach is that, with this PR, the list of fields is meant to be extensible. So when someone comes along and adds a new field, perhaps in a plugin, we don't want to have to touch the rest of the code. We would need to choose a "default behavior," and the most sensible default is not to touch any data—to pass it along as the metadata source provided it.

@dosoe
Copy link
Contributor Author

dosoe commented May 5, 2020

But why do you actually bother to convert anything to unicode? Are regulat bytestrings not good enough? Is it for special characters (cyrillic, japanese, chinese etc.)?

@sampsyo
Copy link
Member

sampsyo commented May 5, 2020

Yes—eventually, all strings that represent text in beets must be Unicode strings. That's the only way to reliably represent the full range of characters people use in their metadata.

@dosoe
Copy link
Contributor Author

dosoe commented May 6, 2020

Then it would make sense to make the default behaviour to convert every string to unicode unless stated otherwise.

@sampsyo
Copy link
Member

sampsyo commented May 6, 2020

But that runs into the problem above: what if a plugin wants to add a field that is supposed to contain bytes? Especially if it's not a built-in beets plugin, it would have no way to instruct beets core to skip the conversion.

@dosoe
Copy link
Contributor Author

dosoe commented May 6, 2020

Why not? A field gets converted only if it exists.

@sampsyo
Copy link
Member

sampsyo commented May 6, 2020

Say I write a plugin that provides a fingerprint tag, $myfp. It holds a byte string, intentionally. The beets core has never heard of this before. But the loop you're proposing will look like this:

for field in self.data:
    if isinstance(self[field], bytes) and field not in do_not_convert_these_fields:
        self[field] = self[field].decode('utf8')

Because myfp is a brand-new field I just invented, it can't possibly be in the hard-coded do_not_convert_these_fields list. So it will get converted to text, and there's nothing I (the notional plugin developer) can do about it.

@dosoe
Copy link
Contributor Author

dosoe commented May 7, 2020

Then is there anything that holds this PR back from being merged? I'm thinking of adding a method like append(key,value): if self.key, self.key+=', '+value.decode(...) so that we don't need to make lists for all the tags that have multiple values, but that's not really a priority, I would like to first get this PR ready to be merged.

Copy link
Member

@sampsyo sampsyo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking great overall! Here's one more code review with a few low-level revisions.

@@ -138,53 +144,41 @@ def decode(self, codec='utf-8'):
if isinstance(value, bytes):
setattr(self, fld, value.decode(codec, 'ignore'))

if self.tracks:
if 'tracks' in self:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps tracks is the one think we should keep non-optional. That is, you must provide a list of track objects—unlike all the other fields, which are different because they are just metadata.

I'm actually not sure why we have this if. The loop just doesn't do anything if the list is empty.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, to me it doesn't make sense to have an album without tracks.

for track in self.tracks:
track.decode(codec)

def dup_albuminfo(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's just call this copy (because it's obviously a method on AlbumInfo).

tracks = []
for track in self.tracks:
tracks.append(track.dup_trackinfo())
dupe.tracks = tracks
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This loop can be replaced with a list comprehension:

dupe.tracks = [track.copy() for track in self.tracks]

@@ -224,6 +219,11 @@ def decode(self, codec='utf-8'):
if isinstance(value, bytes):
setattr(self, fld, value.decode(codec, 'ignore'))

def dup_trackinfo(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also call this copy?

trackinfo.append(TrackInfo(u'three', None))
trackinfo.append(TrackInfo(title=u'one', track_id=None))
trackinfo.append(TrackInfo(title=u'two', track_id=None))
trackinfo.append(TrackInfo(title=u'three', track_id=None))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like track_id=None may no longer be necessary?

@@ -595,7 +597,8 @@ def item(i, length):
items.append(item(12, 186.45916150485752))

def info(index, title, length):
return TrackInfo(title, None, length=length, index=index)
return TrackInfo(title=title, track_id=None, length=length,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above.

@@ -749,13 +752,15 @@ def test_albumtype_applied(self):
self.assertEqual(self.items[1].albumtype, 'album')

def test_album_artist_overrides_empty_track_artist(self):
my_info = copy.deepcopy(self.info)
# make a deepcopy of self.info
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is probably not necessary every time?

test/test_ui.py Outdated
i2 = library.Item()
i2.bitrate = 4321
i2.length = 10 * 60 + 54
i2.format = "F"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any particular reason why this is not i2 = self.item.copy()?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder why this hasn't been done long ago, in a docstring of one of the parent classes it explicitly says that deepcopy() doesn't work on these objects.

@dosoe
Copy link
Contributor Author

dosoe commented May 8, 2020

Should I provide the changelog as well? Should the documentation be altered?

@sampsyo
Copy link
Member

sampsyo commented May 8, 2020

Awesome! A changelog entry would be great. I can't think of anywhere else in the docs where we mention this stuff, so probably nothing else needs to change.

Thanks for all your work on this!

beets/autotag/hooks.py Outdated Show resolved Hide resolved
@dosoe
Copy link
Contributor Author

dosoe commented May 9, 2020

That should be it then.

@sampsyo
Copy link
Member

sampsyo commented May 9, 2020

Awesome!! Thank you for your careful work on this. This is a long-standing request that will help enable lots of interesting additions in the future. Three cheers!

@sampsyo sampsyo merged commit a907dac into beetbox:master May 9, 2020
@dosoe dosoe deleted the beet_test_new_albuminfo branch May 9, 2020 14:57
@dosoe
Copy link
Contributor Author

dosoe commented May 10, 2020

I'm coming back to this PR: I tried to add new fields, which should be easy now, but I found out we missed an important part in beet/autotag/__init__.py apply_metadata. New PR incoming to sort this out.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants