-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Quality-based Trumping during import #116
Comments
This would be superb. In my spare time I was going to try to write something like this. |
Awesome! Now that we have a central place in the importer workflow where replacement decisions are made, this should be relatively straightforward to implement -- it's just a matter of coming up with the right design for how it should work. On Wed, Apr 3, 2013 at 9:47 AM, Harrison Chapman notifications@github.com
|
This is my first working with beets, so mind how naive I am. I've written a really silly (and still crufty) plugin which tries to use eyed3 to read the Xing header of the mp3 to determine the encoding preset (e.g. v0, cbr 320) and adds a $quality name template variable. It'd be nice to know the presets because if I were quality trumping, I'd personally take v0 over a 320, but a 320 over a v2, and I do not believe the ABR is enough information to do this reliably always. I'd like to make this into a plugin, since right now it depends on eyed3, which I do not believe beets presently does. Any chance that the plugin docs are out of date, and that the database is now extensible by plugins? It would be nice to be able to I haven't given any look into modifying the replacement decisions via a plugin yet: my brief scan of the plugin docs didn't seem to include much information on this. I'll have to scan the source. |
Check out this thread on the Google Groups forum for information on storing arbitrary data in the beets library. Sampsyo indicated that such a feature (flexattrs) might be coming in beets version 1.2. |
Very cool! Nice work, and not silly at all. 🐤 🌟 I really like this approach. Regarding the dependency on eyed3: yes, we'll need to replace that with an equivalent parser for Mutagen/MediaFile. Do you know how eyed3 determines the LAME preset? Is it by parsing the "encoder" string? If so, then we can re-implement that in a fairly straightforward way. As @mister-walter mentioned, yes, we're currently in the middle of building support for arbitrary metadata fields in the database. See also #101 for the current status of this feature. I'm planning to make it the first big feature in 1.2, which will start development in earnest soon when 1.1 is out the door (very nearly ready). This is yet another good use case for that feature. |
I think eyed3 uses the xing header which is in the blank first frame of the mp3. Encoders like LAME use it, and possibly some others as well. I don't know how eyed3 uses this data to determine/guess the encoding preset! It's open source though... |
Cool. That's worth investigating to see how complex the logic involved is. |
By lifting the LameHeader class and three utility functions from eyed3, I've added super rudimentary LAME header support to mutagen in this branch (super rudimentary means that it just tries to find the LAME header, and spits it out while it opens an mp3 file, with no actual functionality). It's understandable that this code isn't in mutagen, which doesn't seem to be concerned with this extra data (especially since it may or not have an equivalent in any other type of media file), but it was interesting to see how it's certainly not impossible to add it. To test: import mutagen.mp3
mutagen.mp3.Open(MP3_FILE_NAME) |
Cool; thanks. Looks like this header is actually substantially more complex than I originally imagined—the Xing header is yet another arcane binary format. FWIW, there's discussion (years old now) on the Mutagen bug tracker for including similar functionality: There's a patch but it needs some cleaning up. Maybe the best course of action here would be to pick up that patch, which seems to have been abandoned by the original author, clean it up, add tests, and land it in Mutagen for the next release. (The Mutagen maintainers have been pretty receptive lately.) Who knew this little bit of data would be so onerous to recover? |
That explains a lot; I didn't even consider mutagen wouldn't be hosted on github, and I didn't think to look at googlecode/bitbucket. I'll give that a look over. |
The patch applies to present master branch! It works for one of my songs, but none of the test files included with mutagen at present have any preset data :(. |
I've put the functionality merged into today's master/default (whatever it's called with hg!) with testing on my Bitbucket fork of mutagen. I've also told the mutagen issues tracker. I haven't heard anything on the issue yet, though. |
Awesome, thanks for taking action! Let's see what the authors think; if necessary, we can email the Quod Libet mailing list in a few days. |
Well- it does not seem like there is any "way" for a plugin to, at present, influence the importer in checking for duplicates, or even to modify the manual duplicate resolver in the UI (e.g. make it say "we have this album as MP3 in the database, but you're importing a FLAC"). Correct me if I'm wrong! I'm going to think about how to add this broad functionality in. |
That's correct; the hook doesn't exist yet. We'll need to add it in the |
Ideally the plugin (at least one which knows the user's preference to replace MP3 with FLAC) would be UI-agnostic, and as of now resolve_duplicate just throws a I'm not used to thinking this much about software infrastructure. It's fun! Please let me know if the issues thread is not an appropriate place for my musing... |
Yes, absolutely—you're right that invoking the plugin before the call to And yes, absolutely, this issue tracker is the perfect place for this kind of discussion! To me, this is more organized and therefore useful than the mailing list or ad-hoc emails... there's some hope of coming back to the discussion later when we need to remember the decisions we made. |
Moral conundrum: I'm not used to parallel programming so... It'd be nice to add task duplicate data into Importer task, so that plugins could add or remove duplicates automatically by some score calculation... It's easy to me to just carry along duplicate data as, say, a list, rather than calling _check_duplicates twice. But this crumbles under the completely possible scenario where a user has e.g. FLAC and MP3 in the import query for the same album, and the user would like the FLAC to auto trump the MP3. But it's not clear to me that we can just reliably hold on to any ad-hoc album data added to recents in user_query and expect it to still be correct by the time the better quality comes through... I originally hoped this wouldn't need any sweeping infrastructure changes. Just saying where I'm stumped, and how the coroutine solution is pretty brilliant and still magic to me.. |
Yes, the One simplification that may help: all tasks hit the database in order. That means that, in the So a plugin could prompt the user (or not) based on the limited data available in |
Here's a branch with a preliminary solution. I've moved everything dealing with duplicates (including the user query...) to the This beetsplug uses it and it also uses my branch of mutagen (about which I've still heard nothing, although I've still only just posted on the googlecode issue) to get preset data. It's a rudimentary quality trumper which prefers FLAC > V0 > 320 > [etc...] |
I've updated my branch so that all tests pass, but I haven't written any new tests. Curious question: there was an issue in |
It just hit me why moving a user query for the duplicate check is not a good solution. I'm also trying to work in a score system for duplicate replacement calculation instead of a binary take-it-or-leave-it |
Great; I'm glad that makes sense. Yes, all communication with the user has to be done in the same pipeline stage. |
I've changed my strategy to a "trump score" system paralleling the distance that's in now for MB matching. Progress on this branch is preliminary, but works for a really rough manual test I've run myself. Automated tests don't fail, but there are no new ones to test my code. |
Interesting approach! I like the change of And the scoring system is interesting too, but I can also see a more specific interface (which might ask "given these albums, which is the best?" instead of "what's the score for this album?") being somewhat simpler. It would push some of the intelligence into the plugins rather than burdening the import flow more, which is already frighteningly complex. Does that make sense? On Apr 24, 2013, at 1:33 PM, Harrison Chapman notifications@github.com wrote:
|
The primary reason behind the score would be an (immutable, i.e. not changing after the tracks are imported into the library) number which wouldn't require holding on to any further data of items being imported. Perhaps plugins would be able to declare which properties about recent imports to hold on to? This doesn't really parallel anything in beets that I can think of at present however... Comparing relatively would involve holding on to some of this recent import information specifically, and since the changes haven't been applied yet, I'm not sure how messy this could get. An example reason for this would be quality trumping-- right now quality trumping involves having the mutagen file info somewhere and I don't know for sure if the actual import/moving into library could cause some sort of rare race conditions when trying to grab/parse this information from tasks long finished. Certainly with a score system the idea would be to present the user with a choice if the new import score doesn't exceed some threshold (which would ideally only happen if the new import completely eclipses an old; better quality, all the same tracks, same length, same release year...), perhaps with some delta information to explain to the user why we're scoring duplicate choices as such. |
Scoring seems just fine as a general approach to this problem; I'm just questioning whether the logic that handles scores should be done in importer.py or in qualitytrump.py. If at all possible, it would be great to move as much as possible (e.g., finding the max, totaling up track_scores and album_scores, etc.) to plugins to simplify the importer itself as much as possible. Of course, it's possible there's no clean way to do that. To avoid races, one approach you could take in the plugin is to calculate scores when import tasks are first created (handling the import_task_start event) so you have all scores available for later comparisons. You can keep an internal map from tasks to scores or just assign your scores right on the task objects (thanks, Python!). |
Trump attachment processing is a bit less open-ended by adding the semantic framework of #111. In my original ticket for Trumping on gcode, I mentioned image handling and cue and log files. In that vein, I just added some more specific ideas to the attachment ticket (gcode 109). In the name of redundancy prevention, please see https://code.google.com/p/beets/issues/detail?id=109#c21 Trump processing needs to consider the attachments of each candidate album and item as they are moved. Images ought to be trumpable as well, with higher-res images replacing lower-res versions of the same image. I pointed at a resolution and scale-independent image comparison library which ought to help a lot in determining "sameness" although I can't vouch for how well it works in real life. Cue and log files are interesting in that they are only valid if an album consists of the complete set of items they refer to. Any item or album trumping means the cue and log have to go, even if the new version doesn't have replacements. |
I also have no idea what to do with assets which have been trumped. I suppose the options are "delete" (which I find terrifying) and "mark and move", but to where? A "discard" path? |
beets already seems to be content skipping/removing albums at user query, so perhaps the trumping could (possibly unless the user informs otherwise) provide a summary of/query for the destructive actions it will take at the end of an import? I am not entirely clear if this fits into the structure/morals of the project however. |
Ping, how come this patch is collecting dust? :-) I need something like this before I can attempt to actually put beets to use... |
There are a few outstanding issues awaiting a champion:
Please feel free to chip in on any of the steps. |
This issue (the original one on googlecode) resume exactly what I want. I'll track it. |
This would be a very useful feature to have! |
Hi, This would be very nice, I'm doing this manually with 'duplicates' plugin using "beet dup -F -f '$path $bitrate'"... and then a manual rm... very simple, but painful It would be useful to have the list of duplicates any time an import finds them. I mean, program will find a file already in library and have the newcomer. What if the program just detail the tags, that are different, of the two of them, this way I could choose wisely. This could be done the same way the program tells whenever its correcting tags. Ie: /foo/bar/Megadeth/Rust in Peace/06 Lucretia.mp3
I'm considering that I might want to have lots of versions of any file. With that information, one can easily decide what to do... also, I guess that adding this would be easier, just to wait for the other options. I don't know how to program in python to add this functionality myself :( I might learn, fork and contribute :) Just my 2 cents Thanks for this great program... |
I'm also interested in having V0 V2 320kb available to use in paths |
Would it be possible to specify which identifiers are used to detect duplicates? |
@aristidesfl Sounds like a reasonable extension, yes. |
push :) |
mutagen trunk (next version) will now expose lame version and bitrate mode. Feedback welcome. |
Fantastic! Awesome work. I'll look into exposing this in beets. |
Fantastic idea this addition to have this at import! "Would it be possible to specify which identifiers are used to detect duplicates? +1 on that!! And also, at import the option to set default actions would be very helpful, like always take new. Ofcourse, it this gets implemented to its full extent with a scoring system based on bitrate, availability of log/cue,... would be even better than a basic new/old option. |
Sorry to complicate this, but I have something to add: Joint stereo vs Separate stereo. |
Why? |
The mid-side stereo transform is lossless, and reduces redundant encoding of common information between channels. |
I notice a difference.
|
@Joshfindit did you double blind test / abx? |
Yes, many times over the years. For those that are skeptical, I'll put these on the table:
|
I'm experimenting with beets for the first time and noticed that when I import a directory containing both ogg vorbis and MP3 files it imports only the MP3 files (exactly the opposite of what I want). |
This issue was automatically migrated from Google Code.
Original author: telejes...@gmail.com (February 16, 2012 01:16:19)
Original issue: google-code-export/beets#341
The text was updated successfully, but these errors were encountered: