Implement album merging for duplicates #2725
Conversation
This is incredibly cool! As you can tell from the small issue number, this is an extremely long-standing request, and this is a nice, elegant approach.
I actually like the current strategy you have, which resembles a "re-import." Removing the items from the library first shouldn't be necessary.
I made a few small comments and asked one question. Mainly, a couple of helper functions could reduce duplication, and some additional comments would make the intent clearer to future readers of the code.
Would you also mind adding to the importer guide in the documentation to describe what this option does?
beets/importer.py (outdated)

```diff
@@ -1352,7 +1358,26 @@ def emitter(task):
         ])
         return pipeline.multiple(ipl.pull())

+    resolve_duplicates(session, task)
+    if type(task) != MergedImportTask:
```
I know this seems crazy, but just to check: is this condition actually necessary? That is, if we merge a bunch of albums into one, would it be OK to check again for duplicates of the newly-tagged merged album?
I did this because the initially detected duplicates aren't removed, so they would be marked again in this MergedImportTask's duplicate list and prompt the user a second time. Choosing "Remove" at that stage would be the correct response, which is basically what this condition does automatically. It also avoids another pointless check for duplicates that we already know exist.
Is there a way to remove those albums from the library so they don't get detected as duplicates again? Probably something like setting the album's id to None?
Aha, got it. This makes me think that there might be a slightly deeper issue to address here: we probably shouldn't detect a duplicate when the "duplicate" album actually consists of the same songs as the newly-imported album! This should probably be true for any re-import situation, not just the new functionality you've added here.
In particular, consider this check from `find_duplicates`:
Lines 637 to 639 in 9c6910d:

```python
album_paths = set(i.path for i in album.items())
if album_paths != task_paths:
    duplicates.append(album)
```
This says that we don't detect a duplicate if the set of songs is exactly the same between the old and new album. Perhaps this should be changed to a subset check instead. Does that seem reasonable?
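A minimal sketch of the proposed subset-based check (standalone; `album_paths` and `task_paths` are plain path sets as in `find_duplicates`, but the helper name is hypothetical):

```python
def is_duplicate(album_paths, task_paths):
    """An existing album counts as a duplicate only if its files are
    NOT all contained in the incoming task: a pure re-import (equal
    sets) or a merge (old album is a subset) is not flagged again."""
    return not album_paths.issubset(task_paths)

# Merge scenario: the merged task carries both old albums' files,
# so neither old album is re-flagged as a duplicate.
old_album = {"/music/A/01.mp3", "/music/A/02.mp3"}
merged_task = {"/music/A/01.mp3", "/music/A/02.mp3", "/music/B/01.mp3"}
```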
A subset check would be correct. I'll make that change, since the subset case only occurs in the merge scenario. We then also wouldn't need the separate MergedImportTask class and the type check.
beets/importer.py (outdated)

```python
duplicate_items = task.duplicate_items(session.lib)
for item in duplicate_items:
    item.id = None
    item.album_id = None
```
This looks like the same rigamarole we have to do when re-importing (which is logical). Instead of duplicating the code to do this, maybe we need a helper function for this.
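For illustration, the shared helper could look something like this (the name `prepare_items_for_reimport` is made up, and `FakeItem` is just a stand-in for a beets `Item`):

```python
def prepare_items_for_reimport(items):
    """Detach items from their current database rows by clearing the
    IDs, so the importer treats them as fresh imports rather than
    updates to existing entries."""
    for item in items:
        item.id = None
        item.album_id = None

class FakeItem:
    """Minimal stand-in for a beets Item; only the ID fields matter."""
    def __init__(self):
        self.id = 42
        self.album_id = 7
```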
beets/importer.py
Outdated
iter([merged_task]), | ||
lookup_candidates(session), | ||
user_query(session) | ||
]) |
Similarly, a helper function might be useful here to encapsulate the temporary pipeline as it's used both here and in `user_query`.
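Such a helper might be sketched as follows. This simplified version threads tasks through generator stages directly, whereas the real code would build on `beets.util.pipeline`; all names here are hypothetical:

```python
def run_sub_pipeline(tasks, *stages):
    """Run a temporary pipeline over a fixed list of tasks: each
    stage is a generator function that consumes the previous stream
    and yields transformed tasks."""
    stream = iter(tasks)
    for stage in stages:
        stream = stage(stream)
    return list(stream)

# Toy stages standing in for lookup_candidates / user_query:
def double(tasks):
    for t in tasks:
        yield t * 2

def add_one(tasks):
    for t in tasks:
        yield t + 1
```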
Also implement repeated code as helper functions.
The problem now with re-importing duplicate albums from the library is that a separate import session is created for each album, but if the user decides to merge the two in the first session, the second session would run into an error because the files would probably have moved. A similar error should occur when the user decides to remove the duplicates in the first session. What is done in that case?
Hmm, I don't quite understand… what should I do to trigger a problem? For example, is it enough just to choose to merge and then "apply changes" in the temporary sub-pipeline that's created, or is there something more exotic that I need to do?
I'll explain better. There are two albums in the library, A and B, which are duplicates (imported using the Keep both option). Now when I try re-importing these two albums, an import session is created for each of A and B. In the first session, when B is marked as a duplicate, if I choose Merge, the tracks of B are re-imported with A into one album. This works well and is the intended result, but now the second import session, which was working on the original album B, fails. The same happens if I choose to Remove the duplicates of A and the files of B are removed.
(Disclaimer: I have read neither this PR entirely nor looked at the relevant code in detail.) Maybe, if you could detect this situation from the task that is processed second, you could make the …
Ah! I see; thanks for explaining, @udiboy1209, and for adding some thoughts, @wordofglass.

In summary, the problem only happens when re-imports collide with merges (which is not an exotic scenario). When we merge with an old album, we don't also want to import the old album separately.

In a way, this is similar to the problem we used to have with recent duplicates. If you imported album X and then another copy of album X again immediately afterward, the first copy might not have reached the database yet, so no duplicate would be detected. We used to solve this with an extra data structure on the side to keep track of "recently imported" albums, but we've since replaced that with a strategy that just adds albums to the database earlier, before the files are copied and the metadata is updated.

@wordofglass's suggestion amounts to bringing back approximately the same strategy. The …
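A hedged sketch of that idea: a later task whose source album was already consumed by an earlier merge or removal in the same run could be turned into a no-op. The dict stands in for the library's album table, and all names are hypothetical:

```python
def should_skip_stale_task(library_albums, album_id):
    """Return True when a queued re-import task refers to an album
    that an earlier task in the same run has already merged away or
    removed, so processing it would touch files that no longer exist."""
    return album_id not in library_albums

# An earlier session merged album 2 into album 1:
albums = {1: "Album A", 2: "Album B"}
del albums[2]
```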
Yay! This looks perfect from here. Are there any outstanding concerns, or shall we hit the merge button?
Great! Merge is good to go for me.
Awesome! Thank you again for tackling this; it's been a very long-standing issue that many people have requested, and this is a very elegant solution. I hereby award you the Pull Request of the Month award. 🥇
Awesome, thank you so much! Just in time for me to merge my library with my partner's. :-)
Fixes #112
Would it make sense to remove the album before running the merge task? This code seems to work now, but I think it causes problems when re-importing a library with existing duplicate albums. The tests don't fail, though.