
Implement album merging for duplicates #2725

Merged: 6 commits into beetbox:master, Nov 11, 2017

Conversation

udiboy1209 (Contributor):

Fixes #112

Would removing the album before running the merge task make sense? This code seems to work now, but I think it causes problems when re-importing a library that already contains duplicate albums. The tests don't fail, though.

sampsyo (Member) left a comment:

This is incredibly cool! As you can tell from the small issue number, this is an extremely long-standing request, and this is a nice, elegant approach.

I actually like the current strategy you have, which resembles a "re-import." Removing the items from the library first shouldn't be necessary.

I made a few small comments and asked one question. Mainly, a couple of helper functions would reduce duplication, and some additional comments explaining the intent would help future readers of the code.

Would you also mind adding to the importer guide in the documentation to describe what this option does?

@@ -1352,7 +1358,26 @@ def emitter(task):
])
return pipeline.multiple(ipl.pull())

resolve_duplicates(session, task)
if type(task) != MergedImportTask:
sampsyo (Member):

I know this seems crazy, but just to check: is this condition actually necessary? That is, if we merge a bunch of albums into one, would it be OK to check again for duplicates of the newly-tagged merged album?

udiboy1209 (Contributor, Author):

I did this because the initially detected duplicates aren't removed, so they would be marked again in this MergedImportTask's duplicate list and give the user a second prompt. Choosing "Remove" at that stage would be the correct way to go, which is basically what this condition does automatically. It also avoids another pointless check for duplicates that we already know exist.

Is there a way to remove those albums from the library so that they don't get detected as duplicates again? Perhaps by setting the album's id to None?

sampsyo (Member):

Aha, got it. This makes me think that there might be a slightly deeper issue to address here: we probably shouldn't detect a duplicate when the "duplicate" album actually consists of the same songs as the newly-imported album! This should probably be true for any re-import situation, not just the new functionality you've added here.

In particular, consider this check from find_duplicates:

beets/beets/importer.py

Lines 637 to 639 in 9c6910d

album_paths = set(i.path for i in album.items())
if album_paths != task_paths:
duplicates.append(album)

This says that we don't detect a duplicate if the set of songs is exactly the same between the old and new album. Perhaps this should be changed to a subset check instead. Does that seem reasonable?
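
For illustration, the subset version could look roughly like this, reusing the names from the excerpt above (album_paths, task_paths, duplicates); this is only a sketch of the idea, not necessarily the exact final patch:

album_paths = set(i.path for i in album.items())
# Only report the old album as a duplicate if it contains files that are
# not part of the incoming task. If all of its files are included in the
# task (an exact re-import or a merge), it is skipped.
if not album_paths.issubset(task_paths):
    duplicates.append(album)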

udiboy1209 (Contributor, Author):

A subset check would be correct. I will make that change, since the subset case would only occur in the merge scenario. We also wouldn't need the separate MergedImportTask class and check then.

duplicate_items = task.duplicate_items(session.lib)
for item in duplicate_items:
item.id = None
item.album_id = None
sampsyo (Member):

This looks like the same rigamarole we have to do when re-importing (which is logical). Instead of duplicating that code, maybe we need a helper function for it.
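
For illustration, such a helper might look like this (the name _freshen_items is hypothetical, not an existing beets function):

def _freshen_items(items):
    # Clear the database IDs so the items are treated as brand-new objects
    # on the next store, the same dance the re-import path already does.
    for item in items:
        item.id = None
        item.album_id = None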

iter([merged_task]),
lookup_candidates(session),
user_query(session)
])
sampsyo (Member):

Similarly, maybe a helper function would be useful here to encapsulate the temporary pipeline as used here and in user_query.
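
A rough sketch of what that helper could look like, building on the snippet above (the name _extend_pipeline is illustrative; pipeline is beets.util.pipeline as already imported in importer.py):

def _extend_pipeline(tasks, *stages):
    # Run the given tasks through a temporary pipeline built from the given
    # stages and feed every resulting task back into the main pipeline.
    ipl = pipeline.Pipeline([iter(tasks)] + list(stages))
    return pipeline.multiple(ipl.pull())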

Commit added: "also implement repeating code as helper functions"
udiboy1209 (Contributor, Author) commented Nov 3, 2017:

The problem now with re-importing duplicate albums from the library is that a separate session is created for each album. If the user decides to merge the two in the first session, the second session runs into an error because the files have probably been moved. A similar error should occur when the user decides to remove the duplicates in the first session. What is done in that case?

sampsyo (Member) commented Nov 3, 2017:

Hmm, I don't quite understand… what should I do to trigger a problem? For example, is it enough just to choose to merge and then "apply changes" in the temporary sub-pipeline that's created, or is there something more exotic that I need to do?

udiboy1209 (Contributor, Author):

I'll explain better. There are two albums in the library, A and B, which are duplicates (imported using the "Keep both" option).

Now when I try re-importing these two albums, an import session is created for each of A and B. In the first session, when B is marked as a duplicate, if I choose Merge, the tracks of B are re-imported together with A into one album.

This works well and is the intended result, but the second import session, which was working on the original album B, would now fail. The same would happen if I chose to Remove the duplicates of A, since the files of B would then be deleted.

wisp3rwind (Member):

(Disclaimer: I have neither read this PR entirely nor looked at the relevant code in detail.)

Maybe, if you could detect this situation from the task that is processed second, you could make the user_query stage yield a beets.pipeline.BUBBLE (which means aborting the task and not executing any further pipeline stages) early, before showing any prompt. That would work because stages of the same type (e.g. user_query) run strictly in sequence, so by the time the second one runs, the decision about merging the first has already been made. One remaining point of failure that would need investigation is any access to old database items that have been changed by the first task.

sampsyo (Member) commented Nov 4, 2017:

Ah! I see; thanks for explaining, @udiboy1209, and for adding some thoughts, @wordofglass. In summary, the problem only happens when re-imports collide with merges (which is not an exotic scenario). When we merge with an old album, we don't also want to import the old album separately.

In a way, this is similar to the problem we used to have with recent duplicates. If you imported album X and then another copy of album X again immediately afterward, the first copy might not have reached the database yet, so no duplicate would be detected. We used to solve this with an extra data structure on the side to keep track of "recently imported" albums, but we've since replaced that with a strategy that just adds albums to the database earlier—before the files are copied and the metadata is updated.

@wordofglass's suggestion amounts to bringing back approximately the same strategy. The user_query stage can keep track of in-library albums that have been merged already and emit a bubble if they're encountered later on. Sounds perfect to me!
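
A minimal sketch of that bookkeeping (merged_paths, record_merge, and should_skip are hypothetical names, simplified from whatever the real user_query stage would actually carry around):

from beets.util import pipeline

merged_paths = set()  # files already absorbed into a merged album this run

def record_merge(duplicate_items):
    # Remember the files that were just merged into another album.
    merged_paths.update(i.path for i in duplicate_items)

def should_skip(task):
    # True if every file of this task was consumed by an earlier merge.
    task_paths = set(i.path for i in task.items)
    return bool(task_paths) and task_paths.issubset(merged_paths)

In user_query, a task for which should_skip(task) is true could then yield pipeline.BUBBLE before any prompt is shown, exactly as @wordofglass describes.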

sampsyo (Member) commented Nov 9, 2017:

Yay! This looks perfect from here. Are there any outstanding concerns, or shall we hit the merge button?

udiboy1209 (Contributor, Author):

Great! Merge is good to go for me.

sampsyo merged commit a6215f3 into beetbox:master on Nov 11, 2017.
sampsyo added a commit that referenced this pull request on Nov 11, 2017: "Implement album merging for duplicates".
sampsyo added another commit that referenced this pull request on Nov 11, 2017.
sampsyo (Member) commented Nov 11, 2017:

Awesome! Thank you again for tackling this—it's been a very long-standing issue that many people have requested, and this is a very elegant solution. I hereby award you the Pull Request of the Month award. 🥇

sean-abbott:
Awesome, thank you so much! Just in time for me to merge my library with my partner's. :-)

DArtagan mentioned this pull request on Feb 9, 2018.
udiboy1209 deleted the merge-albums branch on March 8, 2018.