Enable Merging of BibDatabases #6689

DominikVoigt · 2020-07-15T16:35:24Z

This PR adds a method to the BibDatabases that allows instances to be merged with other instances.
This merging will not introduce duplicates if an entry is contained in both databases.
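
As a rough illustration of the intended behaviour, here is a minimal sketch of a duplicate-free merge. It is not the code from this PR: entries are modelled as plain field maps instead of `BibEntry`, the class and method names are hypothetical, and exact equality stands in for whatever duplicate check the real implementation uses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class EntryMergeSketch {

    /**
     * Adds every entry of {@code other} to {@code target} unless an equal
     * entry is already present. Assumption: "duplicate" means "equal field map".
     */
    static void merge(List<Map<String, String>> target, List<Map<String, String>> other) {
        for (Map<String, String> candidate : other) {
            if (!target.contains(candidate)) {
                target.add(candidate);
            }
        }
    }

    public static void main(String[] args) {
        List<Map<String, String>> target = new ArrayList<>();
        target.add(Map.of("author", "Stephen Blaha"));

        List<Map<String, String>> other = List.of(
                Map.of("author", "Stephen Blaha"),   // already contained in target
                Map.of("author", "Another Author")); // new entry

        merge(target, other);
        System.out.println(target.size() + " entries after merge");
    }
}
```

An entry that exists in both lists is kept only once, while entries unique to the other list are appended.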

Change in CHANGELOG.md described (if applicable)
Tests created for changes (if applicable)
Manually tested changed features in running JabRef (always required)
Screenshots added in PR description (for UI changes)
Checked documentation: Is the information available and up to date? If not, created an issue at https://github.com/JabRef/user-documentation/issues or, even better, submitted a pull request to the documentation repository.

tobiasdiez

Good that you have a look at the merge functionality as well.

We already have two merge methods:

jabref/src/main/java/org/jabref/gui/importer/ImportEntriesViewModel.java

Lines 106 to 195 in 7cc5747

    public void importEntries(List<BibEntry> entriesToImport, boolean downloadFiles) {
        // Check if we are supposed to warn about duplicates.
        // If so, then see if there are duplicates, and warn if yes.
        if (preferences.shouldWarnAboutDuplicatesForImport()) {
            BackgroundTask.wrap(() -> entriesToImport.stream()
                                                     .anyMatch(this::hasDuplicate)).onSuccess(duplicateFound -> {
                if (duplicateFound) {
                    boolean continueImport = dialogService.showConfirmationDialogWithOptOutAndWait(Localization.lang("Duplicates found"),
                            Localization.lang("There are possible duplicates (marked with an icon) that haven't been resolved. Continue?"),
                            Localization.lang("Continue with import"),
                            Localization.lang("Cancel import"),
                            Localization.lang("Disable this confirmation dialog"),
                            optOut -> preferences.setShouldWarnAboutDuplicatesForImport(!optOut));

                    if (!continueImport) {
                        dialogService.notify(Localization.lang("Import canceled"));
                    } else {
                        buildImportHandlerThenImportEntries(entriesToImport);
                    }
                } else {
                    buildImportHandlerThenImportEntries(entriesToImport);
                }
            }).executeWith(Globals.TASK_EXECUTOR);
        } else {
            buildImportHandlerThenImportEntries(entriesToImport);
        }

        if (downloadFiles) {
            for (BibEntry bibEntry : entriesToImport) {
                for (LinkedFile linkedFile : bibEntry.getFiles()) {
                    LinkedFileViewModel linkedFileViewModel = new LinkedFileViewModel(linkedFile, bibEntry, databaseContext, taskExecutor, dialogService, preferences.getXMPPreferences(), preferences.getFilePreferences(), ExternalFileTypes.getInstance());
                    linkedFileViewModel.download();
                }
            }
        }

        NamedCompound namedCompound = new NamedCompound(Localization.lang("Import file"));
        namedCompound.addEdit(new UndoableInsertEntries(databaseContext.getDatabase(), entriesToImport));

        // merge strings into target database
        for (BibtexString bibtexString : parserResult.getDatabase().getStringValues()) {
            String bibtexStringName = bibtexString.getName();
            if (databaseContext.getDatabase().hasStringByName(bibtexStringName)) {
                String importedContent = bibtexString.getContent();
                String existingContent = databaseContext.getDatabase().getStringByName(bibtexStringName).get().getContent();
                if (!importedContent.equals(existingContent)) {
                    LOGGER.warn("String contents differ for {}: {} != {}", bibtexStringName, importedContent, existingContent);
                    // TODO: decide what to do here (in case the same string exits)
                }
            } else {
                databaseContext.getDatabase().addString(bibtexString);
                // FIXME: this prevents this method to be moved to logic - we need to implement a new undo/redo data model
                namedCompound.addEdit(new UndoableInsertString(databaseContext.getDatabase(), bibtexString));
            }
        }

        // copy content selectors to target database
        MetaData targetMetada = databaseContext.getMetaData();
        parserResult.getMetaData()
                    .getContentSelectorList()
                    .forEach(contentSelector -> targetMetada.addContentSelector(contentSelector));
        // TODO undo of content selectors (currently not implemented)

        // copy groups to target database
        parserResult.getMetaData().getGroups().ifPresent(
                newGroupsTreeNode -> {
                    if (targetMetada.getGroups().isPresent()) {
                        GroupTreeNode groupTreeNode = targetMetada.getGroups().get();
                        newGroupsTreeNode.moveTo(groupTreeNode);
                        namedCompound.addEdit(
                                new UndoableAddOrRemoveGroup(
                                        new GroupTreeNodeViewModel(groupTreeNode),
                                        new GroupTreeNodeViewModel(newGroupsTreeNode),
                                        UndoableAddOrRemoveGroup.ADD_NODE));
                    } else {
                        // target does not contain any groups, so we can just use the new groups
                        targetMetada.setGroups(newGroupsTreeNode);
                        namedCompound.addEdit(
                                new UndoableAddOrRemoveGroup(
                                        new GroupTreeNodeViewModel(newGroupsTreeNode),
                                        new GroupTreeNodeViewModel(newGroupsTreeNode),
                                        UndoableAddOrRemoveGroup.ADD_NODE));
                    }
                }
        );

        namedCompound.end();
        Globals.undoManager.addEdit(namedCompound);
        JabRefGUI.getMainFrame().getCurrentBasePanel().markBaseChanged();
    }

jabref/src/main/java/org/jabref/gui/importer/ImportAction.java

Lines 129 to 189 in 7cc5747

    private ParserResult mergeImportResults(List<ImportFormatReader.UnknownFormatImport> imports) {
        BibDatabase resultDatabase = new BibDatabase();
        ParserResult result = new ParserResult(resultDatabase);

        for (ImportFormatReader.UnknownFormatImport importResult : imports) {
            if (importResult == null) {
                continue;
            }
            ParserResult parserResult = importResult.parserResult;
            List<BibEntry> entries = parserResult.getDatabase().getEntries();
            resultDatabase.insertEntries(entries);

            if (ImportFormatReader.BIBTEX_FORMAT.equals(importResult.format)) {
                // additional treatment of BibTeX
                // merge into existing database

                // Merge strings
                for (BibtexString bibtexString : parserResult.getDatabase().getStringValues()) {
                    String bibtexStringName = bibtexString.getName();
                    if (resultDatabase.hasStringByName(bibtexStringName)) {
                        String importedContent = bibtexString.getContent();
                        String existingContent = resultDatabase.getStringByName(bibtexStringName).get().getContent();
                        if (!importedContent.equals(existingContent)) {
                            LOGGER.warn("String contents differ for {}: {} != {}", bibtexStringName, importedContent, existingContent);
                            // TODO: decide what to do here (in case the same string exits)
                        }
                    } else {
                        resultDatabase.addString(bibtexString);
                    }
                }

                // Merge groups
                // Adds the specified node as a child of the current root. The group contained in <b>newGroups </b> must not be of
                // type AllEntriesGroup, since every tree has exactly one AllEntriesGroup (its root). The <b>newGroups </b> are
                // inserted directly, i.e. they are not deepCopy()'d.
                parserResult.getMetaData().getGroups().ifPresent(newGroups -> {
                    // ensure that there is always only one AllEntriesGroup in the resulting database
                    // "Rename" the AllEntriesGroup of the imported database to "Imported"
                    if (newGroups.getGroup() instanceof AllEntriesGroup) {
                        // create a dummy group
                        try {
                            // This will cause a bug if the group already exists
                            // There will be group where the two groups are merged
                            String newGroupName = importResult.parserResult.getFile().map(File::getName).orElse("unknown");
                            ExplicitGroup group = new ExplicitGroup("Imported " + newGroupName, GroupHierarchyType.INDEPENDENT,
                                    Globals.prefs.getKeywordDelimiter());
                            newGroups.setGroup(group);
                            group.add(parserResult.getDatabase().getEntries());
                        } catch (IllegalArgumentException e) {
                            LOGGER.error("Problem appending entries to group", e);
                        }
                    }
                    result.getMetaData().getGroups().ifPresent(newGroups::moveTo);
                });

                for (ContentSelector selector : parserResult.getMetaData().getContentSelectorList()) {
                    result.getMetaData().addContentSelector(selector);
                }
            }
            // TODO: collect errors into ParserResult, because they are currently ignored (see caller of this method)
        }

It would be good to refactor and combine them (with your newly added method as well).
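
Both snippets contain the same string-merging loop, which is the most obvious candidate for extraction. The following is only a sketch of what the shared step could look like, assuming BibTeX strings are modelled as a plain name-to-content map; `StringMergeSketch` and `mergeStrings` are hypothetical names, and `java.util.logging` is used in place of the project's logger. As in the linked code, a name clash with different content keeps the existing value and only logs a warning.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.logging.Logger;

final class StringMergeSketch {

    private static final Logger LOGGER = Logger.getLogger(StringMergeSketch.class.getName());

    /**
     * Copies each imported string into {@code target} unless the name is already taken.
     * If the name exists with different content, the existing content is kept
     * and the conflict is only logged (the open TODO from the linked code).
     */
    static void mergeStrings(Map<String, String> target, Map<String, String> imported) {
        for (Map.Entry<String, String> string : imported.entrySet()) {
            String existingContent = target.get(string.getKey());
            if (existingContent == null) {
                target.put(string.getKey(), string.getValue());
            } else if (!existingContent.equals(string.getValue())) {
                LOGGER.warning("String contents differ for " + string.getKey() + ": "
                        + string.getValue() + " != " + existingContent);
            }
        }
    }

    public static void main(String[] args) {
        Map<String, String> target = new LinkedHashMap<>();
        target.put("jan", "January");

        Map<String, String> imported = new LinkedHashMap<>();
        imported.put("jan", "Jan."); // conflicting content: kept as "January", warning logged
        imported.put("feb", "February"); // new name: added

        mergeStrings(target, imported);
        System.out.println(target);
    }
}
```

With a single helper like this, the GUI-side caller would only need to record the undo edits for the strings that were actually added.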

Side remark: depending on your envisioned applications, the ImportEntriesDialog might be helpful.

koppor · 2020-08-01T16:28:17Z

The linked code refs #6488

Currently, I understand the new code better than the looong linked code.

We also have BibDatabaseDiff, which doesn't seem to be referenced in any code. (Example use: https://github.com/koppor/jabref/pull/442/files#diff-58f2d74f80b59a9ba6468243b879c428R68)

koppor · 2020-08-05T21:08:59Z

We merged the merge methods at the cost of "some" undo/redo (at the whole import 😟)

undo/redo refs JabRef#453

Signed-off-by: Dominik Voigt <dominik.ingo.voigt@gmail.com>

koppor

Commit issues resolved. LGTM.

tobiasdiez

Thanks. The code looks mostly good to me. I've a few comments and suggestions for improvement.

src/main/java/org/jabref/gui/importer/ImportAction.java

tobiasdiez · 2020-08-28T21:28:33Z

src/main/java/org/jabref/logic/bibtex/BibDatabaseMerger.java

@@ -0,0 +1,65 @@
+package org.jabref.logic.bibtex;


Since this is not related to writing bibtex (unlike the rest of the package), I suggest moving it, together with the duplication check, to a new logic.database package parallel to the existing model.database. @JabRef/developers, opinions?

I moved it to the recommended package :).

src/main/java/org/jabref/logic/bibtex/BibDatabaseMerger.java

tobiasdiez · 2020-08-28T21:36:29Z

src/test/java/org/jabref/logic/bibtex/BibDatabaseMergerTest.java

+    void mergeAddsNonDuplicateEntries() {
+        // Entries 2 and 3 are identical
+        BibEntry entry1 = new BibEntry()
+                .withField(StandardField.AUTHOR, "Stephen Blaha")


Please reduce these examples to a minimum. I don't think you need 4 entries with full information.

I reduced the number of entries and their information content.

tobiasdiez · 2020-08-28T21:37:04Z

src/main/java/org/jabref/model/metadata/MetaData.java

+     * @param otherFilename   the filename of the other library. Pass "unknown" if not known.
+     * @param allOtherEntries list of all other entries
+     */
+    public void merge(MetaData other, String otherFilename, List<BibEntry> allOtherEntries) {


Please move these merge metadata methods to the database merger class. There you can also add a merge method operating on BibDatabaseContext objects, to have an easy way to merge two databases including all their metadata.
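
To make the suggested structure concrete, here is a minimal sketch of a merger class that offers a database-level merge, a metadata-level merge, and a context-level merge that delegates to both. The nested `Database`, `MetaData` and `DatabaseContext` classes are hypothetical, heavily simplified stand-ins for the real model classes, and content selectors are reduced to plain strings.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class DatabaseMergerSketch {

    // Hypothetical, simplified stand-ins for BibDatabase, MetaData and BibDatabaseContext.
    static final class Database {
        final List<Map<String, String>> entries = new ArrayList<>();
    }

    static final class MetaData {
        final Set<String> contentSelectors = new LinkedHashSet<>();
    }

    static final class DatabaseContext {
        final Database database = new Database();
        final MetaData metaData = new MetaData();
    }

    /** Merges entries only; exact equality stands in for the real duplicate check. */
    void merge(Database target, Database other) {
        for (Map<String, String> entry : other.entries) {
            if (!target.entries.contains(entry)) {
                target.entries.add(entry);
            }
        }
    }

    /** Merges metadata only (in this sketch: just the content selectors). */
    void mergeMetaData(MetaData target, MetaData other) {
        target.contentSelectors.addAll(other.contentSelectors);
    }

    /** Merges a whole context: entries first, then metadata. */
    void merge(DatabaseContext target, DatabaseContext other) {
        merge(target.database, other.database);
        mergeMetaData(target.metaData, other.metaData);
    }
}
```

Keeping all three methods in one class means callers that hold a full context need a single call, while the model classes stay free of merge logic.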

tobiasdiez · 2020-08-28T21:41:16Z

src/test/java/org/jabref/logic/bibtex/BibDatabaseMergerTest.java

+                                                 .collect(Collectors.toList());
+
+        assertEquals(List.of(targetString1.toString(), targetString2.toString()), resultStringsSorted);
+    }


Please also include tests for the merged groups.
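
A group test would mainly need to pin down the behaviour visible in the old `mergeImportResults` code: the imported root (an all-entries group) becomes an ordinary group named "Imported <filename>" and is attached below the target root, so the result still has exactly one all-entries root. The sketch below illustrates that expectation with a hypothetical `GroupNode` class; it does not use the real `GroupTreeNode` API, and unlike the linked code it creates a new node instead of renaming the imported one in place.

```java
import java.util.ArrayList;
import java.util.List;

final class GroupMergeSketch {

    /** Hypothetical minimal group tree node. */
    static final class GroupNode {
        final String name;
        final boolean allEntries; // true only for the root "all entries" group
        final List<GroupNode> children = new ArrayList<>();

        GroupNode(String name, boolean allEntries) {
            this.name = name;
            this.allEntries = allEntries;
        }
    }

    /**
     * Attaches the imported group tree below the target root. An imported
     * all-entries root is replaced by an ordinary "Imported <filename>" group
     * so that the merged tree keeps a single all-entries root.
     */
    static void mergeGroups(GroupNode targetRoot, GroupNode importedRoot, String otherFilename) {
        GroupNode toAttach = importedRoot;
        if (importedRoot.allEntries) {
            toAttach = new GroupNode("Imported " + otherFilename, false);
            toAttach.children.addAll(importedRoot.children);
        }
        targetRoot.children.add(toAttach);
    }

    public static void main(String[] args) {
        GroupNode targetRoot = new GroupNode("All entries", true);
        GroupNode importedRoot = new GroupNode("All entries", true);
        importedRoot.children.add(new GroupNode("Physics", false));

        mergeGroups(targetRoot, importedRoot, "other.bib");
        System.out.println(targetRoot.children.get(0).name);
    }
}
```

A real test would make the same assertions against the merged metadata: one new child under the target root, the expected name, and the imported subgroups preserved beneath it.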


DominikVoigt · 2020-08-29T14:35:42Z

Thanks for your comments :). I implemented the requested changes!

* upstream/master:
  Enable Merging of BibDatabases (#6689)
  Refactor file preferences (#6779)
  Interrupt all running tasks during shutdown (#6118)
  Fixes #6705 , added icon for multiple identifiers (#6809)
  Apply css files correctly to dialogs (#6828)
  Fix link
  Make template more explicit

tobiasdiez requested changes Jul 16, 2020

DominikVoigt force-pushed the feature/enable-merging-of-bibdatabases branch from 8523018 to 278a5f7 Compare July 19, 2020 21:47

koppor marked this pull request as draft July 27, 2020 22:17

DominikVoigt marked this pull request as ready for review August 25, 2020 08:28

DominikVoigt force-pushed the feature/enable-merging-of-bibdatabases branch from 7ca130e to 760498b Compare August 28, 2020 08:47

Clean up commit and remove unwanted changes.

000d773

Signed-off-by: Dominik Voigt <dominik.ingo.voigt@gmail.com>

DominikVoigt force-pushed the feature/enable-merging-of-bibdatabases branch from 760498b to 000d773 Compare August 28, 2020 08:49

koppor added the "status: ready-for-review" label Aug 28, 2020

koppor approved these changes Aug 28, 2020

tobiasdiez requested changes Aug 28, 2020

tobiasdiez added the "status: changes required" label Aug 28, 2020

DominikVoigt added 5 commits August 29, 2020 14:02

Move Merger and DuplicateCheck into database package

dda5e45

Signed-off-by: Dominik Voigt <dominik.ingo.voigt@gmail.com>

Move meta data merging into DatabaseMerger

6cc4e77

Add DatabaseContext merging capability to DatabaseMerger

Signed-off-by: Dominik Voigt <dominik.ingo.voigt@gmail.com>

Add one meta data merge test (unfinished)

fe9d46c

Signed-off-by: Dominik Voigt <dominik.ingo.voigt@gmail.com>

Add meta data merging tests.

70c8dba

Signed-off-by: Dominik Voigt <dominik.ingo.voigt@gmail.com>

Reduce test example

a247d85

Signed-off-by: Dominik Voigt <dominik.ingo.voigt@gmail.com>

koppor removed the "status: changes required" label Aug 31, 2020

tobiasdiez approved these changes Sep 1, 2020

tobiasdiez merged commit 35f5078 into JabRef:master Sep 1, 2020

DominikVoigt deleted the feature/enable-merging-of-bibdatabases branch January 1, 2021 16:06

DominikVoigt restored the feature/enable-merging-of-bibdatabases branch January 1, 2021 16:06

DominikVoigt deleted the feature/enable-merging-of-bibdatabases branch January 1, 2021 16:06
