Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix for issue 6197: Do not store downloaded file if it is already attached #7702

Merged
merged 18 commits into from
May 17, 2021

Conversation

Pikayue11
Copy link
Contributor

@Pikayue11 Pikayue11 commented May 4, 2021

Fix #6197

  • Change in CHANGELOG.md described in a way that is understandable for the average user (if applicable)
  • Tests created for changes (if applicable)
  • Manually tested changed features in running JabRef (always required)
  • Screenshots added in PR description (for UI changes)
  • Checked documentation: Is the information available and up to date? If not created an issue at https://github.com/JabRef/user-documentation/issues or, even better, submitted a pull request to the documentation repository.

Reproduce this issue:

1 Choose an entry, click search full text document online in Lookup

2 If it is the first time you download it, it works well. But if it is not, it will download a duplicate file with the same content, the .bib file will has two duplicate files as below

file    = {:banerjee2020a - A Survey on Influence Maximization in a Social Network (2).pdf:PDF;:banerjee2020a - A Survey on Influence Maximization in a Social Network (1).pdf:PDF;:banerjee2020a - A Survey on Influence Maximization in a Social Network.pdf:PDF},

In this issue, we want to keep the new file with different contents but delete the same one. Thus, after download a new file, we need to first judge whether the file name is the duplicate form like "xxx (1)" compare to the other files in the target directory. If yes, then compare the content of these files sharing the "similar" name, if the new file has the same content as one of them, then delete it and do not add it to the linked file.

Process to fix the issue:

1 The related code is in LinkedFileViewModel.java, inside the function download(), because we suppose that the file has already been download, the code should be inside downloadTask.onSuccess

2 We define a boolean function isDuplicatedFile in FileNameUniqueness.java to judge whether the file is a duplicated one in content, if yes, the newly downloaded file will be deleted and it will not be added to the linked file.

3 In FileNameUniqueness.java we also define function named eraseDuplicateMarks to change the file name like "xxx (1)" to "xxx", and return the same string if it is not in the form of duplicate file name

4 In the boolean function isDuplicatedFile, it will first judge whether there are other files have "similar" name with its. If do not have, return false, indicating not duplicate. If have, compare (use com.google.common.io.Files.equal) the content of the newly downloaded file with all the files has "similar" file name, if the content is different from all of them, return false. If one of them has the same content with the new file's content, delete the new file, display warning message and return true to indicate duplication happens.

Screenshot of the result

Use the following .bib file for test

@article{banerjee2020a,
	title="A survey on influence maximization in a social network",
	author="Suman {Banerjee} and Mamata {Jenamani} and Dilip Kumar {Pratihar}",
	journal="Knowledge and Information Systems",
	volume="62",
	number="9",
	pages="3417--3455",
	notes="Sourced from Microsoft Academic - https://academic.microsoft.com/paper/3013137889",
	year="2020"
}

Before

After search full text document online for three times

test4_files

file    = {:banerjee2020a - A Survey on Influence Maximization in a Social Network (2).pdf:PDF;:banerjee2020a - A Survey on Influence Maximization in a Social Network (1).pdf:PDF;:banerjee2020a - A Survey on Influence Maximization in a Social Network.pdf:PDF},

After

Test1

After search full text document online for three times

test5_files

delete_message

file    = {:banerjee2020a - A Survey on Influence Maximization in a Social Network.pdf:PDF},

Test2

After test1, in order to test whether it will compare the content of the file, we replace the downloaded pdf file "banerjee2020a - A Survey on Influence Maximization in a Social Network.pdf" with a new pdf file but keep a same name in the same folder, and then search full text document online

The new file:

replace_file

The outcome:

replace_message

replace_download

file    = {:banerjee2020a - A Survey on Influence Maximization in a Social Network (1).pdf:PDF;:banerjee2020a - A Survey on Influence Maximization in a Social Network.pdf:PDF},

Copy link
Member

@Siedlerchr Siedlerchr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the work, overall I think it looks already good. Need to take a detailed look tomorrow again.
some minor recommendations so far

@Siedlerchr Siedlerchr added the status: changes required Pull requests that are not yet complete label May 4, 2021
@Pikayue11
Copy link
Contributor Author

Thanks for your feedback! I am a little busy today but I will improve my code as soon as possible.

@Pikayue11
Copy link
Contributor Author

I modify my code and manually test it in running JabRef again, it works. But I delete the test cases because the method "eraseDuplicateMarks" in "FileNameUniqueness.java" is now private

@Pikayue11
Copy link
Contributor Author

I think my latest commit has solved all the problems in the recommendations. I am not very familiar with the pull request operate in github, and I just clicked all the "resolve conversation" buttons, is that matter? Besides, is there any other recommendations for my current pr? Any feedback will be appreiated. @Siedlerchr

dialogService.notify(Localization.lang("Downloaded website as an HTML file."));
LOGGER.debug("Downloaded website {} as an HTML file at {}", linkedFile.getLink(), destination);
} catch (IOException e) {
e.printStackTrace();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use LOGGER.error(...)

@Siedlerchr Siedlerchr added status: ready-for-review Pull Requests that are ready to be reviewed by the maintainers and removed status: changes required Pull requests that are not yet complete labels May 6, 2021
@Siedlerchr Siedlerchr requested review from calixtus and koppor May 6, 2021 17:27
Copy link
Member

@koppor koppor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for working on this.

I went through the whole code. The general way of the implementation is OK (I thought about changing FileDownloader, but this is too early -> the file really needs to be downloaded).

I have some detailed comments. Please add test cases eraseDuplicateMarks and isDuplicatedFile need to be covered by test cases

return false;
}

Path originalFile = Path.of(directory.toString(), originalFileName);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use the Path API to construct file names contained in a Path

Suggested change
Path originalFile = Path.of(directory.toString(), originalFileName);
Path originalFile = directory.resolve(originalFileName);

Comment on lines 60 to 70
Optional<String> extensionOptional = FileUtil.getFileExtension(fileName);
String extensionSuffix;
String fileNameWithoutDuplicated;

if (extensionOptional.isPresent()) {
extensionSuffix = '.' + extensionOptional.get();
fileNameWithoutDuplicated = eraseDuplicateMarks(fileName.substring(0, fileName.lastIndexOf('.')));
} else {
extensionSuffix = "";
fileNameWithoutDuplicated = eraseDuplicateMarks(fileName);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use the available methods in FileUtil and the power of Java optionals

Suggested change
Optional<String> extensionOptional = FileUtil.getFileExtension(fileName);
String extensionSuffix;
String fileNameWithoutDuplicated;
if (extensionOptional.isPresent()) {
extensionSuffix = '.' + extensionOptional.get();
fileNameWithoutDuplicated = eraseDuplicateMarks(fileName.substring(0, fileName.lastIndexOf('.')));
} else {
extensionSuffix = "";
fileNameWithoutDuplicated = eraseDuplicateMarks(fileName);
}
String fileNameWithoutDuplicated = eraseDuplicateMarks(FileUtil.getBaseName(fileName));
String extensionSuffix = FileUtil.getFileExtension(fileName).getOrElse("");

while (true) {
if (com.google.common.io.Files.equal(originalFile.toFile(), duplicateFile.toFile())) {
if (duplicateFile.toFile().delete()) {
dialogService.notify(Localization.lang("Dupilcate file with '%0', successfully delete the file '%1'", originalFileName, fileName));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
dialogService.notify(Localization.lang("Dupilcate file with '%0', successfully delete the file '%1'", originalFileName, fileName));
dialogService.notify(Localization.lang("File '%1' is a duplicate of '%0'. Keeping '%0'", originalFileName, fileName));

Comment on lines 120 to 132
int dotPosition1 = fileName.lastIndexOf(')');
if (dotPosition1 != fileName.length() - 1 || dotPosition1 <= 2) {
return fileName;
} else {
if (!Character.isDigit(fileName.charAt(dotPosition1 - 1))) {
return fileName;
}
}
int dotPosition = fileName.lastIndexOf('(');
if ((dotPosition > 0) && (dotPosition < (fileName.length() - 1))) {
fileName = fileName.substring(0, dotPosition - 1);
}
return fileName;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please replace the whole method body and use regular expressions.

Hint: (.*) \(\d+\) and https://stackoverflow.com/q/988655/873282

counter++;

if (originalFileName.equals(fileName)) {
dialogService.notify(Localization.lang("Duplicate file name but different content, keep the new file '%0'", fileName));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would remove this line, because the file name is not duplicate (but only simlar - hard to explain to the user)


Path duplicateFile = Path.of(directory.toString(), fileName);
int counter = 1;
while (true) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

while true is not a good style. This is 1990s code style with gotos. See https://homepages.cwi.nl/~storm/teaching/reader/Dijkstra68.pdf for details.

Suggested change
while (true) {
while (Files.exists(directory.resolve(originalFileName))) {

Comment on lines 106 to 108
if (!Files.exists(directory.resolve(originalFileName))) {
return false;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be removed, because we fixed the loop condition

Suggested change
if (!Files.exists(directory.resolve(originalFileName))) {
return false;
}

@koppor koppor added status: changes required Pull requests that are not yet complete and removed status: ready-for-review Pull Requests that are ready to be reviewed by the maintainers labels May 6, 2021
@Pikayue11
Copy link
Contributor Author

Thanks for the reply! I will fix my codes as soon as possible.

@Pikayue11 Pikayue11 requested review from koppor and Siedlerchr May 7, 2021 15:38
Copy link
Member

@koppor koppor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the quick reaction. Some small comments remain 😇

linkedFiles.set(oldFileIndex, newLinkedFile);
boolean isDuplicate = false;
try {
isDuplicate = FileNameUniqueness.isDuplicatedFile(targetDirectory.get(), destination.getFileName().toString(), dialogService);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use String only as last resort. There should always "real" domain objects be used.

Suggested change
isDuplicate = FileNameUniqueness.isDuplicatedFile(targetDirectory.get(), destination.getFileName().toString(), dialogService);
isDuplicate = FileNameUniqueness.isDuplicatedFile(targetDirectory.get(), destination.getFileName(), dialogService);

try {
isDuplicate = FileNameUniqueness.isDuplicatedFile(targetDirectory.get(), destination.getFileName().toString(), dialogService);
} catch (IOException e) {
LOGGER.error(e.toString());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Never use toString() to log an exception

Suggested change
LOGGER.error(e.toString());
LOGGER.error("FileNameUniquess.isDuplicatedFile failed", e);

* @param directory The directory which saves the files (.pdf, for example)
* @param fileName Suggest name for the newly downloaded file
* @param dialogService To display the error and success message
* @return true when the content of files is duplicate and successfully delete the newly downloaded file ,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* @return true when the content of files is duplicate and successfully delete the newly downloaded file ,
* @return true when the content of files is duplicate and successfully delete the newly downloaded file,

* false when it is not a duplicate file or fail to delete the duplicate file
* @throws IOException Fail when the file is not exist or something wrong when reading the file
*/
public static boolean isDuplicatedFile(Path directory, String fileName, DialogService dialogService) throws IOException {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
public static boolean isDuplicatedFile(Path directory, String fileName, DialogService dialogService) throws IOException {
public static boolean isDuplicatedFile(Path directory, Path fileName, DialogService dialogService) throws IOException {

*/
public static boolean isDuplicatedFile(Path directory, String fileName, DialogService dialogService) throws IOException {

String fileNameWithoutDuplicated = eraseDuplicateMarks(FileUtil.getBaseName(fileName));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Introduce FileUtil.getBaseName(Path) similar to getFileExtension(Path file) {.

@Pikayue11
Copy link
Contributor Author

I add an overload function getBaseName in FileUtil, imitating the code style of the function getFileExtension in FileUtil, to pass the Path variable as parameter

@Pikayue11 Pikayue11 requested a review from koppor May 8, 2021 06:47
@Siedlerchr
Copy link
Member

Please have a look at the failing unit test. You need to add the missing language keys https://devdocs.jabref.org/getting-into-the-code/code-howtos#using-localization-correctly

@Pikayue11
Copy link
Contributor Author

I just add the missing language keys into JabRef_en.properties, is there anything else to modify?

* @return Suggested name for the file without extensionSuffix and duplicate marks such as " (1)"
*/
public static String eraseDuplicateMarks(String fileName) {
Pattern p = Pattern.compile("(.*) \\(\\d+\\)");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move this Pattern.compile to a static field in the class e.g. private static final Pattern DUPLICATE_MARK_PATTERN = Pattern.compile . The rest is good so far

Copy link
Member

@Siedlerchr Siedlerchr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From my side looks good so far 👍
Just one minor improvement.

@Pikayue11
Copy link
Contributor Author

Ok, I fixed it

@Pikayue11
Copy link
Contributor Author

@koppor Any other comments to improve my pr? Any feedback is appreciated.

@Pikayue11 Pikayue11 requested a review from Siedlerchr May 13, 2021 12:15
@Siedlerchr Siedlerchr added status: ready-for-review Pull Requests that are ready to be reviewed by the maintainers and removed status: changes required Pull requests that are not yet complete labels May 13, 2021
@koppor
Copy link
Member

koppor commented May 17, 2021

Thank you for keep working on this. Puts a smile on my face and a nice present for me at my free time hobby project JabRef.

@koppor koppor merged commit 5a95f10 into JabRef:main May 17, 2021
@koppor koppor removed the status: ready-for-review Pull Requests that are ready to be reviewed by the maintainers label May 17, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Do not store downloaded file if it is already attached
3 participants