-
-
Notifications
You must be signed in to change notification settings - Fork 2.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix for issue 6197: Do not store downloaded file if it is already attached #7702
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the work, overall I think it looks already good. Need to take a detailed look tomorrow again.
some minor recommendations so far
Thanks for your feedback! I am a little busy today but I will improve my code as soon as possible. |
I modify my code and manually test it in running JabRef again, it works. But I delete the test cases because the method "eraseDuplicateMarks" in "FileNameUniqueness.java" is now private |
I think my latest commit has solved all the problems in the recommendations. I am not very familiar with the pull request operate in github, and I just clicked all the "resolve conversation" buttons, is that matter? Besides, is there any other recommendations for my current pr? Any feedback will be appreiated. @Siedlerchr |
dialogService.notify(Localization.lang("Downloaded website as an HTML file.")); | ||
LOGGER.debug("Downloaded website {} as an HTML file at {}", linkedFile.getLink(), destination); | ||
} catch (IOException e) { | ||
e.printStackTrace(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use LOGGER.error(...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for working on this.
I went through the whole code. The general way of the implementation is OK (I thought about changing FileDownloader
, but this is too early -> the file really needs to be downloaded).
I have some detailed comments. Please add test cases eraseDuplicateMarks
and isDuplicatedFile
need to be covered by test cases
return false; | ||
} | ||
|
||
Path originalFile = Path.of(directory.toString(), originalFileName); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use the Path API to construct file names contained in a Path
Path originalFile = Path.of(directory.toString(), originalFileName); | |
Path originalFile = directory.resolve(originalFileName); |
Optional<String> extensionOptional = FileUtil.getFileExtension(fileName); | ||
String extensionSuffix; | ||
String fileNameWithoutDuplicated; | ||
|
||
if (extensionOptional.isPresent()) { | ||
extensionSuffix = '.' + extensionOptional.get(); | ||
fileNameWithoutDuplicated = eraseDuplicateMarks(fileName.substring(0, fileName.lastIndexOf('.'))); | ||
} else { | ||
extensionSuffix = ""; | ||
fileNameWithoutDuplicated = eraseDuplicateMarks(fileName); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use the available methods in FileUtil
and the power of Java optionals
Optional<String> extensionOptional = FileUtil.getFileExtension(fileName); | |
String extensionSuffix; | |
String fileNameWithoutDuplicated; | |
if (extensionOptional.isPresent()) { | |
extensionSuffix = '.' + extensionOptional.get(); | |
fileNameWithoutDuplicated = eraseDuplicateMarks(fileName.substring(0, fileName.lastIndexOf('.'))); | |
} else { | |
extensionSuffix = ""; | |
fileNameWithoutDuplicated = eraseDuplicateMarks(fileName); | |
} | |
String fileNameWithoutDuplicated = eraseDuplicateMarks(FileUtil.getBaseName(fileName)); | |
String extensionSuffix = FileUtil.getFileExtension(fileName).getOrElse(""); |
while (true) { | ||
if (com.google.common.io.Files.equal(originalFile.toFile(), duplicateFile.toFile())) { | ||
if (duplicateFile.toFile().delete()) { | ||
dialogService.notify(Localization.lang("Dupilcate file with '%0', successfully delete the file '%1'", originalFileName, fileName)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dialogService.notify(Localization.lang("Dupilcate file with '%0', successfully delete the file '%1'", originalFileName, fileName)); | |
dialogService.notify(Localization.lang("File '%1' is a duplicate of '%0'. Keeping '%0'", originalFileName, fileName)); |
int dotPosition1 = fileName.lastIndexOf(')'); | ||
if (dotPosition1 != fileName.length() - 1 || dotPosition1 <= 2) { | ||
return fileName; | ||
} else { | ||
if (!Character.isDigit(fileName.charAt(dotPosition1 - 1))) { | ||
return fileName; | ||
} | ||
} | ||
int dotPosition = fileName.lastIndexOf('('); | ||
if ((dotPosition > 0) && (dotPosition < (fileName.length() - 1))) { | ||
fileName = fileName.substring(0, dotPosition - 1); | ||
} | ||
return fileName; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please replace the whole method body and use regular expressions.
Hint: (.*) \(\d+\)
and https://stackoverflow.com/q/988655/873282
counter++; | ||
|
||
if (originalFileName.equals(fileName)) { | ||
dialogService.notify(Localization.lang("Duplicate file name but different content, keep the new file '%0'", fileName)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would remove this line, because the file name is not duplicate (but only simlar - hard to explain to the user)
|
||
Path duplicateFile = Path.of(directory.toString(), fileName); | ||
int counter = 1; | ||
while (true) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
while true
is not a good style. This is 1990s code style with gotos. See https://homepages.cwi.nl/~storm/teaching/reader/Dijkstra68.pdf for details.
while (true) { | |
while (Files.exists(directory.resolve(originalFileName))) { |
if (!Files.exists(directory.resolve(originalFileName))) { | ||
return false; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be removed, because we fixed the loop condition
if (!Files.exists(directory.resolve(originalFileName))) { | |
return false; | |
} |
Thanks for the reply! I will fix my codes as soon as possible. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the quick reaction. Some small comments remain 😇
linkedFiles.set(oldFileIndex, newLinkedFile); | ||
boolean isDuplicate = false; | ||
try { | ||
isDuplicate = FileNameUniqueness.isDuplicatedFile(targetDirectory.get(), destination.getFileName().toString(), dialogService); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use String
only as last resort. There should always "real" domain objects be used.
isDuplicate = FileNameUniqueness.isDuplicatedFile(targetDirectory.get(), destination.getFileName().toString(), dialogService); | |
isDuplicate = FileNameUniqueness.isDuplicatedFile(targetDirectory.get(), destination.getFileName(), dialogService); |
try { | ||
isDuplicate = FileNameUniqueness.isDuplicatedFile(targetDirectory.get(), destination.getFileName().toString(), dialogService); | ||
} catch (IOException e) { | ||
LOGGER.error(e.toString()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Never use toString()
to log an exception
LOGGER.error(e.toString()); | |
LOGGER.error("FileNameUniquess.isDuplicatedFile failed", e); |
* @param directory The directory which saves the files (.pdf, for example) | ||
* @param fileName Suggest name for the newly downloaded file | ||
* @param dialogService To display the error and success message | ||
* @return true when the content of files is duplicate and successfully delete the newly downloaded file , |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* @return true when the content of files is duplicate and successfully delete the newly downloaded file , | |
* @return true when the content of files is duplicate and successfully delete the newly downloaded file, |
* false when it is not a duplicate file or fail to delete the duplicate file | ||
* @throws IOException Fail when the file is not exist or something wrong when reading the file | ||
*/ | ||
public static boolean isDuplicatedFile(Path directory, String fileName, DialogService dialogService) throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
public static boolean isDuplicatedFile(Path directory, String fileName, DialogService dialogService) throws IOException { | |
public static boolean isDuplicatedFile(Path directory, Path fileName, DialogService dialogService) throws IOException { |
*/ | ||
public static boolean isDuplicatedFile(Path directory, String fileName, DialogService dialogService) throws IOException { | ||
|
||
String fileNameWithoutDuplicated = eraseDuplicateMarks(FileUtil.getBaseName(fileName)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Introduce FileUtil.getBaseName(Path) similar to getFileExtension(Path file) {
.
I add an overload function getBaseName in FileUtil, imitating the code style of the function getFileExtension in FileUtil, to pass the Path variable as parameter |
Please have a look at the failing unit test. You need to add the missing language keys https://devdocs.jabref.org/getting-into-the-code/code-howtos#using-localization-correctly |
I just add the missing language keys into |
* @return Suggested name for the file without extensionSuffix and duplicate marks such as " (1)" | ||
*/ | ||
public static String eraseDuplicateMarks(String fileName) { | ||
Pattern p = Pattern.compile("(.*) \\(\\d+\\)"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move this Pattern.compile to a static field in the class e.g. private static final Pattern DUPLICATE_MARK_PATTERN = Pattern.compile . The rest is good so far
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From my side looks good so far 👍
Just one minor improvement.
Ok, I fixed it |
@koppor Any other comments to improve my pr? Any feedback is appreciated. |
Thank you for keep working on this. Puts a smile on my face and a nice present for me at my free time hobby project JabRef. |
Fix #6197
CHANGELOG.md
described in a way that is understandable for the average user (if applicable)Reproduce this issue:
1 Choose an entry, click search full text document online in Lookup
2 If it is the first time you download it, it works well. But if it is not, it will download a duplicate file with the same content, the .bib file will has two duplicate files as below
In this issue, we want to keep the new file with different contents but delete the same one. Thus, after download a new file, we need to first judge whether the file name is the duplicate form like "xxx (1)" compare to the other files in the target directory. If yes, then compare the content of these files sharing the "similar" name, if the new file has the same content as one of them, then delete it and do not add it to the linked file.
Process to fix the issue:
1 The related code is in LinkedFileViewModel.java, inside the function download(), because we suppose that the file has already been download, the code should be inside downloadTask.onSuccess
2 We define a boolean function isDuplicatedFile in FileNameUniqueness.java to judge whether the file is a duplicated one in content, if yes, the newly downloaded file will be deleted and it will not be added to the linked file.
3 In FileNameUniqueness.java we also define function named eraseDuplicateMarks to change the file name like "xxx (1)" to "xxx", and return the same string if it is not in the form of duplicate file name
4 In the boolean function isDuplicatedFile, it will first judge whether there are other files have "similar" name with its. If do not have, return false, indicating not duplicate. If have, compare (use com.google.common.io.Files.equal) the content of the newly downloaded file with all the files has "similar" file name, if the content is different from all of them, return false. If one of them has the same content with the new file's content, delete the new file, display warning message and return true to indicate duplication happens.
Screenshot of the result
Use the following .bib file for test
Before
After search full text document online for three times
After
Test1
After search full text document online for three times
Test2
After test1, in order to test whether it will compare the content of the file, we replace the downloaded pdf file "banerjee2020a - A Survey on Influence Maximization in a Social Network.pdf" with a new pdf file but keep a same name in the same folder, and then search full text document online
The new file:
The outcome: