Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add FileMonitor for LaTeX citations #10585

Closed
koppor opened this issue Oct 25, 2023 · 24 comments · Fixed by #11245
Closed

Add FileMonitor for LaTeX citations #10585

koppor opened this issue Oct 25, 2023 · 24 comments · Fixed by #11245
Assignees

Comments

@koppor
Copy link
Member

koppor commented Oct 25, 2023

The LaTeX citations are NOT backed by a file update monitor.

a

Rewrite the code to use our org.jabref.gui.util.DefaultFileUpdateMonitor (or a new file update monitor) with proper shutdown at the end.

(Note: This is the technically sound fix for #10584)


Update

A first PR was submitted at #10937. It turned out, that a new FileMonitor is needed. It should watch a directory.

  • New .tex file: Parse the tex file, remove all non-existant citaton keys and add the new ones. Add the tex file to file listeners
  • Deleted .tex file: Remove the .tex file from the file listeners

The file listeners:

  • Changed.tex file: Parse the tex file, remove all non-existant citaton keys and add the new ones

  • Class org.jabref.logic.texparser.DefaultLatexParser needs to be modified to know the found latex citation keys per latex file. In case an "udpate" of the informaiton is triggered (by the file update monitor!!), all existing keys should be dropped and readded. Thus, org.jabref.model.texparser.LatexParserResult also needs to adapted (see below)
  • Refactor org.jabref.gui.entryeditor.LatexCitationsTabViewModel: Move away org.jabref.gui.entryeditor.LatexCitationsTabViewModel#searchDirectory, org.jabref.gui.entryeditor.LatexCitationsTabViewModel#searchAndParse as class CitationFinder in package org.jabref.logic.texparser. -- Also test cases need to be added
  • Refactor org.jabref.model.texparser.LatexParserResult to allow partial updates. That means: If file X is changed, call new DefaultLatexParser().parse(X) and change the contents of org.jabref.model.texparser.LatexParserResult#citations accordingly. IMHO, a map from Path to Set<Citations> (Multimap) should be kept. The value before the update and after the update be stored. A diff calculated. That diff should be applied to citations. - The method update can be implemented in LatexParserResult. - The class LatexParserResult should be renamed in CitationState.
  • A new file watcher, which works on directories and handles both file editing and creation of new files. On both events, CitationState#update(path) can be called. That method should simply add the file to the currently known files. -- Deletion has to be handled separately. event: delete, new method CitationState#remove(path) called

Reasons: Energy saving. Very little CPU usage for an update on one file. Although your current method works, it parses ALL .tex files again, and handles the content. Large LaTeX projects have dozens of tex files, which are very long. This costs time.

@koppor koppor added the good first issue An issue intended for project-newcomers. Varies in difficulty. label Oct 25, 2023
@Jaovitosr
Copy link
Contributor

Hi! Hope you're all doing well!
I'm really interested in tackling this issue as a beginner. It looks like a great way to learn and dive into the project.
Looking forward to contributing! Thanks!

@ThiloteE
Copy link
Member

ThiloteE commented Nov 7, 2023

Sure, go ahead :-)

@Jaovitosr
Copy link
Contributor

Thank you very much! :)

@Jaovitosr
Copy link
Contributor

Hello, how are you? I hope all is well!

I'd like to sincerely apologize for the delay in addressing the issue. I encountered some unexpected setbacks, but I'm back and aim to resolve it as soon as possible.
Just a few questions... Should I follow the solution approach outlined in issue #10584 and implement a refresh button? I've located the implementation of the File monitor in the project and have a grasp of how to use it, but I would like clarification on the necessity of incorporating this button.

Thank you in advance for your assistance and understanding! :)

@ThiloteE
Copy link
Member

If you manage to do it without causing extensive performance degradation for large libraries, then not having a refresh button would be fair enough :-)

@Siedlerchr Siedlerchr moved this from Reserved to Free to take in Candidates for University Projects Jan 23, 2024
@roxannecvl
Copy link

Hey, my university software engineering group are willing to tackle this issue. Would it be possible to assign us ? Here are our git username @rachedkko @VinterSallad @Emiesce @MercuriaL01 and @roxannecvl.
Thank you for your answer !!

@ThiloteE
Copy link
Member

Sure, I will assign you.

@ThiloteE ThiloteE moved this from Free to take to Reserved in Candidates for University Projects Feb 22, 2024
@rachedkko
Copy link

Hi, i am also working on this issue !

@VinterSallad
Copy link

Hi, I am in the same group and university course!

@MercuriaL01
Copy link

Hello! I am working on it too together with @rachedkko @VinterSallad @Emiesce and @roxannecvl

@ThiloteE
Copy link
Member

For organisational purposes it is totally fine, if only one person is assigned, if we know you are a group and work together. It is only possible to assign somebody who has commented in the issue, but if everybody comments, it will be a lot of spam, if we have like 160 developers.

@ThiloteE ThiloteE added the FirstTimeCodeContribution Triggers GitHub Greeter Workflow label Feb 25, 2024
Copy link
Contributor

As a general advice for newcomers: check out Contributing for a start. Also, guidelines for setting up a local workspace is worth having a look at.

Feel free to ask here at GitHub, if you have any issue related questions. If you have questions about how to setup your workspace use JabRef's Gitter chat. Try to open a (draft) pull-request early on, so that people can see you are working on the issue and so that they can see the direction the pull request is heading towards. This way, you will likely receive valuable feedback.

VinterSallad referenced this issue in rachedkko/jabref Feb 27, 2024
…to LatexCitation (JabRef#10585)

Known Issues:

Sometimes throws ConcurrentModificationException from DefaultFileUpdateMonitor whenever a change occurs in the tracked directory

Does not have any kind of shutdown

Is not able to track changes in nested directories

Does not handle whenever a directory is changed from the initial one properly

Makes use of a Deprecated Class
VinterSallad referenced this issue in rachedkko/jabref Feb 28, 2024
…Ref#10585)

Changed the MultiMap into a synchronized type for safety

Used the Platform.runLater() to ensure processes are executed on correct threads

Not sure what the side effects of these fixes are
VinterSallad referenced this issue in rachedkko/jabref Feb 28, 2024
Made sure that filemonitor first is added when the user navigates to LaTeX Citation for the first time for each .bib tab, also added logic to remove the old listener whenever the directory target is changed

However, an issue is that listener seemingly is still watching over the old directory even though the listener has been removed
@VinterSallad
Copy link

VinterSallad commented Feb 28, 2024

Hi, I wanted to ask about the requirement of having a proper shutdown to the listeners at the end.

When are the listeners supposed to be shut down exactly?

If it is on program shutdown, then I think it is all handled since the fileMonitor I get is from the Globals class. Or is it whenever the user navigates out of the Latex Citations tab?

rachedkko added a commit to rachedkko/jabref that referenced this issue Mar 3, 2024
rachedkko added a commit to rachedkko/jabref that referenced this issue Mar 3, 2024
rachedkko added a commit to rachedkko/jabref that referenced this issue Mar 3, 2024
rachedkko added a commit to rachedkko/jabref that referenced this issue Mar 3, 2024
rachedkko added a commit to rachedkko/jabref that referenced this issue Mar 3, 2024
rachedkko added a commit to rachedkko/jabref that referenced this issue Mar 3, 2024
rachedkko added a commit to rachedkko/jabref that referenced this issue Mar 3, 2024
rachedkko added a commit to rachedkko/jabref that referenced this issue Mar 3, 2024
rachedkko added a commit to rachedkko/jabref that referenced this issue Mar 3, 2024
@koppor koppor removed good first issue An issue intended for project-newcomers. Varies in difficulty. FirstTimeCodeContribution Triggers GitHub Greeter Workflow labels Mar 4, 2024
@koppor koppor moved this from Reserved to Free to take in Candidates for University Projects Mar 4, 2024
@LoayGhreeb
Copy link
Collaborator

I would like to work on this issue.

@LoayGhreeb
Copy link
Collaborator

@koppor, while working on this issue, implementing a new directory monitor there are a few options to use:

java.nio.file.WatchService

Tested on Windows 11 using WatchDir.java example from the Oracle tutorials Watching a Directory for Changes.

There are multiple issues:

  1. It does not detect files coming together with a new folder. (JDK issue: JDK-8162948)
  2. Deleting a sub-directory using Del (Move to trash) does not detect deleted files in the sub-directory. However, it detects deleted files when using Shift+Del (Delete parentally).
  3. Access denied when trying to delete the recursively watched directory on Windows. (JDK issue: JDK-6972833)
  4. It is implemented on MacOS by the generic PollingWatchService, which continuously re-scans the directory. (JDK issue: JDK-8293067)

io.methvin.watcher.DirectoryWatcher

Tested on Windows 11 using mentioned example in the repo.

It resolves issues (1, 3, 4) that occur in java.nio.file.WatchService because it uses ExtendedWatchEventModifier.FILE_TREE on Windows, and native implementation based on the Carbon File System Events API for MacOS.
However, issue (2) can still be reproduced.


org.apache.commons.io.monitor

There are no issues with it, but it uses a polling mechanism at fixed intervals to check for any new events. This can waste CPU cycles, especially if no change occurs. However, it can handle huge files without overflowing.


What do you suggest I should do? I haven't tested on any other device or OS to see if these issues are reproducible or not.

@koppor
Copy link
Member Author

koppor commented Apr 18, 2024

@LoayGhreeb Nice evaluation! I need to think in ADRs, especially MADR. What are our decision drivers?

  • Working for users in most cases (surely this is always a decision driver, not fulfilled by the JDK native solution)
  • Working well for users (in addition to the first one, this adds an additional criterion about the user-perceived quality)
  • Future-proofness of the dependency: The dependency should also work in 3 to 10 years without requiring us to maintain

Seeing your JDK links, I checked the source of both the Apache Commons IO and methvin's watchers. I saw at https://github.com/gmethvin/directory-watcher/blob/3218d68a845ebd803ebd98af3be4692d1b63e12c/core/src/main/java/io/methvin/watcher/DirectoryWatcher.java#L162 that in non-macOS. Re-iterating on your comment again (ExtendedWatchEventModifier.FILE_TREE), I checked the code more. I do not like that they hash the file completely (https://github.com/gmethvin/directory-watcher/blob/3218d68a845ebd803ebd98af3be4692d1b63e12c/core/src/main/java/io/methvin/watcher/DirectoryWatcher.java#L468). They could have hashed the first 2kb and then do a full hash.

I also checked FileAlterationObserver.java. I liked FileFilterUtils.suffixFileFilter(".java"));. In our case, we are interested in .tex only.

Apache Commons does a check on modification time and length (refresh). This is IMHO the better approach

Thus, Apache Commons is IMHO the better way to go.


BTW: Being able to press the dot (.) to open a VS Code in the Web for a GitHub repo is very nice for code browsing. In case you didn't know.

@Siedlerchr
Copy link
Member

Our bib file watcher is based on the java watch service

@LoayGhreeb
Copy link
Collaborator

Our bib file watcher is based on the java watch service

It works because it doesn't care about the sub-directories

@koppor
Copy link
Member Author

koppor commented Apr 18, 2024

Our bib file watcher is based on the java watch service

It works because it doesn't care about the sub-directories

We could register the .tex files directly. Watch the latex directory. And then check for \input and \include. However, this has more efforts than using Apache Commons FileMonitor and filtering for .tex files.

@LoayGhreeb
Copy link
Collaborator

We could register the .tex files directly. Watch the latex directory. And then check for \input and \include. However, this has more efforts than using Apache Commons FileMonitor and filtering for .tex files.

Does this handle the creation or deletion of .tex files in sub-directories of the latex directory?

By the way, can the \include command include a .tex file that is in another directory? If yes, should I also watch that directory to monitor changes to that file (Maybe using the DefaultFileUpdateMonitor)?

@koppor
Copy link
Member Author

koppor commented Apr 18, 2024

We could register the .tex files directly. Watch the latex directory. And then check for \input and \include. However, this has more efforts than using Apache Commons FileMonitor and filtering for .tex files.
Does this handle the creation or deletion of .tex files in sub-directories of the latex directory?

What I meant is that Apache Commons FileMonitor and filtering for .tex files is great (because the manual way has more efforts). Just go ahead with that. Seems to be more clean code!

By the way, can the \include command include a .tex file that is in another directory?

Yes. One could even have macro expansion on the file name. Rarely used though.

If yes, should I also watch that directory to monitor changes to that file (Maybe using the DefaultFileUpdateMonitor)?

When using Apache Commons FileMonitor , that quesion is obsolete, isn't it?

@LoayGhreeb
Copy link
Collaborator

What I meant is that Apache Commons FileMonitor and filtering for .tex files is great (because the manual way has more efforts). Just go ahead with that. Seems to be more clean code!

Great! Will implement it and submit a pull request soon.

When using Apache Commons FileMonitor , that quesion is obsolete, isn't it?

Still need to know how to handle the case where a .tex file in the watched directory includes another .tex file outside the watched directory.
How to monitor changes for that file? Because in that case, only need to monitor that file, not its entire directory.

Maybe it's better not to think about those cases for now. Implementing the basics first will make it easier to determine what should be done later.

@koppor
Copy link
Member Author

koppor commented Apr 19, 2024

When using Apache Commons FileMonitor , that quesion is obsolete, isn't it?
Still need to know how to handle the case where a .tex file in the watched directory includes another .tex file outside the watched directory.

  • If you ask for sub directories -> I assumed Apache Commons IO Monitor can watch directories, sub directories and filter out files. Thus, "just using" Apache Commons IO Monitor should be enough
  • If you ask for parent, sibling or other directories outside of the current directory and sub directories, I see that as accepted limitation.

How to monitor changes for that file? Because in that case, only need to monitor that file, not its entire directory.

PhD thesis are typically split among several files. Thus, watching a directory and all sub directories for .tex files should cover almost all cases.

If it is too difficult to watch a directory for .tex file creation, then we can rely on the "Refresh" button though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
9 participants