Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix exception when saving and autosave trigger at the same time #6694

Closed
wants to merge 46 commits into from

Conversation

Siedlerchr
Copy link
Member

@Siedlerchr Siedlerchr commented Jul 17, 2020

Fixes #6684
Fixes #6644
Fixes #6102
Fixes #6000

Summary:
Manual save + autosave produce an exception/race condition on saving, because both want to access the same file for moving the temp file to the target file.

Solution: Throttle multiple saving operations in a kind of queue

  • Change in CHANGELOG.md described (if applicable)
  • Tests created for changes (if applicable)
  • Manually tested changed features in running JabRef (always required)
  • Screenshots added in PR description (for UI changes)
  • Checked documentation: Is the information available and up to date? If not created an issue at https://github.com/JabRef/user-documentation/issues or, even better, submitted a pull request to the documentation repository.

@Siedlerchr Siedlerchr marked this pull request as draft July 17, 2020 16:49
@tobiasdiez
Copy link
Member

I wouldn't remove auto-saving. Actually, I feel this should be the standard behavior: why do I need to remember as a user to save the file?

@Siedlerchr
Copy link
Member Author

I will try if I can find a way to lock the saving to one thread

@tobiasdiez
Copy link
Member

tobiasdiez commented Jul 18, 2020

That might lead to performance problems: saving might take a few seconds so in the worst case the save requests add up. The same with "Save all" etc.

I think the underlying problem is really that the tmp file is reused between save operations. This is really not a good idea (also for data integrity).

@Siedlerchr
Copy link
Member Author

I'm open for ideas: If you have autosave running and manually hit saving you run into problems. Even if you would write to another temp file first that doesn't help. You will get an exception nonetheless as the file is locked by another thread when it is moved.
The idea would be cancel any further attempts of saving of the same database when one is already running.

@tobiasdiez
Copy link
Member

You will get an exception nonetheless as the file is locked by another thread when it is moved.

Which file is locked?

My idea was:

  • Write to unique temp file
  • Copy temp file over target file

The first step should only lock the temp file, and write to it. There is no problem there since every write process has its own temp file, and there is no interaction. The second step is atomic, so there is no locking or inter-thread problem.

In addition to these technicalities, I think it be worthwhile to improve the user experience a bit. I guess it would be good to hard-cancel every running auto-save and normal save (and delete the temporary file), as soon as the user runs a new save for the same file. This reduces overhead.

@Siedlerchr
Copy link
Member Author

@tobiasdiez Even though you have atomic moving, the other thread will throw an IllegalAccessException.

Following situation:

  1. Thread Autosave A
  2. Thread Manual Save B

A writes to temp files and now calls the Atomic File Move => Performs Atomic Moves temp file A to target file
B writes to temp file and also wants to call the Atomic File Move (while the other thread is still moving) => Boom! You will get an IllegalAcessException for the target file because the other thread is still in the process of moving and probably has a lock on that file.

To reproduce:
Set a breakpoint in that close method of AtomicFileOutputstream
Type something in a field, wait for autosave to trigger
Manually hit save.

the only way one could solve this would be to cancel the running? save operation

@tobiasdiez
Copy link
Member

tobiasdiez commented Jul 18, 2020

But this exception is coming from this lock, isn't it? (Which comes from the fact that we are writing to the same tmp file)

temporaryFileLock = ((FileOutputStream) out).getChannel().lock();

Since Files.move(...) is atomic, I cannot set a breakpoint in it - so it's hard to test if an exception is thrown if two request for moving to the same target are received. I would expect that if thread A calls Files.move(...), moving started, thread B calls Files.move(...) while move is in progress, then B simply waits until A is finished (since the move is atomic, the file system shouldn't be able to do anything in between and just queue further moves).

@Siedlerchr
Copy link
Member Author

Siedlerchr commented Jul 18, 2020

The temporary file lock is never set, its always null because the Files... Returns directly a File Channel. (see the comment in the code in my changes).
It's sufficient to set a breakpoint in the close method, wait for both threads to arrive and then continue running.
Atomic move will only work atomic if both files are on the same file disk (at least for unix)

I will update the code to show the problem

@tobiasdiez
Copy link
Member

Good! Can you also please add a comment at the place where the exception is thrown (I don't really have time right now to play with this in detail).

@Siedlerchr
Copy link
Member Author

I tested the idea to create the file in the System's TEMP Folder, but as I expected an atomic move is not possible if the TEMP folder is on a different disk than the library file. (My Temp folder is on a SSD, while my jabref bib is on a different HDD, "Atomic move between providers is not supported)

I tested you idea with a unique file name in the same folder which results in an exception when multipe threads try to acess the file. I commented in the code where to put the breakpoint.

Locking the section could solve this, but the problem is that when the saving is done manually, it's the FX thread which executes the save operation. So that should also be a background task

1:25:40.172 [pool-6-thread-1] ERROR org.jabref.logic.autosaveandbackup.BackupManager - Error while saving to fileX:\Users\CS\Documents\_JABREFTEMP\newBiblatexTEst.bib.sav
java.nio.file.AccessDeniedException: X:\Users\CS\Documents\_JABREFTEMP\newBiblatexTEst.bib.sav112661705549979460.tmp -> X:\Users\CS\Documents\_JABREFTEMP\newBiblatexTEst.bib.sav
	at sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:89) ~[?:?]
	at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:103) ~[?:?]
	at sun.nio.fs.WindowsFileCopy.move(WindowsFileCopy.java:309) ~[?:?]
	at sun.nio.fs.WindowsFileSystemProvider.move(WindowsFileSystemProvider.java:292) ~[?:?]
	at java.nio.file.Files.move(Files.java:1426) ~[?:?]
	at org.jabref.logic.exporter.AtomicFileOutputStream.close(AtomicFileOutputStream.java:203) ~[main/:?]
	at sun.nio.cs.StreamEncoder.implClose(StreamEncoder.java:353) ~[?:?]
	at sun.nio.cs.StreamEncoder.close(StreamEncoder.java:168) ~[?:?]
	at java.io.OutputStreamWriter.close(OutputStreamWriter.java:255) ~[?:?]
	at org.jabref.logic.exporter.BibDatabaseWriter.savePartOfDatabase(BibDatabaseWriter.java:221) ~[main/:?]
	at org.jabref.logic.exporter.BibDatabaseWriter.saveDatabase(BibDatabaseWriter.java:161) ~[main/:?]
	at org.jabref.logic.autosaveandbackup.BackupManager.performBackup(BackupManager.java:139) ~[main/:?]
	at java.util.Optional.ifPresent(Optional.java:176) ~[?:?]
	at org.jabref.logic.autosaveandbackup.BackupManager.lambda$4(BackupManager.java:164) ~[main/:?]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]
	at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) [?:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630) [?:?]
	at java.lang.Thread.run(Thread.java:832) [?:?]
21:25:40.184 [JavaFX Application Thread] INFO  org.jabref.gui.JabRefDialogService - Saving library...

@tobiasdiez
Copy link
Member

Yeah, seems like we need a wrapper that controls which save operations are in progress, and if there is already one for the target file, then stops the old one and only then starts the new one... damn.

Thanks for your time investigating this!

@koppor
Copy link
Member

koppor commented Jul 21, 2020

@tobiasdiez's one works, because we use temp files.

Worker-Queue of Guava?

An ADR is required ^^.

* upstream/master:
  Simplify deployment.yml (#6703)
  Update snap.yml (#6702)
  Update snap.yml
  Fix snap (#6701)
  change snap action from obsoloete to newer (#6700)
try to combine autosave logic and normal save actions

TODO: how to handle return values of saves?
@Siedlerchr
Copy link
Member Author

Idea: put the logic into save database action

@koppor
Copy link
Member

koppor commented Jul 27, 2020

In #5669 (comment) - I put synchronized here and there (with thinking where exactly it shoulb be). Maybe, this could help here, too?

@koppor
Copy link
Member

koppor commented Jul 27, 2020

This somehow relates to #5967, but I don't know, how.

@Siedlerchr
Copy link
Member Author

Synchronizing won't help here (always a new instance created) and I already have an idea how to fix this

@calixtus
Copy link
Member

What's the status here?

@Siedlerchr
Copy link
Member Author

Work in progress. Have an idea to cancel the running or queue the new save actions

* upstream/master: (120 commits)
  Follow up fix for copy paste (#6820)
  Add CSS Customisation (#6725)
  Separate signing and notarizing (#6822)
  Remove checkstyle hack. 8.36 got released (#6816)
  Feature/enable lucene query parsing (#6799)
  New release cycle
  Bump WyriHaximus/github-action-wait-for-status from v1.1.2 to v1.2 (#6814)
  Bump mockito-core from 3.5.5 to 3.5.7 (#6813)
  Bump classgraph from 4.8.87 to 4.8.89 (#6812)
  Bump me.champeau.gradle.jmh from 0.5.0 to 0.5.1 (#6811)
  Bump checkstyle from 8.35 to 8.36 (#6810)
  Improve Changelog
  Refactor edit action (#6808)
  Fixed typo in BuildInfo (#6807)
  disable checkstyle for generated
  fix checkstyle
  Simplify check-links.yaml (markdown-link-check) (#6720)
  Rename /gen to /generate (#6800)
  Disable CSL refresh on push (#6803)
  New Crowdin updates (#6804)
  ...

# Conflicts:
#	src/main/java/org/jabref/gui/JabRefFrame.java
@github-actions
Copy link
Contributor

The JabRef maintainers will add the following name to the AUTHORS file. In case you want to use a different one, please comment here and adjust your name in your git configuration for future commits.

Nils Streijffert

prepare save action executor

TODO: How to handle return value
Wat for future?
@Siedlerchr
Copy link
Member Author

Hmmm. Then I think my queue is already sufficient. The key problem is however that we can't create a lock on a file and that there is no atomic move on all systems. So this will create problems with Dropbox or whatever?

@tobiasdiez
Copy link
Member

Yes, I think the queue would already be a huge improvement.

The only problem I can remember with our "atomic write" has been the changed file security meta data. But that was fixed as well. Do we really have reports for issues with Dropbox?

@koppor
Copy link
Member

koppor commented Apr 10, 2021

The only problem I can remember with our "atomic write" has been the changed file security meta data. But that was fixed as well. Do we really have reports for issues with Dropbox?

I think, AtomicWrite was introduced to catch for errors during file writing - and have the previous version available. Since there are no reports of corrupted data bases (at least I am not aware of), I would use the "usual" file update behavior. Reason: Sharepoint, Google Drive, Dropbox and other file sharing services IMHO rely on that (especially when it comes to proper file verisoning)

@calixtus
Copy link
Member

Maybe we can turn the action a bit inside out: before updating create a backup copy of the file and then update the current file?
But I did not work yet much on the file writer, so just a stupid idea of mine. 😅

@tobiasdiez
Copy link
Member

Since there are no reports of corrupted data bases

Preventing corrupted data is exactly the point of the atomic writer, so having no such reports is proofing that everything is working fine.

file sharing services IMHO rely on that (especially when it comes to proper file verisoning)

Can you elaborate how the atomic writer leads to problems? Since we essentially overwrite the content very quickly (delete & copy), I don't think file sharing services confuse these changes for two files - but I might be wrong.

before updating create a backup copy of the file and then update the current file

That may still leave the main library file in a corrupted state. In the best case, this means just more work for the user (copy backup over current file) and in the worse case means the user looses part of her db (when the corruption is noticed to late). But the atomic writer works in a very similar way, just interchanging the roles of "backup" and "current file": write contents to a new file, and when this succeeds replace the "current file" with this new file (which on some os is even an atomic operation).

* upstream/main:
  Main instead of master
  Custom DOI base address fix (#7569)
  Change export to save (#7518)
  Bump unoloader from 7.1.1 to 7.1.2 (#7609)
  Bump org.beryx.jlink from 2.23.5 to 2.23.6 (#7610)
  Bump com.adarshr.test-logger from 2.1.1 to 3.0.0 (#7611)
  Bump libreoffice from 7.1.1 to 7.1.2 (#7612)
  Squashed 'buildres/csl/csl-styles/' changes from e1acabe..bfa3b6d (#7603)
  Rename master to main
@Siedlerchr
Copy link
Member Author

Siedlerchr commented Apr 23, 2021

Fuck this shit. Now I'm creating a block when I wait for the save to complete. Argh.
Ideally, the save would happen in the background and the user would be informed if it was a sucess or not.
However, currently we return a boolean if it was sucess or not. This is bad.
And the cancellation also does not really work :( fuck this shit!

* upstream/main: (71 commits)
  [Bot] Update CSL styles (#7735)
  Fix for issue 6966: open all files of multiple entries (#7709)
  Add simple unit tests (#7696)
  Add simple unit tests (#7543)
  Update check-outdated-dependencies.yml
  Added preset for new entry keybindings and reintroduced defaults (#7705)
  Select the entry which has smaller dictonary order when merge (#7708)
  Update CHANGELOG.md
  fix: make more fields, fomatters, ids and languages sorted by alphabetical order (#7717)
  Bump libreoffice from 7.1.2 to 7.1.3 (#7721)
  Bump unoloader from 7.1.2 to 7.1.3 (#7724)
  Bump org.beryx.jlink from 2.23.7 to 2.23.8 (#7723)
  Bump org.openjfx.javafxplugin from 0.0.9 to 0.0.10 (#7725)
  fix: make fields sorted by lexicographical order (#7711)
  Fix tests
  Refactoring existing unit tests (#7687)
  Refactoring and addition of unit tests (#7581)
  Refactor simple Unit Tests (#7571)
  Add simple unit tests (#7544)
  add and extend unit tests (#7685)
  ...
@koppor koppor added this to the v5.6 milestone Feb 18, 2022
@koppor koppor self-assigned this Feb 18, 2022
@koppor koppor modified the milestones: v5.6, v5.7 Mar 28, 2022
@ThiloteE ThiloteE added bug Confirmed bugs or reports that are very likely to be bugs export / save labels Apr 20, 2022
@koppor koppor mentioned this pull request May 2, 2022
14 tasks
@claell
Copy link
Contributor

claell commented May 12, 2022

Possibly, this has already been discussed. But git seems to solve parallel write operations to a repository from multiple git processes with a lock file. Before the process starts, it checks, whether the lock file is there. If not, it creates it, performs the actions and removes the lock file. If the lock is there, it shows a warning.

For JabRef, one might not even need such a lock file, but just a variable that gets set when operations are performed (and finished). Do I miss something important?

@ThiloteE ThiloteE modified the milestones: v5.7, v5.8 Jul 14, 2022
@koppor koppor closed this Aug 16, 2022
@Siedlerchr Siedlerchr deleted the writebackupToTemp branch December 18, 2022 20:00
@koppor
Copy link
Member

koppor commented Jan 6, 2023

Now, JabRef has issues on the edge case WSL shares: #9547

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Confirmed bugs or reports that are very likely to be bugs export / save status: changes required Pull requests that are not yet complete
Projects
None yet
Development

Successfully merging this pull request may close these issues.

unknown exception Uncaught exception Saving file throws exception Timing problem when saving
6 participants