Race condition when updating cache #14118

Closed
eschnett opened this issue Nov 24, 2015 · 5 comments
Labels
compiler:precompilation Precompilation of modules parallelism Parallel or distributed computation

Comments

@eschnett
Contributor

I am running Julia in parallel, i.e. I am starting several Julia interpreters simultaneously. Usually this works fine: when a package's cache file is stale, each of them recompiles it.

However, I just encountered this error. It went away when I tried again, so I assume it is a race condition.

This is with the release branch of Julia 0.4.

$ mpirun -np 2 ~/julia/bin/julia 06-cman-transport.jl MPI
INFO: Recompiling stale cache file /Users/eschnett/.julia/lib/v0.4/MPI.ji for module MPI.
INFO: Recompiling stale cache file /Users/eschnett/.julia/lib/v0.4/MPI.ji for module MPI.
ERROR: LoadError: unlink: no such file or directory (ENOENT)
 in unlink at fs.jl:102
 in rm at file.jl:59
 in create_expr_cache at loading.jl:330
 in recompile_stale at loading.jl:461
 in _require_from_serialized at loading.jl:83
 in _require_from_serialized at /Users/eschnett/julia/lib/julia/sys.dylib
 in require at /Users/eschnett/julia/lib/julia/sys.dylib
 in include at /Users/eschnett/julia/lib/julia/sys.dylib
 in include_from_node1 at /Users/eschnett/julia/lib/julia/sys.dylib
 in process_options at /Users/eschnett/julia/lib/julia/sys.dylib
 in _start at /Users/eschnett/julia/lib/julia/sys.dylib
while loading /Users/eschnett/.julia/v0.4/MPI/examples/06-cman-transport.jl, in expression starting on line 1
-------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code.. Per user-direction, the job has been aborted.
-------------------------------------------------------

The standard way to handle this in Unix is to write to a temporary file ($output.tmp), and then use an atomic rename to update the cache (mv("$output.tmp", output)). Removing the file beforehand is not necessary, and can't be done safely unless one locks the directory.
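As a rough illustration of that pattern, here is a minimal sketch in Python (not Julia, and not the code in loading.jl). The helper name `atomic_write` is hypothetical. It assumes a POSIX filesystem, where `os.replace` maps to `rename(2)` and swaps the destination in a single step.

```python
import os
import tempfile


def atomic_write(path: str, data: bytes) -> None:
    """Write data so readers see either the old file or the new one, never a partial one."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file next to the target so the rename stays on one filesystem.
    # mkstemp picks a unique name, so concurrent writers never share a temporary file.
    fd, tmp_path = tempfile.mkstemp(
        dir=directory, prefix=os.path.basename(path) + ".", suffix=".tmp"
    )
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())
        # os.replace overwrites an existing destination; no prior unlink is needed.
        os.replace(tmp_path, path)
    except BaseException:
        # Remove the temporary file if anything failed before the rename.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

If two processes call `atomic_write` on the same path at the same time, the last rename wins and the cache file is complete either way. The sketch uses a unique temporary name rather than a fixed `$output.tmp`, because a fixed name would let two writers interleave their output in the temporary file itself.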

@ViralBShah
Member

I was expecting this one to strike - but not so soon!

@ViralBShah ViralBShah added the compiler:precompilation Precompilation of modules label Nov 24, 2015
@kshyatt kshyatt added the parallelism Parallel or distributed computation label Nov 24, 2015
@tkelman
Contributor

tkelman commented Nov 24, 2015

Essentially a duplicate of #13684. Force a precompile manually if you want to use a package in parallel.

We are already creating .ji files atomically, ref #12699.

Also ref #12723 which doesn't have any great way of knowing whether other instances of Julia happen to be in the process of creating the same file.

@StefanKarpinski
Sponsor Member

Can we use discretionary file locking?
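For context, "discretionary" locking is usually called advisory locking on Unix: the lock only constrains processes that also ask for it. The following is a minimal Python sketch of the idea, not a proposal for Julia's implementation. It assumes a Unix system with `flock`, and the names `cache_lock` and `rebuild_if_stale` and the `.lock` suffix are hypothetical.

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def cache_lock(lock_path: str):
    """Hold an exclusive advisory lock on lock_path for the duration of the block."""
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o644)
    try:
        # Blocks until no other open file description holds the lock.
        fcntl.flock(fd, fcntl.LOCK_EX)
        yield
    finally:
        # Closing the descriptor releases the lock.
        os.close(fd)


def rebuild_if_stale(cache_path, is_stale, build):
    """Rebuild the cache at most once across cooperating processes."""
    with cache_lock(cache_path + ".lock"):
        # Re-check under the lock: another process may have rebuilt while we waited.
        if is_stale(cache_path):
            build(cache_path)
```

With this shape, the second process waits for the first, then finds the cache fresh and skips the rebuild. The usual caveat is that advisory locks can be unreliable on some network filesystems.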

@eschnett
Contributor Author

#13684 is different -- that is about machine-specific caches, whereas this issue here is about an access conflict.

I'm not sure what #12699 addresses. The issue here is caused by files not being generated atomically; maybe the fix from #12699 regressed over time? At the moment, the following steps happen in sequence:

  • check whether a file exists
  • if so, delete it
  • open the file, truncating it
  • write to the file

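The error occurs if two processes overlap in the first two steps: both see that the file exists, one deletes it, and the other's delete then fails with ENOENT, since the file can't be deleted twice. The other race -- two processes writing to the same file -- could lead to silent corruption.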
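The overlap in the first two steps can be replayed deterministically in a single process. This is a Python sketch for illustration only: the function names are hypothetical, and real processes would interleave nondeterministically rather than in this fixed order.

```python
import os


def simulate_overlap(path: str):
    """Replay two processes interleaving 'check, then delete' on the same file."""
    a_saw_file = os.path.exists(path)  # process A, step 1: file exists
    b_saw_file = os.path.exists(path)  # process B, step 1: file still exists
    if a_saw_file:
        os.unlink(path)                # process A, step 2: delete succeeds
    try:
        if b_saw_file:
            os.unlink(path)            # process B, step 2: file is already gone
    except FileNotFoundError as err:
        return err                     # the ENOENT failure described above
    return None


def remove_if_present(path: str) -> None:
    """Delete without a prior check and treat 'already gone' as success."""
    try:
        os.unlink(path)
    except FileNotFoundError:
        pass
```

`remove_if_present` avoids the crash but not the second race: two writers can still open and write the same file at once, which is why the temporary-file-plus-rename approach matters.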

I'm working on a solution in #14143.

@tkelman
Contributor

tkelman commented Nov 29, 2015

closed by #14145

@tkelman tkelman closed this as completed Nov 29, 2015
5 participants