Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pass force=true option to rename in case of loading failure in compilecache #36638

Merged
merged 1 commit into from
Jul 16, 2020

Conversation

musm
Copy link
Contributor

@musm musm commented Jul 13, 2020

This fixes #36621 for me

What's happening is that the rename via jl_fs_rename fails and then we have a cp failure since force=true is not passed.

err = ccall(:jl_fs_rename, Int32, (Cstring, Cstring), src, dst)
# on error, default to cp && rm
if err < 0
# force: is already done in the mv function
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know what is meant by this comment? I think this looks like a very old vestige, when this function got split?

@@ -1338,7 +1338,7 @@ function compilecache(pkg::PkgId, path::String)
chmod(tmppath, filemode(path) & 0o777)

# this is atomic according to POSIX:
rename(tmppath, cachefile)
rename(tmppath, cachefile; force=true)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could simply be changed to mv(tmppath, cachefile; force = true), and then the changes to rename() are unnecessary.

But I'm not too happy with this solution because the reason this was added in the first place was for distributed execution on a shared filesystem, so that processes wouldn't error out because of racing to create the cache. With this fix, every process will overwrite the cache file -- the intended behavior was for all the processes that lost the race to silently fail to rename their temporary cache to the primary cache.

I was thinking of adding the following before the finally:

catch e
    if !isa(e, ArgumentError)
        throw(e)
    end

Or something like that but it doesn't matter much I suppose; we don't save on the system calls with the intended way and rename is atomic so it shouldn't break anything.

Any thoughts on this @KristofferC?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kpamnany Note the current error in the linked issue does not involve distributed execution. The error crops up upon any recompilation of packages, which makes it impossible to use Pkg on master without manually deleting the procompilation files.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I understand. The swiftest fix is to change the rename to mv with force = true as I said.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, might be best if you could open a PR to discuss your proposed improvement over this stop-gap solution.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On inspection, mv() uses checkfor_mv_cp_cptree() before rename() and that function does an rm(), thus destroying the atomicity of rename() which is what we wanted in the first place. So your fix here is good.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to merge this in the meantime because I can't currently develop and use Julia packages on master with this bug.

It's probably worthwhile to explore something like in #36638 (comment) in a separate PR

@musm musm changed the title Pass force option to mv in case of loading failure Pass force=true option to rename in case of loading failure in compilecache Jul 16, 2020
@musm musm merged commit 353e129 into JuliaLang:master Jul 16, 2020
@musm musm deleted the rename branch July 16, 2020 17:38
@KristofferC KristofferC mentioned this pull request Aug 3, 2020
25 tasks
KristofferC pushed a commit that referenced this pull request Aug 3, 2020
…ecache (#36638)

This seems to be  affecting Windows systems in particular

(cherry picked from commit 353e129)
simeonschaub pushed a commit to simeonschaub/julia that referenced this pull request Aug 11, 2020
…ecache (JuliaLang#36638)

This seems to be  affecting Windows systems in particular
KristofferC pushed a commit that referenced this pull request Aug 19, 2020
…ecache (#36638)

This seems to be  affecting Windows systems in particular

(cherry picked from commit 353e129)
vtjnash pushed a commit that referenced this pull request Aug 8, 2024
As noted in #41584 and
https://discourse.julialang.org/t/safe-overwriting-of-files/117758/3
`mv` is usually expected to be "best effort atomic".

Currently calling `mv` with `force=true` calls
`checkfor_mv_cp_cptree(src, dst, "moving"; force=true)` before renaming.
`checkfor_mv_cp_cptree` will delete `dst` if exists and isn't the same
as `src`.

If `dst` is an existing file and julia stops after deleting `dst` but
before doing the rename, `dst` will be removed but will not be replaced
with `src`.

This PR changes `mv` with `force=true` to first try rename, and only
delete `dst` if that fails. Assuming file system support and the first
rename works, julia stopping will not lead to `dst` being removed
without being replaced.

This also replaces a stopgap solution from
#36638 (comment)
lazarusA pushed a commit to lazarusA/julia that referenced this pull request Aug 17, 2024
…ng#55384)

As noted in JuliaLang#41584 and
https://discourse.julialang.org/t/safe-overwriting-of-files/117758/3
`mv` is usually expected to be "best effort atomic".

Currently calling `mv` with `force=true` calls
`checkfor_mv_cp_cptree(src, dst, "moving"; force=true)` before renaming.
`checkfor_mv_cp_cptree` will delete `dst` if exists and isn't the same
as `src`.

If `dst` is an existing file and julia stops after deleting `dst` but
before doing the rename, `dst` will be removed but will not be replaced
with `src`.

This PR changes `mv` with `force=true` to first try rename, and only
delete `dst` if that fails. Assuming file system support and the first
rename works, julia stopping will not lead to `dst` being removed
without being replaced.

This also replaces a stopgap solution from
JuliaLang#36638 (comment)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ArgumentError due to compiled cache file clashing during cp operation
4 participants