Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Retry artifact rename if it fails #4001

Merged
merged 2 commits into from
Aug 31, 2024
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
24 changes: 20 additions & 4 deletions src/Artifacts.jl
Original file line number Diff line number Diff line change
Expand Up @@ -65,7 +65,18 @@ Either rename the directory at `temp_dir` to `new_path` and set it to read-only
or if `new_path` artifact already exists try to do nothing.
"""
function _mv_temp_artifact_dir(temp_dir::String, new_path::String)::Nothing
if !isdir(new_path)
# Sometimes a rename can fail because the temp_dir is locked by
# anti-virus software scanning the new files.
# In this case we want to sleep and try again.
# I am using the list of error codes to retry from:
# https://github.com/isaacs/node-graceful-fs/blob/234379906b7d2f4c9cfeb412d2516f42b0fb4953/polyfills.js#L87
# Retry for up to about 60 seconds by retrying 20 times with exponential backoff.
retry = 0
max_num_retries = 20 # maybe this should be configurable?
sleep_amount = 0.01 # seconds
max_sleep_amount = 5.0 # seconds
while true
isdir(new_path) && return
# This next step is like
# `mv(temp_dir, new_path)`.
# However, `mv` defaults to `cp` if `rename` returns an error.
Expand All @@ -75,14 +86,19 @@ function _mv_temp_artifact_dir(temp_dir::String, new_path::String)::Nothing
# rename worked
chmod(new_path, filemode(dirname(new_path)))
set_readonly(new_path)
return
else
# Ignore rename error, if `new_path` exists.
if !isdir(new_path)
# Ignore rename error if `new_path` exists.
isdir(new_path) && return
if retry < max_num_retries && err ∈ (Base.UV_EACCES, Base.UV_EPERM, Base.UV_EBUSY)
sleep(sleep_amount)
sleep_amount = min(sleep_amount*2.0, max_sleep_amount)
retry += 1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this can take up to 60s, I think we should print something out to the user. Something like:

# Warn once, and only if we've slept a non-trivial amount of time already
if retry == 8
    @warn("Artifact installation failed, retrying for up to 60s.  Check to see if an anti-virus may be interfering.")
end

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is hit during the new parallel artifact download process, then outputting a log will mess with the multiprogress printing (sorry). We'll need some way to keep that tidy

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image
How about adding other status messages to the right of the progress bar where it currently has "XX MiB/XX MiB" like "unpacking", "verifying tree hash", and "installing"?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the location works but in the default happy path I wouldn't show any further detail

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So only show a status message if an artifact is stuck at a stage for over 1 second?

To do the actual printing, can I add a status::String field to

Base.@kwdef mutable struct MiniProgressBar

that gets appended to progress_text in?

progress_text = if p.mode == :percentage

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good but I'd do it such that if status is non empty just show the status string after the name and not the progress bar or ending ratio. Just so that we're more likely to fit in the terminal width.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#4005 has landed. Thanks @nhz2. So I believe this is good to go now

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we want to backport this to 1.10 & 1.11, but when we do we'll want to add a log message about it taking a long time, as suggested above, given the parallel download thing isn't going to be backported.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Compared to the time it already takes to download and unpack artifacts on Windows with an aggressive anti-virus I don't think one minute extra is something to create a warning about, as long as the artifact eventually gets installed.

else
Base.uv_error("rename of $(repr(temp_dir)) to $(repr(new_path))", err)
end
end
end
nothing
end

"""
Expand Down
Loading