Force mv in create_artifact #2292

Closed

charleskawczynski

Over at ClimateMachine.jl, we're seeing race conditions when using create_artifact on multiple processors (e.g., here). This is likely not the best fix but, based on this thread, it seems like it may work. Looping in @kpamnany. I'm hoping that this will partially fix ClimateMachine's 1678.

Doing this the "right way" seems ultimately dependent on resolving cross-platform file locking. Pidfile.jl seems like a promising solution, but I'm not sure what the status is on moving this into Base.
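For illustration, here is a minimal sketch of what guarding artifact creation with a pidfile lock might look like. It assumes the third-party Pidfile.jl package is installed and uses its `mkpidlock` do-block form; `create_once` and the `.pid` path next to the destination are hypothetical names made up for this example, not anything in Pkg or ArtifactWrappers. Whether pidfile locking behaves well on the shared filesystems typical of MPI clusters is not addressed here.

```julia
using Pidfile  # third-party package mentioned above; assumed to be installed

# Hypothetical helper: run `build(dest)` at most once, even if several
# processes call this at the same time with the same `dest`.
function create_once(build, dest::AbstractString)
    mkpidlock(dest * ".pid") do
        # Only one process at a time runs this block; later entrants
        # find the directory already present and skip the build.
        isdir(dest) || build(dest)
    end
    return dest
end
```

A call could look like `create_once(d -> (mkpath(d); write(joinpath(d, "data.txt"), "hello")), dest)`, with every process passing the same `dest`.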

@kpamnany
Contributor

Ah, you may have missed the subsequent comment. So, not sure this will actually help.

@charleskawczynski
Author

charleskawczynski commented Dec 13, 2020

Ah, crud, I see. You're right. This won't fix it. What if we also passed force through from mv to rename in julia/base/file.jl? (right now force is not passed through). Even if that did work, I guess this is not really moving things in the right direction, though ☹️.
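For context, a small sketch of what `force` changes in `mv`, using only Base functions. The directory layout and the `stage` helper are invented for the example; the two "processes" are simulated sequentially in one script, so this shows the API behaviour rather than an actual race.

```julia
# Hypothetical stand-ins for a freshly built artifact and its final location.
root = mktempdir()
dest = joinpath(root, "artifact")

# Build a small directory under `root` that can later be moved into place.
function stage(root, content)
    src = mktempdir(root)
    write(joinpath(src, "data.txt"), content)
    return src
end

# The first "process" installs its copy; `dest` does not exist yet.
mv(stage(root, "first"), dest)

# A second "process" that lost the race: without `force`, `mv` refuses
# because the destination already exists.
second = stage(root, "second")
threw = try
    mv(second, dest)
    false
catch err
    err isa ArgumentError
end

# With `force=true`, the existing destination is removed and then replaced.
mv(second, dest; force=true)
result = read(joinpath(dest, "data.txt"), String)
```

As far as I can tell from Base, `force=true` removes the destination first and renames afterwards, so these are two separate steps. Another process could still see the destination missing in between, which fits the concern that `force` alone may not close the race.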

@kpamnany
Contributor

There are many, many things depending on particular behavior of the file I/O primitives; can't really change things in Base easily.

We need to MPI I/O all our stuff, which will mean significant PRs to packages we use or specialized forks. Neither option is appealing. 🤷‍♂️

Simplest solution for now is:

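mpirank = MPI.Comm_rank(MPI.COMM_WORLD)
if mpirank == 0
    ...  # only rank 0 does the file I/O (e.g. creating the artifact)
end
MPI.Barrier(MPI.COMM_WORLD)  # all other ranks wait here until rank 0 is done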

Which is bad for performance, but should avoid distributed races.

Successfully merging this pull request may close these issues.

ArtifactWrappers is not multi-process safe