Running julia from multiple machines sharing the same home folder #13684

Closed
iceblue25 opened this issue Oct 20, 2015 · 16 comments
Labels
compiler:precompilation Precompilation of modules

Comments

@iceblue25

Hi all,

I am trying to run julia from multiple machines (bash launches multiple julia instances from various machines) and all share the same home dir.

The issue is that each machine recompiles the stale cache. (All share the same package dir.) Is this an issue? Is this a known issue? Is there a solution?

Thanks

@tkelman
Contributor

tkelman commented Oct 20, 2015

Best to manually call using or Base.compilecache on any packages you're going to need from a single node, and wait to launch other jobs until the precompilation is finished.
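
As a rough sketch of that workflow (not from the thread; the package names and the JULIA variable are placeholders), a launcher script could load the packages once from one node and only start the other jobs after that call returns:

```shell
#!/bin/sh
# Sketch only: warm the shared precompile cache from a single process.
# JULIA and the package names passed in are placeholders, not from the thread.

JULIA="${JULIA:-julia}"

# Build a one-line Julia program that loads every package given as an argument.
# The body runs in a subshell so its variables do not leak into the caller.
using_line() (
    line=""
    for pkg in "$@"; do
        line="${line}using ${pkg}; "
    done
    printf '%s' "${line% }"
)

# Load the packages once so that a single process writes the cache files.
warm_cache() {
    "$JULIA" -e "$(using_line "$@")"
}
```

For example, `warm_cache Foo Bar && ./launch_jobs.sh`, where `Foo`, `Bar` and `launch_jobs.sh` stand in for the real packages and for whatever starts the remaining instances.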

@iceblue25
Author

Seems like a messy solution.

Can precompilation happen in memory? So that I launch all processes without waiting? (Might be a big problem especially when trying to launch multiple small julia programs.)
Also launching processes sequentially requires additional logic and also delays the execution.

Can one precompile the packages for a very general CPU architecture so that the result is compatible with all machines?

@tkelman
Contributor

tkelman commented Oct 20, 2015

Precompilation in memory was how it used to work, and you can patch the packages you're using to set __precompile__(false).

Alternately you can tell each worker to write to a separate local location, using independent copies of the package directory. Is this causing a real error, or a theoretical concern?

@tkelman tkelman added the compiler:precompilation Precompilation of modules label Oct 20, 2015
@iceblue25
Author

It is a real concern but not a real error yet. I am trying some toy examples before scaling things up.
No error so far, but if multiple compilations happen in parallel I guess a race condition will occur.

Also, is there a way to precompile for a baseline CPU architecture (for example pentium 4) that all the machines support?

Thanks for all the quick replies and help!

@tkelman
Contributor

tkelman commented Oct 20, 2015

You can start julia with -C pentium4 and it looks like that should propagate to the precompile task. Are all workers running the same build of julia? Are you seeing precompilation happening every time you run something, or just the first time?

@stevengj
Member

You could use a different LOAD_CACHE_PATH on each machine. e.g. add the following to your .juliarc:

push!(Base.LOAD_CACHE_PATH, joinpath(Base.LOAD_CACHE_PATH[1], gethostname()))

(Though you might want to use a local, non-network path instead.) That way, the different machines won't share a cache.

@iceblue25
Author

@tkelman and @stevengj thanks for your replies. I will try the separate caches solution. Not a very clean solution, but I guess it will work.

@anriseth

I have the same issue (on my university computers). push! did not help, but unshift! works:

unshift!(Base.LOAD_CACHE_PATH, joinpath(Base.LOAD_CACHE_PATH[1], gethostname()))

@stevengj
Member

Right, sorry; you need the new path to be first, via unshift!.

@stevengj
Member

It seems like it might be reasonable to include gethostname() in the default cache path. It's not too uncommon to share home directories between machines.

@eschnett
Contributor

+1

@iceblue25
Author

+1 for default cache path

@iceblue25
Author

What about packages written in C that use Pkg.build()? The builds can be machine-specific.

@rickedanielson

In case there are many machines, but only a few CPU types are relevant (e.g., E5645 E5-2630 E5-2640, plus a few more that are equivalent to these three), it should help to reduce duplication by creating a few subdirs such as:

cd ~/.julia/lib/v0.4
mkdir E5645 E5-2630 E5-2640
cp -a *.ji E5-2630
cp -a *.ji E5-2640
mv *.ji E5645
ln -s E5645 X5650 ; ln -s E5-2630 E5-2650

and then in .juliarc check for the CPU (e.g., if listed in /proc/cpuinfo):

if VERSION >= v"0.4"
    HostCPUName = split(readall(pipeline(`cat /proc/cpuinfo`, `grep CPU`, `head -1`)))[7]
    unshift!(Base.LOAD_CACHE_PATH, joinpath(Base.LOAD_CACHE_PATH[1], HostCPUName))
end

PS - Just to be complete, job submission may require a corresponding executable for julia, which would point to versions compiled on the different machine types and should be a script higher in the PATH than any other "julia", of course:

#!/usr/bin/tcsh

if (`grep 5645 /proc/cpuinfo | head -1 | wc -l` == "1") then
    ${STEME}/soft/julia.E5645/julia $*
endif
if (`grep 5650 /proc/cpuinfo | head -1 | wc -l` == "1") then
    ${STEME}/soft/julia.E5645/julia $*
endif
if (`grep 2630 /proc/cpuinfo | head -1 | wc -l` == "1") then
    ${STEME}/soft/julia.E5-2650/julia $*
endif
if (`grep 2640 /proc/cpuinfo | head -1 | wc -l` == "1") then
    ${STEME}/soft/julia.E5-2640/julia $*
endif
if (`grep 2650 /proc/cpuinfo | head -1 | wc -l` == "1") then
    ${STEME}/soft/julia.E5-2650/julia $*
endif
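
The same dispatch could also be sketched in POSIX sh with a single case statement, which avoids repeating the grep for every CPU model. This is only an illustration: the function names are made up here, and the model-to-directory mapping simply copies the one in the tcsh script above.

```shell
#!/bin/sh
# Sketch only: pick a julia build directory from the CPU model string.
# julia_build_for and cpu_model are hypothetical helper names.

# Map a CPU model string to the build directory name used in the tcsh script.
# Returns a non-zero status when the model is not one of the known ones.
julia_build_for() {
    case "$1" in
        *5645*|*5650*) echo "julia.E5645" ;;
        *2630*|*2650*) echo "julia.E5-2650" ;;
        *2640*)        echo "julia.E5-2640" ;;
        *)             return 1 ;;
    esac
}

# Print the first "model name" value from a cpuinfo-style file
# (defaults to /proc/cpuinfo, which assumes Linux).
cpu_model() {
    awk -F': *' '/model name/ { print $2; exit }' "${1:-/proc/cpuinfo}"
}

# Possible use in a wrapper script, with STEME set as in the original:
#   build=$(julia_build_for "$(cpu_model)") && exec "${STEME}/soft/${build}/julia" "$@"
```

Unlike the chain of independent if blocks, this stops at the first matching pattern and does nothing (returning an error status) on an unrecognized CPU.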

@damiendr
Contributor

This problem often comes up on compute clusters. At the moment the only way I can run julia array jobs is with the --compilecache=no option. It would be very nice to have a cleaner solution in the future.

@simonbyrne
Contributor

Should be fixed by #36416.
