parallel bug in 0.35 release #10085
Comments
After searching the code, I think this issue may be caused by a GC bug. |
Can you do a git bisect on the release-0.3 branch to determine exactly which commit introduced this? |
As @tkelman suggested, I hunted for the bug and found that commit 06d01c2 causes this problem; before it everything is OK. With this commit, even this simple code crashes:
library(rjulia)
julia_init()
julia_void_eval("versioninfo()")
Before this commit, a stress test like the following passes:
library(rjulia)
julia_init()
julia_void_eval("versioninfo()")
julia_void_eval("addprocs(2)")
for (j in 1:1000)
{
for (i in 2:3)
{
julia_void_eval(paste("r=remotecall(",i,", rand, 2, 2)",sep = ""))
y <- j2r("fetch(r)")
cat("process",j, i, "got value:\n");
print(y)
cat("\n")
}
}
julia_void_eval("rmprocs(workers())")
cat("done\n")
|
cc @vtjnash |
Very interesting. Thank you for running the bisect @armgong. Such a useful tool, |
That particular commit was fixing a different bug, so I'm not sure if just reverting it is the correct solution. It's possible that RJulia might be doing something wrong here with respect to tasks or initialization, but I'll wait for Jameson to weigh in. |
If RJulia is doing something wrong, it is probably in julia_init(), which only does three things. On Julia 0.3.3, with the last two lines added the test runs OK; without them the worker processes can't start. |
#9461 will likely fix this when merged |
Is that backportable? It isn't even on master yet, would be good to get it merged sooner rather than later to give it time to be well-tested before backporting into 0.3.6. |
That pr needs to be rewritten for the new gc, so that's not really an available option to merge to master first |
Ah right. It doesn't cherry-pick cleanly to 0.3 either, does it need to be rewritten twice then? |
there are actually a couple of alternative ways of fixing this also. the undocumented issue is that you can't call a julia function from a higher stack frame than the one it was created in |
Oh dear. #9266 caused lots of problems, that seems really questionable for backporting. I'd be in favor of a less drastic modification on release-0.3 if possible. |
the less drastic measure is to record the |
I see. Is that doable without unfixing #8551, or do we have to choose between trading one bug for another, or backporting a change that caused some still-unresolved issues? |
oh right, not really -- #9266 was the implementation of what i just re-described above |
@vtjnash After further testing, 0.3.3 and 0.4 both have problems, but different ones. I just enlarged the iteration count to torture the Julia worker processes and rjulia. The test code is:
library(rjulia)
julia_init()
julia_void_eval("versioninfo()")
julia_void_eval("addprocs(2)")
for (j in 1:5015)
{
for (i in 2:3)
{
julia_void_eval(paste("r=remotecall(",i,", rand, 2, 2)",sep = ""))
y <- j2r("fetch(r)")
cat("process",j, i, "got value:\n");
print(y)
cat("\n")
}
}
julia_void_eval("rmprocs(workers())")
cat("done\n")
********* 0.4 log (note: under 0.4 only the Julia worker processes crashed; the R process is still OK) *************
signal (11): Segmentation fault
signal (11): Segmentation fault
ArgumentErrorWorker 3 terminated.( ProcessExitedException()
************* 0.3 log (note: under 0.3 the R process core dumps) *************
process 4632 3 got value:
*** stack smashing detected ***: /usr/lib64/R/bin/exec/R terminated
signal (6): Aborted |
OK, the 0.4 problem is not related to embedding Julia. In the Julia 0.4 REPL, running
for i=1:10000 r=remotecall(2,rand,2,2);fetch(r);r=remotecall(3,rand,2,2);fetch(r) end
leads to the same issue: the Julia workers die, but the Julia head process stays alive.
julia> for i=1:10000 r=remotecall(2,rand,2,2);fetch(r);r=remotecall(3,rand,2,2);fetch(r) end
signal (11): Segmentation fault
deserialize_tuple at serialize.jl:355
Julia 0.3.5 runs this code just fine. |
for i=1:10000 r=remotecall(2,rand,2,2);fetch(r);r=remotecall(3,rand,2,2);fetch(r) end
also fails in the Julia REPL on Windows x86_64 (0.4 master branch) with this message:
F:>julia\usr\bin\julia -p 2
julia> for i=1:10000 r=remotecall(2,rand,2,2);fetch(r);r=remotecall(3,rand,2,2); |
Just for reference - see the comments on f769e41, I don't plan on backporting the fix for this myself because it has messy conflicts that I don't know how to resolve. If anyone would like to see this resolved for 0.3.7 or later, please prepare a PR against release-0.3. |
I wrote the rjulia package to call Julia from R. The following R code runs correctly on Julia 0.3.3 and 0.4, but fails on 0.3.5.
On 0.3.5 it shows the following error