parallel bug in 0.35 release #10085

Closed
armgong opened this issue Feb 5, 2015 · 22 comments
Labels
parallelism Parallel or distributed computation regression Regression in behavior compared to a previous version

Comments

@armgong
Contributor

armgong commented Feb 5, 2015

I wrote the rjulia package, which calls Julia from R. The following R code runs correctly on Julia 0.3.3 and 0.4, but fails on 0.3.5.

library(rjulia)
julia_init()
julia_eval("addprocs(1)")
for (i in 1:2)
{
  julia_void_eval(paste("r=remotecall(",i,", rand, 2, 2)",sep = ""))
  y <- j2r("fetch(r)")
  cat("\n")
  cat("process", i, "got value:\n"); 
  print(y)
}
julia_void_eval("rmprocs(workers())")

On 0.3.5 it shows the following error:

> library(rjulia)
> julia_init()
> julia_eval("addprocs(1)")
NULL
> for (i in 1:2)
+ {
+   julia_void_eval(paste("r=remotecall(",i,", rand, 2, 2)",sep = ""))
+   y <- j2r("fetch(r)")
+   cat("\n")
+   cat("process", i, "got value:\n");
+   print(y)
+ }

process 1 got value:
          [,1]      [,2]
[1,] 0.5080583 0.5226671
[2,] 0.3843354 0.6072951
fatal: error thrown and no exception handler available.
MemoryError()

signal (11): Segmentation fault
unknown function (ip: -1616141115)
unknown function (ip: -1615757548)
unknown function (ip: -1616127477)
unknown function (ip: -1618661289)
unknown function (ip: -1618550229)
unknown function (ip: -1618537003)
unknown function (ip: -1618576727)
unknown function (ip: -1616064465)
unknown function (ip: -1616064242)
unknown function (ip: -1616064028)
unknown function (ip: -1624995090)
unknown function (ip: -1624994754)
jl_trampoline at /data/julia0.3/julia/usr/lib/libjulia.so (unknown line)
jl_apply_generic at /data/julia0.3/julia/usr/lib/libjulia.so (unknown line)
jl_f_apply at /data/julia0.3/julia/usr/lib/libjulia.so (unknown line)
julia_rmprocs_20121 at  (unknown line)
terminate_all_workers at ./multi.jl:1614
jlcall_terminate_all_workers_18781 at /data/julia0.3/julia/usr/bin/../lib/julia/sys.so (unknown line)
jl_apply_generic at /data/julia0.3/julia/usr/lib/libjulia.so (unknown line)
_atexit at ./client.jl:423
jlcall__atexit_18051 at /data/julia0.3/julia/usr/bin/../lib/julia/sys.so (unknown line)
jl_apply_generic at /data/julia0.3/julia/usr/lib/libjulia.so (unknown line)
uv_atexit_hook at /data/julia0.3/julia/usr/lib/libjulia.so (unknown line)
jl_exit at /data/julia0.3/julia/usr/lib/libjulia.so (unknown line)
unknown function (ip: -1624911487)
unknown function (ip: -1624911380)
unknown function (ip: -1624920120)
unknown function (ip: -1547017712)
unknown function (ip: 33247064)
@armgong
Contributor Author

armgong commented Feb 5, 2015

After searching the code, I think this issue may be caused by a GC bug?

@ViralBShah ViralBShah added the parallelism Parallel or distributed computation label Feb 5, 2015
@jiahao jiahao added the regression Regression in behavior compared to a previous version label Feb 5, 2015
@tkelman
Contributor

tkelman commented Feb 5, 2015

Can you do a git bisect on the release-0.3 branch to determine exactly which commit introduced this?

@armgong
Contributor Author

armgong commented Feb 6, 2015

As @tkelman suggested, I tried to hunt down the bug and found that commit 06d01c2 causes this problem. Before it, everything is OK. With this commit, even this simple code crashes:

library(rjulia)
julia_init()
julia_void_eval("versioninfo()")

But before that commit, even a stress test like this passes:

library(rjulia)
julia_init()
julia_void_eval("versioninfo()")
julia_void_eval("addprocs(2)")
for (j in 1:1000)
{
for (i in 2:3)
{
  julia_void_eval(paste("r=remotecall(",i,", rand, 2, 2)",sep = ""))
  y <- j2r("fetch(r)")
  cat("process",j, i, "got value:\n"); 
  print(y) 
  cat("\n")
}
}
julia_void_eval("rmprocs(workers())")
cat("done\n")

@armgong
Contributor Author

armgong commented Feb 6, 2015

After reverting commit 06d01c2 on the head of the release-0.3 branch, everything runs like a charm, so Julia core devs, please revert 06d01c2.

@jiahao
Member

jiahao commented Feb 6, 2015

cc @vtjnash

@tkelman
Contributor

tkelman commented Feb 6, 2015

Very interesting. Thank you for running the bisect @armgong. Such a useful tool, git bisect.

@tkelman
Contributor

tkelman commented Feb 6, 2015

That particular commit was fixing a different bug, so I'm not sure if just reverting it is the correct solution. It's possible that RJulia might be doing something wrong here with respect to tasks or initialization, but I'll wait for Jameson to weigh in.

@armgong
Contributor Author

armgong commented Feb 6, 2015

If RJulia is doing something wrong, it is probably in julia_init(), which does only three things:
1 jl_init(JULIA_HOME)
2 jl_eval_string("Base.init_parallel()")
3 jl_eval_string("Base.init_bind_addr(ARGS)")
Julia 0.3 does not call the parallel init functions in jl_init_with_image (jlapi.c), so the last two lines are needed. 0.4 already has them in jl_init_with_image.

On Julia 0.3.3, the test runs OK with the last two lines; without them the worker processes cannot start.
On Julia 0.4.0, the test also runs OK with the last two lines (although they are not needed, since they are already in jl_init_with_image).
On Julia 0.3.5, the worker processes cannot start without the last two lines, and with them the test crashes.

@vtjnash
Member

vtjnash commented Feb 6, 2015

#9461 will likely fix this when merged

@tkelman
Contributor

tkelman commented Feb 6, 2015

Is that backportable? It isn't even on master yet, would be good to get it merged sooner rather than later to give it time to be well-tested before backporting into 0.3.6.

@vtjnash
Member

vtjnash commented Feb 6, 2015

That pr needs to be rewritten for the new gc, so that's not really an available option to merge to master first

@tkelman
Contributor

tkelman commented Feb 6, 2015

Ah right. It doesn't cherry-pick cleanly to 0.3 either, does it need to be rewritten twice then?

@vtjnash
Member

vtjnash commented Feb 7, 2015

there's actually a couple of alternative ways of fixing this also. the undocumented issue is that you can't call a julia function from a higher stack frame than the one it created in julia_init

@vtjnash
Member

vtjnash commented Feb 7, 2015

actually, i missed that you said this worked on 0.4. that means #9461 isn't really necessary. what is more necessary is finishing cherry-picking the other changes such as 54affdb. or call the JL_SET_STACK_BASE from some function that doesn't return (in 0.4, this is rolled into julia_init)

@tkelman
Contributor

tkelman commented Feb 7, 2015

Oh dear. #9266 caused lots of problems, that seems really questionable for backporting. I'd be in favor of a less drastic modification on release-0.3 if possible.

@vtjnash
Member

vtjnash commented Feb 7, 2015

the less drastic measure is to record the jl_stack_base value in every jl_task_t, as we did before 06d01c2, during the first call to save_stack (for any task != jl_root_task). the restore_stack function assumes that value is const after that point, whereas jl_eval_string will change it if it was previously undefined (due to a lack of call to JL_SET_STACK_BASE)
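
To make the stack-base constraint discussed here a little more concrete, the following is a toy model in C, not Julia's actual task code. It assumes a downward-growing stack, represents addresses as plain integers, and assumes that a copy-stack task switch saves the bytes between the current frame and a previously recorded stack base. The function names `bytes_to_save_unchecked` and `bytes_to_save` are hypothetical and exist only for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy model only: addresses are plain integers and the stack is assumed to
 * grow downward, so a deeper frame has a SMALLER address. All names are
 * hypothetical; this is not Julia's task-switching code. */

/* Bytes a copy-stack task switch would save, computed without any check:
 * everything between the current frame and the recorded stack base.
 * If frame_addr is above stack_base, the unsigned subtraction wraps around
 * to a huge value. */
size_t bytes_to_save_unchecked(uintptr_t stack_base, uintptr_t frame_addr)
{
    return (size_t)(stack_base - frame_addr);
}

/* Checked variant: returns 1 and stores the byte count in *out when the
 * current frame is at or below the recorded base, and returns 0 otherwise. */
int bytes_to_save(uintptr_t stack_base, uintptr_t frame_addr, size_t *out)
{
    if (frame_addr > stack_base)
        return 0; /* caller sits in a shallower frame than the recorded base */
    *out = (size_t)(stack_base - frame_addr);
    return 1;
}
```

In this model, with a base recorded at 0x6000 (say, inside an init function that later returns), a call from a deeper frame at 0x5800 gives 2048 bytes to save. A call entering from a shallower host frame at 0x6800 is rejected by the checked variant and wraps to a very large size in the unchecked one, which is why the recorded base has to stay valid for every later call.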

@tkelman
Contributor

tkelman commented Feb 7, 2015

I see. Is that doable without unfixing #8551, or do we have to choose between trading one bug for another, or backporting a change that caused some still-unresolved issues?

@vtjnash
Member

vtjnash commented Feb 7, 2015

oh right, not really -- #9266 was the implementation of what i just re-described above

@armgong
Contributor Author

armgong commented Feb 7, 2015

@vtjnash after further testing, 0.3.3 and 0.4 both have problems, though different ones. Just enlarge the loop count to torture the Julia worker processes and rjulia.

The test code is:

library(rjulia)
julia_init()
julia_void_eval("versioninfo()")
julia_void_eval("addprocs(2)")
for (j in 1:5015)
{
for (i in 2:3)
{
  julia_void_eval(paste("r=remotecall(",i,", rand, 2, 2)",sep = ""))
  y <- j2r("fetch(r)")
  cat("process",j, i, "got value:\n"); 
  print(y) 
  cat("\n")
}
}
julia_void_eval("rmprocs(workers())")
cat("done\n")

********* 0.4 log: note that under 0.4 only the Julia worker processes crash; the R process is still OK *************

signal (11): Segmentation fault
unknown function (ip: 624957107)
unknown function (ip: 624965742)
unknown function (ip: 624972752)
jl_gc_collect at /data/julia/usr/bin/../lib/libjulia.so (unknown line)
unknown function (ip: 624983543)
jl_alloc_tuple_uninit at /data/julia/usr/bin/../lib/libjulia.so (unknown line)
jl_f_tuple at /data/julia/usr/bin/../lib/libjulia.so (unknown line)
jl_f_apply at /data/julia/usr/bin/../lib/libjulia.so (unknown line)
ntuple at ./tuple.jl:30
....(lot of ntuple at ./tuple.jl:30 )
ntuple at ./tuple.jl:30

signal (11): Segmentation fault
ntuple at ./tuple.jl:30
....(lot of ntuple at ./tuple.jl:30 )
ntuple at ./tuple.jl:30
deserialize_tuple at serialize.jl:355
handle_deserialize at serialize.jl:350
anonymous at task.jl:855
unknown function (ip: 624884065)
unknown function (ip: 0)
unknown function (ip: -1397220928)
unknown function (ip: -1397213074)
unknown function (ip: -1397206064)
jl_gc_collect at /data/julia/usr/bin/../lib/libjulia.so (unknown line)
unknown function (ip: -1397195273)
jl_alloc_tuple_uninit at /data/julia/usr/bin/../lib/libjulia.so (unknown line)
jl_f_tuple at /data/julia/usr/bin/../lib/libjulia.so (unknown line)
jl_f_apply at /data/julia/usr/bin/../lib/libjulia.so (unknown line)
ntuple at ./tuple.jl:30
....(lot of ntuple at ./tuple.jl:30 )
ntuple at ./tuple.jl:30
Worker 2 terminated.ProcessExitedException
()
process 4940 2 got value:
NULL

ArgumentErrorWorker 3 terminated.(
"stream is closed or unusable")
ProcessExitedException()
process 4940 3 got value:
NULL

ProcessExitedException()
ProcessExitedException()
process 4941 2 got value:
NULL

************* 0.3 log: note that under 0.3 the R process itself dumps core *************

process 4632 3 got value:
[,1] [,2]
[1,] 0.3489215 0.3766013
[2,] 0.7847451 0.8211198

*** stack smashing detected ***: /usr/lib64/R/bin/exec/R terminated
======= Backtrace: =========
/usr/lib/libc.so.6(+0x732ae)[0x7f88025232ae]
/usr/lib/libc.so.6(__fortify_fail+0x37)[0x7f88025a8907]
/usr/lib/libc.so.6(__fortify_fail+0x0)[0x7f88025a88d0]
/usr/lib64/R/lib/libR.so(Rf_applyClosure+0x6f2)[0x7f8802b44cc2]
/usr/lib64/R/lib/libR.so(Rf_eval+0x341)[0x7f8802b3eaa1]
/usr/lib64/R/lib/libR.so(+0xd1dd0)[0x7f8802b40dd0]
/usr/lib64/R/lib/libR.so(Rf_eval+0x534)[0x7f8802b3ec94]
/usr/lib64/R/lib/libR.so(+0xd44dd)[0x7f8802b434dd]
/usr/lib64/R/lib/libR.so(Rf_eval+0x534)[0x7f8802b3ec94]
/usr/lib64/R/lib/libR.so(+0xd1dd0)[0x7f8802b40dd0]
/usr/lib64/R/lib/libR.so(Rf_eval+0x534)[0x7f8802b3ec94]
/usr/lib64/R/lib/libR.so(+0xd44dd)[0x7f8802b434dd]
/usr/lib64/R/lib/libR.so(Rf_eval+0x534)[0x7f8802b3ec94]
/usr/lib64/R/lib/libR.so(Rf_ReplIteration+0x252)[0x7f8802b677d2]
/usr/lib64/R/lib/libR.so(+0xf8b31)[0x7f8802b67b31]
/usr/lib64/R/lib/libR.so(run_Rmainloop+0x44)[0x7f8802b68084]
/usr/lib64/R/bin/exec/R(main+0x1b)[0x40082b]
/usr/lib/libc.so.6(__libc_start_main+0xf0)[0x7f88024d0040]
/usr/lib64/R/bin/exec/R[0x40085b]
======= Memory map: ========
00400000-00401000 r-xp 00000000 08:01 13240744 /usr/lib/R/bin/exec/R
00600000-00601000 r--p 00000000 08:01 13240744 /usr/lib/R/bin/exec/R
00601000-00602000 rw-p 00001000 08:01 13240744 /usr/lib/R/bin/exec/R
0238e000-0d3a3000 rw-p 00000000 00:00 0 [heap]
7f87c0000000-7f87c0021000 rw-p 00000000 00:00 0
7f87c0021000-7f87c4000000 ---p 00000000 00:00 0
7f87c8000000-7f87c8021000 rw-p 00000000 00:00 0
7f87c8021000-7f87cc000000 ---p 00000000 00:00 0
7f87cfd77000-7f87cfd78000 ---p 00000000 00:00 0
7f87cfd78000-7f87d0578000 rw-p 00000000 00:00 0 [stack:8359]
7f87d0578000-7f87d0579000 ---p 00000000 00:00 0
7f87d0579000-7f87d0d79000 rw-p 00000000 00:00 0 [stack:8358]
7f87d0d79000-7f87d0d7a000 ---p 00000000 00:00 0
7f87d0d7a000-7f87d157a000 rw-p 00000000 00:00 0 [stack:8357]
7f87d157a000-7f87d157b000 ---p 00000000 00:00 0
7f87d157b000-7f87d1d7b000 rw-p 00000000 00:00 0 [stack:8356]
7f87d1d7b000-7f87d1d9f000 r-xp 00000000 08:11 1052567 /data/julia0.3/0.33/lib/libopenlibm.so.1.0
7f87d1d9f000-7f87d1f9f000 ---p 00024000 08:11 1052567 /data/julia0.3/0.33/lib/libopenlibm.so.1.0
7f87d1f9f000-7f87d1fa0000 rw-p 00024000 08:11 1052567 /data/julia0.3/0.33/lib/libopenlibm.so.1.0
7f87d1fa0000-7f87d1ffa000 r-xp 00000000 08:11 1052937 /data/julia0.3/0.33/lib/libmpfr.so.4.1.2
7f87d1ffa000-7f87d21fa000 ---p 0005a000 08:11 1052937 /data/julia0.3/0.33/lib/libmpfr.so.4.1.2
7f87d21fa000-7f87d21fc000 rw-p 0005a000 08:11 1052937 /data/julia0.3/0.33/lib/libmpfr.so.4.1.2
7f87d21fc000-7f87d2267000 r-xp 00000000 08:11 1052924 /data/julia0.3/0.33/lib/libgmp.so.10.1.3
7f87d2267000-7f87d2467000 ---p 0006b000 08:11 1052924 /data/julia0.3/0.33/lib/libgmp.so.10.1.3
7f87d2467000-7f87d2470000 rw-p 0006b000 08:11 1052924 /data/julia0.3/0.33/lib/libgmp.so.10.1.3
7f87d2470000-7f87d2473000 r-xp 00000000 08:11 1052623 /data/julia0.3/0.33/lib/libdSFMT.so
7f87d2473000-7f87d2673000 ---p 00003000 08:11 1052623 /data/julia0.3/0.33/lib/libdSFMT.so
7f87d2673000-7f87d2674000 rw-p 00003000 08:11 1052623 /data/julia0.3/0.33/lib/libdSFMT.so
7f87d2674000-7f87d4674000 rw-p 00000000 00:00 0
7f87d4674000-7f87d46d2000 r-xp 00000000 08:11 1052861 /data/julia0.3/0.33/lib/libpcre.so.1.0.1
7f87d46d2000-7f87d48d1000 ---p 0005e000 08:11 1052861 /data/julia0.3/0.33/lib/libpcre.so.1.0.1
7f87d48d1000-7f87d48d2000 rw-p 0005d000 08:11 1052861 /data/julia0.3/0.33/lib/libpcre.so.1.0.1
7f87d48d2000-7f87da8d2000 rw-p 00000000 00:00 0
7f87da8d2000-7f87da8d3000 ---p 00000000 00:00 0
7f87da8d3000-7f87db0d3000 rw-p 00000000 00:00 0 [stack:8339]
7f87db0d3000-7f87df0d3000 rw-p 00000000 00:00 0
7f87df0d3000-7f87df0d4000 ---p 00000000 00:00 0
7f87df0d4000-7f87df8d4000 rw-p 00000000 00:00 0 [stack:8338]
7f87df8d4000-7f87e18d4000 rw-p 00000000 00:00 0
7f87e18d4000-7f87e18d5000 ---p 00000000 00:00 0
7f87e18d5000-7f87e20d5000 rw-p 00000000 00:00 0 [stack:8337]
7f87e20d5000-7f87e60d5000 rw-p 00000000 00:00 0
7f87e60d5000-7f87e60d6000 ---p 00000000 00:00 0
7f87e60d6000-7f87e68d6000 rw-p 00000000 00:00 0 [stack:8336]
7f87e68d6000-7f87e68d7000 ---p 00000000 00:00 0
7f87e68d7000-7f87e70d7000 rw-p 00000000 00:00 0 [stack:8335]
7f87e70d7000-7f87e70d8000 ---p 00000000 00:00 0
7f87e70d8000-7f87e78d8000 rw-p 00000000 00:00 0 [stack:8334]
7f87e78d8000-7f87e78d9000 ---p 00000000 00:00 0
7f87e78d9000-7f87e80d9000 rw-p 00000000 00:00 0 [stack:8333]
7f87e80d9000-7f87ee0d9000 rw-p 00000000 00:00 0
7f87ee0d9000-7f87ee0da000 ---p 00000000 00:00 0
7f87ee0da000-7f87ee8da000 rw-p 00000000 00:00 0 [stack:8332]
7f87ee8da000-7f87f08da000 rw-p 00000000 00:00 0
7f87f08da000-7f87f08db000 ---p 00000000 00:00 0
7f87f08db000-7f87f10db000 rw-p 00000000 00:00 0 [stack:8331]
7f87f10db000-7f87f50db000 rw-p 00000000 00:00 0
7f87f50db000-7f87f50dc000 ---p 00000000 00:00 0
7f87f50dc000-7f87f58dc000 rw-p 00000000 00:00 0 [stack:8330]
7f87f58dc000-7f87f78dc000 rw-p 00000000 00:00 0
7f87f78dc000-7f87f78dd000 ---p 00000000 00:00 0
7f87f78dd000-7f87f80dd000 rw-p 00000000 00:00 0 [stack:8329]
7f87f80dd000-7f87f80de000 ---p 00000000 00:00 0
7f87f80de000-7f87f88de000 rw-p 00000000 00:00 0 [stack:8328]
7f87f88de000-7f87f88df000 ---p 00000000 00:00 0
7f87f88df000-7f87f90df000 rw-p 00000000 00:00 0 [stack:8327]
7f87f90df000-7f87f90e0000 ---p 00000000 00:00 0
7f87f90e0000-7f87f98e0000 rw-p 00000000 00:00 0 [stack:8326]
7f87f98e0000-7f87f98e1000 ---p 00000000 00:00 0
7f87f98e1000-7f87fa0e1000 rw-p 00000000 00:00 0 [stack:8325]
7f87fa0e1000-7f87fbf7e000 r-xp 00000000 08:11 1052751 /data/julia0.3/0.33/lib/libopenblas.so
7f87fbf7e000-7f87fc17d000 ---p 01e9d000 08:11 1052751 /data/julia0.3/0.33/lib/libopenblas.so
7f87fc17d000-7f87fc19a000 rw-p 01e9c000 08:11 1052751 /data/julia0.3/0.33/lib/libopenblas.so
7f87fc19a000-7f87fc2ae000 rw-p 00000000 00:00 0
7f87fc30c000-7f87fcaae000 rw-p 00000000 00:00 0
7f87fcaae000-7f87fce90000 r-xp 00000000 08:11 1053030 /data/julia0.3/0.33/lib/julia/sys.so
7f87fce90000-7f87fd08f000 ---p 003e2000 08:11 1053030 /data/julia0.3/0.33/lib/julia/sys.so
7f87fd08f000-7f87fd0c0000 rw-p 003e1000 08:11 1053030 /data/julia0.3/0.33/lib/julia/sys.so
7f87fd0c0000-7f87fd0df000 rw-p 00000000 00:00 0
7f87fd2e0000-7f87fd360000 rwxp 00000000 00:00 0
7f87fd360000-7f87fd561000 rw-p 00000000 00:00 0
7f87fd561000-7f87fd568000 r-xp 00000000 08:01 2099261 /home/armgong/R/x86_64-unknown-linux-gnu-library/3.1/rjulia/libs/rjulia.so
7f87fd568000-7f87fd767000 ---p 00007000 08:01 2099261 /home/armgong/R/x86_64-unknown-linux-gnu-library/3.1/rjulia/libs/rjulia.so
7f87fd767000-7f87fd768000 r--p 00006000 08:01 2099261 /home/armgong/R/x86_64-unknown-linux-gnu-library/3.1/rjulia/libs/rjulia.so
7f87fd768000-7f87fd769000 rw-p 00007000 08:01 2099261 /home/armgong/R/x86_64-unknown-linux-gnu-library/3.1/rjulia/libs/rjulia.so
7f87fd769000-7f87fd8d2000 r-xp 00000000 08:01 12586451 /usr/lib/libstdc++.so.6.0.20
7f87fd8d2000-7f87fdad1000 ---p 00169000 08:01 12586451 /usr/lib/libstdc++.so.6.0.20
7f87fdad1000-7f87fdadb000 r--p 00168000 08:01 12586451 /usr/lib/libstdc++.so.6.0.20
7f87fdadb000-7f87fdadd000 rw-p 00172000 08:01 12586451 /usr/lib/libstdc++.so.6.0.20
7f87fdadd000-7f87fdae1000 rw-p 00000000 00:00 0
7f87fdae1000-7f87fdaf6000 r-xp 00000000 08:01 12589184 /usr/lib/libz.so.1.2.8
7f87fdaf6000-7f87fdcf5000 ---p 00015000 08:01 12589184 /usr/lib/libz.so.1.2.8
7f87fdcf5000-7f87fdcf6000 r--p 00014000 08:01 12589184 /usr/lib/libz.so.1.2.8
7f87fdcf6000-7f87fdcf7000 rw-p 00015000 08:01 12589184 /usr/lib/libz.so.1.2.8
7f87fdcf7000-7f87fe97c000 r-xp 00000000 08:11 1052962 /data/julia0.3/0.33/lib/libjulia.so
7f87fe97c000-7f87feb7b000 ---p 00c85000 08:11 1052962 /data/julia0.3/0.33/lib/libjulia.so
7f87feb7b000-7f87fec53000 rw-p 00c84000 08:11 1052962 /data/julia0.3/0.33/lib/libjulia.so
7f87fec53000-7f87fed2d000 rw-p 00000000 00:00 0
7f87fed2d000-7f87ff2c5000 r-xp 00000000 08:01 12601905 /usr/lib/liblapack.so
7f87ff2c5000-7f87ff4c4000 ---p 00598000 08:01 12601905 /usr/lib/liblapack.so
7f87ff4c4000-7f87ff4c5000 r--p 00597000 08:01 12601905 /usr/lib/liblapack.so
7f87ff4c5000-7f87ff4c8000 rw-p 00598000 08:01 12601905 /usr/lib/liblapack.so
7f87ff4c8000-7f87ff5d6000 rw-p 00000000 00:00 0
7f87ff5d6000-7f87ff676000 r-xp 00000000 08:01 13240661 /usr/lib/R/library/stats/libs/stats.so
7f87ff676000-7f87ff875000 ---p 000a0000 08:01 13240661 /usr/lib/R/library/stats/libs/stats.so
7f87ff875000-7f87ff877000 r--p 0009f000 08:01 13240661 /usr/lib/R/library/stats/libs/stats.so
7f87ff877000-7f87ff879000 rw-p 000a1000 08:01 13240661 /usr/lib/R/library/stats/libs/stats.so
7f87ff879000-7f87ff8b7000 r-xp 00000000 08:01 13239404 /usr/lib/R/library/grDevices/libs/grDevices.so
7f87ff8b7000-7f87ffab6000 ---p 0003e000 08:01 13239404 /usr/lib/R/library/grDevices/libs/grDevices.so
7f87ffab6000-7f87ffabb000 r--p 0003d000 08:01 13239404 /usr/lib/R/library/grDevices/libs/grDevices.so
7f87ffabb000-7f87ffabd000 rw-p 00042000 08:01 13239404 /usr/lib/R/library/grDevices/libs/grDevices.so
7f87ffabd000-7f87ffabe000 rw-p 00000000 00:00 0
7f87ffaf0000-7f87ffb2e000 r-xp 00000000 08:01 13239270 /usr/lib/R/library/graphics/libs/graphics.so
7f87ffb2e000-7f87ffd2e000 ---p 0003e000 08:01 13239270 /usr/lib/R/library/graphics/libs/graphics.so
7f87ffd2e000-7f87ffd2f000 r--p 0003e000 08:01 13239270 /usr/lib/R/library/graphics/libs/graphics.so
7f87ffd2f000-7f87ffd30000 rw-p 0003f000 08:01 13239270 /usr/lib/R/library/graphics/libs/graphics.so
7f87ffd30000-7f87ffd32000 r-xp 00000000 08:01 12586283 /usr/lib/gconv/ISO8859-1.so
7f87ffd32000-7f87fff31000 ---p 00002000 08:01 12586283 /usr/lib/gconv/ISO8859-1.so
7f87fff31000-7f87fff32000 r--p 00001000 08:01 12586283 /usr/lib/gconv/ISO8859-1.so
7f87fff32000-7f87fff33000 rw-p 00002000 08:01 12586283 /usr/lib/gconv/ISO8859-1.so
7f87fff33000-7f880003a000 rw-p 00000000 00:00 0
7f880003a000-7f8800042000 r-xp 00000000 08:01 13239779 /usr/lib/R/library/methods/libs/methods.so
7f8800042000-7f8800241000 ---p 00008000 08:01 13239779 /usr/lib/R/library/methods/libs/methods.so
7f8800241000-7f8800242000 r--p 00007000 08:01 13239779 /usr/lib/R/library/methods/libs/methods.so
7f8800242000-7f8800243000 rw-p 00008000 08:01 13239779 /usr/lib/R/library/methods/libs/methods.so
7f8800243000-7f880024d000 r-xp 00000000 08:01 13240445 /usr/lib/R/library/utils/libs/utils.so
7f880024d000-7f880044d000 ---p 0000a000 08:01 13240445 /usr/lib/R/library/utils/libs/utils.so
7f880044d000-7f880044e000 r--p 0000a000 08:01 13240445 /usr/lib/R/library/utils/libs/utils.so
7f880044e000-7f880044f000 rw-p 0000b000 08:01 13240445 /usr/lib/R/library/utils/libs/utils.so
7f880044f000-7f88004f7000 rw-p 00000000 00:00 0
7f88004f7000-7f8800502000 r-xp 00000000 08:01 12586092 /usr/lib/libnss_files-2.20.so
7f8800502000-7f8800702000 ---p 0000b000 08:01 12586092 /usr/lib/libnss_files-2.20.so
7f8800702000-7f8800703000 r--p 0000b000 08:01 12586092 /usr/lib/libnss_files-2.20.so
7f8800703000-7f8800704000 rw-p 0000c000 08:01 12586092 /usr/lib/libnss_files-2.20.so
7f8800704000-7f88007b5000 rw-p 00000000 00:00 0
7f88007b5000-7f8800ae1000 r--p 00000000 08:01 12601662 /usr/lib/locale/locale-archive
7f8800ae1000-7f8800af7000 r-xp 00000000 08:01 12586447 /usr/lib/libgcc_s.so.1
7f8800af7000-7f8800cf6000 ---p 00016000 08:01 12586447 /usr/lib/libgcc_s.so.1
7f8800cf6000-7f8800cf7000 rw-p 00015000 08:01 12586447 /usr/lib/libgcc_s.so.1
7f8800cf7000-7f8800d34000 r-xp 00000000 08:01 12586464 /usr/lib/libquadmath.so.0.0.0
7f8800d34000-7f8800f33000 ---p 0003d000 08:01 12586464 /usr/lib/libquadmath.so.0.0.0
7f8800f33000-7f8800f34000 rw-p 0003c000 08:01 12586464 /usr/lib/libquadmath.so.0.0.0
7f8800f34000-7f8800f93000 r-xp 00000000 08:01 12586503 /usr/lib/libncursesw.so.5.9
7f8800f93000-7f8801193000 ---p 0005f000 08:01 12586503 /usr/lib/libncursesw.so.5.9
7f8801193000-7f8801197000 r--p 0005f000 08:01 12586503 /usr/lib/libncursesw.so.5.9
7f8801197000-7f8801199000 rw-p 00063000 08:01 12586503 /usr/lib/libncursesw.so.5.9
7f8801199000-7f88012be000 r-xp 00000000 08:01 12586457 /usr/lib/libgfortran.so.3.0.0
7f88012be000-7f88014be000 ---p 00125000 08:01 12586457 /usr/lib/libgfortran.so.3.0.0
7f88014be000-7f88014c0000 rw-p 00125000 08:01 12586457 /usr/lib/libgfortran.so.3.0.0
7f88014c0000-7f88014d6000 r-xp 00000000 08:01 12586439 /usr/lib/libgomp.so.1.0.0
7f88014d6000-7f88016d5000 ---p 00016000 08:01 12586439 /usr/lib/libgomp.so.1.0.0
7f88016d5000-7f88016d6000 rw-p 00015000 08:01 12586439 /usr/lib/libgomp.so.1.0.0
7f88016d6000-7f88016d9000 r-xp 00000000 08:01 12586089 /usr/lib/libdl-2.20.so
7f88016d9000-7f88018d8000 ---p 00003000 08:01 12586089 /usr/lib/libdl-2.20.so
7f88018d8000-7f88018d9000 r--p 00002000 08:01 12586089 /usr/lib/libdl-2.20.so
7f88018d9000-7f88018da000 rw-p 00003000 08:01 12586089 /usr/lib/libdl-2.20.so
7f88018da000-7f88018e1000 r-xp 00000000 08:01 12586133 /usr/lib/librt-2.20.so
7f88018e1000-7f8801ae0000 ---p 00007000 08:01 12586133 /usr/lib/librt-2.20.so
7f8801ae0000-7f8801ae1000 r--p 00006000 08:01 12586133 /usr/lib/librt-2.20.so
7f8801ae1000-7f8801ae2000 rw-p 00007000 08:01 12586133 /usr/lib/librt-2.20.so
7f8801ae2000-7f8801b07000 r-xp 00000000 08:01 12593683 /usr/lib/liblzma.so.5.2.0
7f8801b07000-7f8801d06000 ---p 00025000 08:01 12593683 /usr/lib/liblzma.so.5.2.0
7f8801d06000-7f8801d07000 r--p 00024000 08:01 12593683 /usr/lib/liblzma.so.5.2.0
7f8801d07000-7f8801d08000 rw-p 00025000 08:01 12593683 /usr/lib/liblzma.so.5.2.0
7f8801d08000-7f8801d49000 r-xp 00000000 08:01 12589082 /usr/lib/libreadline.so.6.3
7f8801d49000-7f8801f49000 ---p 00041000 08:01 12589082 /usr/lib/libreadline.so.6.3
7f8801f49000-7f8801f4b000 r--p 00041000 08:01 12589082 /usr/lib/libreadline.so.6.3
7f8801f4b000-7f8801f52000 rw-p 00043000 08:01 12589082 /usr/lib/libreadline.so.6.3
7f8801f52000-7f8801f53000 rw-p 00000000 00:00 0
7f8801f53000-7f8802056000 r-xp 00000000 08:01 12586119 /usr/lib/libm-2.20.so
7f8802056000-7f8802256000 ---p 00103000 08:01 12586119 /usr/lib/libm-2.20.so
7f8802256000-7f8802257000 r--p 00103000 08:01 12586119 /usr/lib/libm-2.20.so
7f8802257000-7f8802258000 rw-p 00104000 08:01 12586119 /usr/lib/libm-2.20.so
7f8802258000-7f88022af000 r-xp 00000000 08:01 12601874 /usr/lib/libblas.so
7f88022af000-7f88024ae000 ---p 00057000 08:01 12601874 /usr/lib/libblas.so
7f88024ae000-7f88024af000 r--p 00056000 08:01 12601874 /usr/lib/libblas.so
7f88024af000-7f88024b0000 rw-p 00057000 08:01 12601874 /usr/lib/libblas.so
7f88024b0000-7f8802649000 r-xp 00000000 08:01 12586120 /usr/lib/libc-2.20.so
7f8802649000-7f8802849000 ---p 00199000 08:01 12586120 /usr/lib/libc-2.20.so
7f8802849000-7f880284d000 r--p 00199000 08:01 12586120 /usr/lib/libc-2.20.so
7f880284d000-7f880284f000 rw-p 0019d000 08:01 12586120 /usr/lib/libc-2.20.so
7f880284f000-7f8802853000 rw-p 00000000 00:00 0
7f8802853000-7f880286a000 r-xp 00000000 08:01 12586098 /usr/lib/libpthread-2.20.so
7f880286a000-7f8802a69000 ---p 00017000 08:01 12586098 /usr/lib/libpthread-2.20.so
7f8802a69000-7f8802a6a000 r--p 00016000 08:01 12586098 /usr/lib/libpthread-2.20.so
7f8802a6a000-7f8802a6b000 rw-p 00017000 08:01 12586098 /usr/lib/libpthread-2.20.so
7f8802a6b000-7f8802a6f000 rw-p 00000000 00:00 0
7f8802a6f000-7f8802d15000 r-xp 00000000 08:01 13240745 /usr/lib/R/lib/libR.so
7f8802d15000-7f8802f15000 ---p 002a6000 08:01 13240745 /usr/lib/R/lib/libR.so
7f8802f15000-7f8802f1a000 r--p 002a6000 08:01 13240745 /usr/lib/R/lib/libR.so
7f8802f1a000-7f8802f26000 rw-p 002ab000 08:01 13240745 /usr/lib/R/lib/libR.so
7f8802f26000-7f8803012000 rw-p 00000000 00:00 0
7f8803012000-7f8803034000 r-xp 00000000 08:01 12586095 /usr/lib/ld-2.20.so
7f880304c000-7f880306c000 rwxp 00000000 00:00 0
7f880306c000-7f880321e000 rw-p 00000000 00:00 0
7f880321e000-7f8803220000 rw-p 00000000 00:00 0
7f8803220000-7f8803230000 rwxp 00000000 00:00 0
7f8803230000-7f8803231000 r--p 00000000 08:01 13239951 /usr/lib/R/library/translations/en/LC_MESSAGES/R.mo
7f8803231000-7f8803233000 rw-p 00000000 00:00 0
7f8803233000-7f8803234000 r--p 00021000 08:01 12586095 /usr/lib/ld-2.20.so
7f8803234000-7f8803235000 rw-p 00022000 08:01 12586095 /usr/lib/ld-2.20.so
7f8803235000-7f8803236000 rw-p 00000000 00:00 0
7fff60d02000-7fff60d2c000 rw-p 00000000 00:00 0 [stack]
7fff60d5e000-7fff60d60000 r--p 00000000 00:00 0 [vvar]
7fff60d60000-7fff60d62000 r-xp 00000000 00:00 0 [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]

signal (6): Aborted
gsignal at /usr/lib/libc.so.6 (unknown line)
abort at /usr/lib/libc.so.6 (unknown line)
unknown function (ip: 38941363)
__fortify_fail at /usr/lib/libc.so.6 (unknown line)
__fortify_fail at /usr/lib/libc.so.6 (unknown line)
Rf_applyClosure at /usr/lib64/R/lib/libR.so (unknown line)
Rf_eval at /usr/lib64/R/lib/libR.so (unknown line)
unknown function (ip: 45354448)
Rf_eval at /usr/lib64/R/lib/libR.so (unknown line)
unknown function (ip: 45364445)
Rf_eval at /usr/lib64/R/lib/libR.so (unknown line)
unknown function (ip: 45354448)
Rf_eval at /usr/lib64/R/lib/libR.so (unknown line)
unknown function (ip: 45364445)
Rf_eval at /usr/lib64/R/lib/libR.so (unknown line)
Rf_ReplIteration at /usr/lib64/R/lib/libR.so (unknown line)
unknown function (ip: 45513521)
run_Rmainloop at /usr/lib64/R/lib/libR.so (unknown line)
main at /usr/lib64/R/bin/exec/R (unknown line)
__libc_start_main at /usr/lib/libc.so.6 (unknown line)
unknown function (ip: 4196443)
unknown function (ip: 0)
Aborted (core dumped)

@armgong
Contributor Author

armgong commented Feb 7, 2015

OK, the 0.4 problem is not related to embedding Julia. In the Julia 0.4 REPL, running

for i=1:10000 r=remotecall(2,rand,2,2);fetch(r);r=remotecall(3,rand,2,2);fetch(r) end

leads to the same issue: the Julia workers die, but the Julia head process stays alive.
julia> versioninfo()
Julia Version 0.4.0-dev+3172
Commit 456b85a (2015-02-06 21:24 UTC)
Platform Info:
System: Linux (x86_64-unknown-linux-gnu)
CPU: Intel(R) Celeron(R) CPU J1900 @ 1.99GHz
WORD_SIZE: 64
BLAS: libopenblas (USE64BITINT NO_AFFINITY ATOM)
LAPACK: libopenblas
LIBM: libopenlibm
LLVM: libLLVM-3.3

julia> for i=1:10000 r=remotecall(2,rand,2,2);fetch(r);r=remotecall(3,rand,2,2);fetch(r) end

signal (11): Segmentation fault
unknown function (ip: -259180096)
unknown function (ip: -259172242)
unknown function (ip: -259165232)
jl_gc_collect at /data/julia/usr/bin/../lib/libjulia.so (unknown line)
unknown function (ip: -259154441)
jl_alloc_tuple_uninit at /data/julia/usr/bin/../lib/libjulia.so (unknown line)
jl_f_tuple at /data/julia/usr/bin/../lib/libjulia.so (unknown line)
jl_f_apply at /data/julia/usr/bin/../lib/libjulia.so (unknown line)
ntuple at ./tuple.jl:30
ntuple at ./tuple.jl:30
ntuple at ./tuple.jl:30
ntuple at ./tuple.jl:30
lot of it .....

deserialize_tuple at serialize.jl:355
handle_deserialize at serialize.jl:350
anonymous at task.jl:855
unknown function (ip: -259253919)
unknown function (ip: 0)
Worker 3 terminated.ERROR: ProcessExitedException()
in wait at ./task.jl:288
in wait at ./task.jl:198
in wait_full at ./multi.jl:571
in remotecall_fetch at multi.jl:671
in call_on_owner at ./multi.jl:716
in fetch at multi.jl:726
in anonymous at no file:1
julia>

Julia 0.3.5 runs this code just fine.

@armgong
Contributor Author

armgong commented Feb 7, 2015

for i=1:10000 r=remotecall(2,rand,2,2);fetch(r);r=remotecall(3,rand,2,2);fetch(r) end also fails in the Julia REPL on Windows x86_64 (0.4 master branch), with this message:

F:\>julia\usr\bin\julia -p 2
               _
   _       _ _(_)_     |  A fresh approach to technical computing
  (_)     | (_) (_)    |  Documentation: http://docs.julialang.org
   _ _   _| |_  __ _   |  Type "help()" for help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 0.4.0-dev+3174 (2015-02-07 05:30 UTC)
 _/ |\__'_|_|_|\__'_|  |  Commit 49a1f2e* (0 days old master)
|__/                   |  x86_64-w64-mingw32

julia> for i=1:10000 r=remotecall(2,rand,2,2);fetch(r);r=remotecall(3,rand,2,2);fetch(r) end
Worker 3 terminated.ERROR: InexactError()
in _uv_hook_return_spawn at process.jl:229 (repeats 2 times)

vtjnash added a commit that referenced this issue Feb 8, 2015
apparently some compilers don't like my previous attempts to generate conditional alloca gcframes
vtjnash added a commit that referenced this issue Feb 8, 2015
apparently some compilers don't like my previous attempts to generate conditional alloca gcframes
@vtjnash vtjnash closed this as completed Feb 8, 2015
@tkelman tkelman added this to the 0.3.7 milestone Feb 18, 2015
@tkelman
Contributor

tkelman commented Mar 11, 2015

Just for reference: see the comments on f769e41. I don't plan on backporting the fix for this myself because it has messy conflicts that I don't know how to resolve. If anyone would like to see this resolved for 0.3.7 or later, please prepare a PR against release-0.3.

@tkelman tkelman removed this from the 0.3.7 milestone Jul 1, 2015
5 participants