
A large number of @spawnat calls hangs #15923

Closed
amitmurthy opened this issue Apr 18, 2016 · 7 comments
Labels: performance (Must go faster), regression (Regression in behavior compared to a previous version)
Milestone: 0.5.0

Comments

@amitmurthy
Contributor

Recent changes have rendered workaround #14456 moot.

The test below (run under JULIA_TESTFULL=1) hangs:

for n = 1:10^6
    fetch(@spawnat myid() myid())
end

A Ctrl-C prints

^CERROR: InterruptException:
 in &(::Int64, ::Int64) at ./int.jl:131
 in rehash!(::Dict{Any,Any}, ::Int64) at ./dict.jl:551
 in rehash!(::Dict{Any,Any}, ::Int64) at ./dict.jl:562 (repeats 2 times)
 in _setindex!(::Dict{Any,Any}, ::Bool, ::WeakRef, ::Int64) at ./dict.jl:691
 in setindex!(::Dict{Any,Any}, ::Bool, ::WeakRef) at ./dict.jl:708
 in setindex!(::WeakKeyDict{Any,Any}, ::Bool, ::Future) at ./dict.jl:876
 in test_existing_ref(::Future) at ./multi.jl:509
 in Future(::Int64, ::Int64, ::Int64) at ./multi.jl:477
 in Future(::Int64) at ./multi.jl:545
 in remotecall(::Any, ::LocalProcess) at ./multi.jl:799
 in remotecall(::Any, ::Int64) at ./multi.jl:811
 [inlined code] from ./REPL[1]:2
 in anonymous at ./<no file>:4294967295
 in eval(::Module, ::Any) at ./boot.jl:236

The underlying cause is probably related to #14445 (comment).

@amitmurthy
Contributor Author

It is an issue with Dicts, most probably due to memory corruption.

The looping is here:

julia/base/dict.jl

Lines 550 to 552 in fb81034

while slots[index] != 0
index = (index & (newsz-1)) + 1
end

I tested with the following modification of that block:

            lc = 0
            while slots[index] != 0
                index = (index & (newsz-1)) + 1
                lc = lc+1
                if lc > 10000
                    @assert lc < 10000
                elseif lc > 1000
                    println("Looping...index[", index, "], newsz[", newsz, "]")
                end
            end

and it loops around the following values:

Looping...index[4499], newsz[65536]
Looping...index[4500], newsz[65536]
Looping...index[4501], newsz[65536]
Looping...index[3773], newsz[65536]  <---- This shouldn't happen
Looping...index[3774], newsz[65536]
Looping...index[3775], newsz[65536]

which leads me to suspect a corruption somewhere.
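
For reference, a minimal sketch of the probe arithmetic in that loop (hashindex mirrors base/dict.jl; probe_sequence is a hypothetical helper added only for illustration): within one probe the index advances a single slot per iteration and wraps from newsz back to 1, so a backward jump between consecutive iterations would indeed point at corrupted slot data.

# hashindex mirrors base/dict.jl; probe_sequence is a hypothetical helper.
hashindex(key, sz) = ((hash(key) % Int) & (sz - 1)) + 1

function probe_sequence(key, newsz, nsteps)
    index = hashindex(key, newsz)
    seq = Int[]
    for _ in 1:nsteps
        push!(seq, index)
        index = (index & (newsz - 1)) + 1  # same step as the rehash! loop: +1, wrapping newsz -> 1
    end
    return seq
end

probe_sequence("k", 65536, 3)  # three consecutive slot indices, wrapping at newsz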

cc: @yuyichao

@amitmurthy
Contributor Author

Disabling gc allows the above test to run fine. The odd thing, however, is that re-enabling gc does not release the memory allocated previously during the run.
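
For reference, a minimal sketch of that reproduction with the collector disabled, assuming the 0.5-era gc_enable / gc API (on current Julia these live under GC.enable and GC.gc):

gc_enable(false)                   # disable the collector
for n = 1:10^6
    fetch(@spawnat myid() myid())  # now completes without hanging
end
gc_enable(true)                    # re-enable the collector
gc()                               # oddly, does not free the memory allocated during the run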

@amitmurthy
Contributor Author

This is a regression and is due to an interplay between the Dict code and GC, which I don't fully understand. Tagging as 0.5.

@amitmurthy amitmurthy added the regression Regression in behavior compared to a previous version label Apr 29, 2016
@amitmurthy amitmurthy added this to the 0.5.0 milestone Apr 29, 2016
@vtjnash
Member

vtjnash commented May 4, 2016

seems like it's a performance regression? adding a typeassert in hashindex seems to fix the performance bottleneck in rehash!, although calling isequal on such a large dictionary of WeakRef objects remains a huge performance bottleneck:

diff --git a/base/dict.jl b/base/dict.jl
index 4ac15ae..d9bf6b1 100644
--- a/base/dict.jl
+++ b/base/dict.jl
@@ -9,15 +9,7 @@ haskey(d::Associative, k) = in(k,keys(d))
 function in(p::Pair, a::Associative, valcmp=(==))
     v = get(a,p[1],secret_table_token)
     if !is(v, secret_table_token)
-        if valcmp === is
-            is(v, p[2]) && return true
-        elseif valcmp === (==)
-            ==(v, p[2]) && return true
-        elseif valcmp === isequal
-            isequal(v, p[2]) && return true
-        else
-            valcmp(v, p[2]) && return true
-        end
+        valcmp(v, p[2]) && return true
     end
     return false
 end
@@ -515,7 +507,7 @@ function convert{K,V}(::Type{Dict{K,V}},d::Associative)
 end
 convert{K,V}(::Type{Dict{K,V}},d::Dict{K,V}) = d

-hashindex(key, sz) = ((hash(key)%Int) & (sz-1)) + 1
+hashindex(key, sz) = (((hash(key)%Int) & (sz-1)) + 1)::Int

 isslotempty(h::Dict, i::Int) = h.slots[i] == 0x0
 isslotfilled(h::Dict, i::Int) = h.slots[i] == 0x1
@@ -616,7 +608,7 @@ function ht_keyindex{K,V}(h::Dict{K,V}, key)
         if isslotempty(h,index)
             break
         end
-        if !isslotmissing(h,index) && isequal(key,keys[index])
+        if !isslotmissing(h,index) && (key === keys[index] || isequal(key,keys[index]))
             return index
         end

@@ -650,7 +642,7 @@ function ht_keyindex2{K,V}(h::Dict{K,V}, key)
                 # in case "key" already exists in a later collided slot.
                 avail = -index
             end
-        elseif isequal(key, keys[index])
+        elseif key === keys[index] || isequal(key, keys[index])
             return index
         end

@@ -957,15 +949,7 @@ function in(key_value::Pair, dict::ImmutableDict, valcmp=(==))
     key, value = key_value
     while isdefined(dict, :parent)
         if dict.key == key
-            if valcmp === is
-                is(value, dict.value) && return true
-            elseif valcmp === (==)
-                ==(value, dict.value) && return true
-            elseif valcmp === isequal
-                isequal(value, dict.value) && return true
-            else
-                valcmp(value, dict.value) && return true
-            end
+            valcmp(value, dict.value) && return true
         end
         dict = dict.parent
     end
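
As a rough illustration of the fast path in that diff (keys_match is a hypothetical helper, not Base code): for WeakRef keys stored in a Dict{Any,Any}, an identity check is a cheap pointer comparison, whereas isequal on Any-typed values goes through dynamic dispatch, so trying === first short-circuits most comparisons.

# Hypothetical helper illustrating the ===-before-isequal pattern from the diff above.
function keys_match(key, stored)
    key === stored && return true  # cheap identity fast path
    return isequal(key, stored)    # generic fallback, as before
end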

I think this may have been due to changes to the multi.jl code? After finishing, we see that the dictionary was 256x larger:

v0.4:
julia> Base.client_refs|>length
12139
julia> Base.client_refs.ht.keys|>length
65536

master:
julia> Base.client_refs|>length
829344
julia> Base.client_refs.ht.keys|>length
16777216

Furthermore, upon calling gc(), on v0.4 client_refs drops to 0 after a very brief pause. On master, client_refs drops only slightly after a very long pause (further calls to gc() have no effect).

@vtjnash vtjnash added the performance Must go faster label May 4, 2016
vtjnash added a commit that referenced this issue May 4, 2016
this made it unreliable for the WeakKeyDict these are typically put in (client_refs)
to have trouble finding them to cleanup the dictionary later
since their hash and identity changed

fixes #15923
reverts workaround #14456 (doesn't break #14295 due to previous commit)
may also fix #16091
@amitmurthy
Contributor Author

I think this may have been due to changes to the multi.jl code? After finishing, we see that the dictionary was 256x larger:

The number of entries in Base.client_refs should not be different on master vs 0.4. I think it is differences in gc behavior between master and 0.4 resulting in slower collection of objects.

@vtjnash
Member

vtjnash commented May 5, 2016

The number of entries in Base.client_refs should not be different in master vs 0.4

yes, agreed – both should be 0

I think it is differences in gc behavior between master and 0.4

it appears that the order of finalizers is less predictable on master than it was on v0.4, yes

resulting in a slower collection of objects.

not necessarily – I think master may actually be faster at collecting them; the bug was in the finalizer (see 8d1970e)
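
For context, a rough sketch of the client_refs bookkeeping that the finalizer takes part in (register_ref and the anonymous finalizer are simplified illustrations, not the actual multi.jl code; the 0.5-era finalizer(obj, f) argument order is assumed):

const client_refs = WeakKeyDict{Any,Any}()

function register_ref(r)
    client_refs[r] = true
    # The finalizer must find `r` in client_refs again to remove it; if the
    # object's hash/identity changes before the finalizer runs (the bug fixed
    # by 8d1970e), the lookup fails and the entry is never cleaned up.
    finalizer(r, x -> delete!(client_refs, x))
    return r
end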

@vtjnash vtjnash closed this as completed in 7348cfe May 9, 2016
@amitmurthy
Contributor Author

Thanks!
