Large memory leak when using threads #31923

Closed
robsmith11 opened this issue May 4, 2019 · 27 comments · Fixed by #32217
Labels
multithreading Base.Threads and related functionality

Comments

@robsmith11
Contributor

With the following minimal example, I am seeing memory usage reach 10GB+ within seconds if I start Julia with JULIA_NUM_THREADS=4 (no other options).

If I use only a single thread, memory usage remains around 200MB.

I've tested on both the 1.1 release and yesterday's nightly on a Linux Skylake Xeon.

Threads.@threads for i in 1:100000
    sum(collect(1:10^6))
end
@ViralBShah ViralBShah added the multithreading Base.Threads and related functionality label May 4, 2019
@ViralBShah ViralBShah modified the milestone: 1.0.x May 4, 2019
@ChrisRackauckas
Member

Did you check whether this is actually a leak by calling the GC? A memory leak would mean the memory never gets freed, not that the GC chose to keep it around a bit longer for performance reasons. Since the GC here is not multithreaded, I would assume it tries to avoid running for as long as possible when threaded, but that doesn't mean your memory will fill up and Julia will crash. Instead it will wait quite a while, until it either has to GC or can GC after the multithreaded section, which is the behavior you would want for performance.
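
A minimal sketch of the check being suggested here (not from the thread, and Linux-only since it reads VmRSS from /proc/self/status; the rss_mb helper is hypothetical): force a full collection and see whether resident memory actually drops.

function rss_mb()
    # Read the process's current resident set size from /proc (Linux-specific).
    for line in eachline("/proc/self/status")
        if startswith(line, "VmRSS:")
            return parse(Int, split(line)[2]) / 1024  # value is reported in kB
        end
    end
    return NaN
end

Threads.@threads for i in 1:10^4
    sum(collect(1:10^6))
end
println("RSS after loop:    ", rss_mb(), " MB")

GC.gc()  # force a full collection
println("RSS after GC.gc(): ", rss_mb(), " MB")

If the second number drops back down, the memory was collectible and simply hadn't been reclaimed yet; if it stays high across repeated runs, that points at a real problem.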

@chriselrod
Contributor

I tried:

julia> versioninfo()
Julia Version 1.3.0-DEV.163
Commit c40d9b099c (2019-05-04 03:30 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: AMD Ryzen Threadripper 1950X 16-Core Processor
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-8.0.0 (ORCJIT, znver1)
Environment:
  JULIA_NUM_THREADS = 16

julia> function foo(N)
           Threads.@threads for i in 1:N
               sum(collect(1:10^6))
           end
       end
foo (generic function with 1 method)

julia> @time foo(10^5)
 46.325609 seconds (337.58 k allocations: 736.214 GiB, 15.02% gc time)

julia> @time foo(10^5)
 42.238979 seconds (199.09 k allocations: 738.308 GiB, 9.36% gc time)

julia> GC.gc()

julia> @time foo(10^5)
 42.090349 seconds (199.11 k allocations: 738.472 GiB, 8.85% gc time)

So there seems to be a lot of GC activity. After the first run it used about 25G of RAM. After the last run, it was using over 32G.

@hycakir
Contributor

hycakir commented May 5, 2019

The issue is similar to the one in this SO question asked a while back. You do not need sum or anything other than allocations to create the leak.

function f(n)
    Threads.@threads for i = 1:n
        zeros(n)
    end
end

Try with large values like f(10^5) to see the issue. Forcing GC maybe helps a little.

@jpsamaroo
Member

For me, adding a call to GC.gc() at the end of the @threads loop body seems to keep memory usage in check, although things slow to a crawl. This is on commit 4dc15938bbb1b5f9fda9def3a85e80e3357a8193.
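
A sketch of that workaround, assuming "at the end" means a GC.gc() call at the end of each iteration body (foo_with_gc is a hypothetical name, not code from this thread):

function foo_with_gc(N)
    Threads.@threads for i in 1:N
        sum(collect(1:10^6))
        GC.gc()  # full collection every iteration: bounds memory use, but very slow
    end
end

foo_with_gc(10^3)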

@jpsamaroo
Member

jpsamaroo commented May 5, 2019

Maybe this is nothing, but when I run julia-debug under GDB, removing the GC.gc() call at the end causes sporadic segfaults on this line:

ccall(:jl_gc_safepoint, Cvoid, ())

I'm not on master, though, so I'll start a new build to see if this behaviour still occurs.

EDIT: This is normal behavior (and also occurs on master).

@yuyichao
Contributor

yuyichao commented May 5, 2019

The segfault is normal. You should ignore it.

@jpsamaroo
Member

I'm going to take a wild (uneducated) guess and say that somehow the arrays getting allocated sometimes don't get rooted in the GC, and so effectively become lost (which is why an explicit GC.gc() after the @threads loop finishes fails to collect the allocated arrays). To confirm this, I'm working on getting Valgrind to cooperate with me for this test case, but I'm having some slight technical difficulties...

Assuming the above theory is plausible, is there any documentation on how I could go about recording GC statistics over time, e.g. to confirm that every allocation is later deallocated?
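
For what it's worth, one way to sample the GC's bookkeeping around a run is Base.gc_num() together with Base.GC_Diff, the same internals the @time macro uses. These are internal APIs and their fields can change between Julia versions, so treat this as a sketch rather than a supported interface:

before = Base.gc_num()

Threads.@threads for i in 1:10^4
    sum(collect(1:10^6))
end

# GC_Diff subtracts the counters: allocd is bytes allocated, pause/full_sweep
# count collections, total_time is GC time in nanoseconds.
diff = Base.GC_Diff(Base.gc_num(), before)
println("bytes allocated: ", diff.allocd)
println("GC pauses:       ", diff.pause, " (", diff.full_sweep, " full sweeps)")
println("GC time:         ", diff.total_time / 1e9, " s")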

@yuyichao
Contributor

yuyichao commented May 5, 2019

> don't get rooted in the GC

No. Rooting does exactly the opposite.

@chethega
Contributor

chethega commented May 5, 2019

Regarding whether the memory actually leaks: You need to test whether julia's memory use grows without bounds or reaches a steady state. With the default config, julia / linux / glibc is very bad at returning memory to the OS, probably due to heap fragmentation. However, the memory is still there and not leaked (glibc knows that the memory is free and will hand it out on malloc).

Has anyone tried reproducing on Windows or OS X?
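
One rough way to run the steady-state test described above (not from the thread; it reuses the hypothetical, Linux-only rss_mb() helper sketched earlier): run the same workload repeatedly and watch whether per-round resident memory plateaus or keeps climbing.

for r in 1:10
    Threads.@threads for i in 1:10^4
        sum(collect(1:10^6))
    end
    # A plateau means memory is being reused; unbounded growth points at a leak.
    println("round $r: RSS ≈ ", round(Int, rss_mb()), " MB")
end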

@JeffBezanson
Sponsor Member

> Instead it'll wait quite a bit until it either has to GC or it can GC after the multithreading

This is incorrect; the GC does not wait for the end of @threads to run. It can run during the loop and is fully multi-threaded. The only capability we're missing is running julia code concurrently with GC, which is a very tall order. We can run non-julia (e.g. C) code concurrently with GC, and any julia code should at least handle threaded GC correctly.

@JeffBezanson
Sponsor Member

I tried running #31923 (comment) on the 1.1 release binary and on master; with 4 threads I see memory use holding steady at no more than about 3GB. On the second and third @time runs it goes up to 4-5GB but seems to eventually reach a steady state of about 4.3GB. Strange.

@dominikkiese

dominikkiese commented May 11, 2019

Is there any progress on this issue? I'm running into similar problems with a simulation of mine. @code_warntype doesn't show any type instabilities, but memory consumption fluctuates a lot after a few iterations, even on the current 1.1 release.

@Roger-luo
Contributor

Roger-luo commented May 25, 2019

I didn't manage to reduce the code, but I think it is related to this issue. When I run the benchmark for YaoArrayRegister defined here (with PkgBenchmark.jl):

https://github.com/QuantumBFS/YaoArrayRegister.jl/blob/master/benchmark/benchmarks.jl

with the following configuration

BenchmarkConfig(
        id="origin/multithreading",
        env = Dict("JULIA_NUM_THREADS"=>4),
        juliacmd=`julia -O3`
    )

I hit the memory leak on Linux (but it was fine on Mac OS); if I run this with a single thread ("JULIA_NUM_THREADS" => 1) it works fine. My versioninfo is

julia> versioninfo()
Julia Version 1.1.1
Commit 55e36cc308 (2019-05-16 04:10 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.1 (ORCJIT, skylake)

I tried this on both Julia 1.1.1 and current master (e39f498aca58abb8aecac34d329d7de9a0cead02), and hit this issue on both.

You should be able to reproduce this by running the following:

] dev YaoBase YaoArrayRegister BitBasis
] add StaticArrays LuxurySparse PkgBenchmark BenchmarkTools

and

using PkgBenchmark

benchmarkpkg("YaoArrayRegister", BenchmarkConfig(
        id="origin/multithreading",
        env = Dict("JULIA_NUM_THREADS"=>4),
        juliacmd=`julia -O3`
    )
"""

I'll try to find a reduced version and post it here later.

@Roger-luo
Contributor

I tried #31923 (comment) as well; memory keeps increasing on this machine too.

@JeffBezanson
Sponsor Member

For those on Linux experiencing this, try ccalling malloc_trim in the loop and see if it helps.

@robsmith11
Contributor Author

Doesn't seem to have an effect for me. Julia was using 26GB of RAM after running:

julia> Threads.@threads for _ in 1:10^4
            collect(1:10^6)
            ccall(:malloc_trim, Cvoid, (Cint,), 0)
        end

@JeffBezanson
Sponsor Member

How much RAM does your system have?

@robsmith11
Contributor Author

My system has 64GB of RAM.

Every time I run the loop (without restarting Julia), it will end up with a different (seemingly random) amount of RAM used, so it does appear to be freeing memory sometimes. With this example at least, Julia's resident memory never exceeds ~32GB, so my system only starts swapping if other processes are using at least ~32GB.

@dominikkiese

dominikkiese commented Jun 1, 2019

For me this seems to keep the memory leak in check (with some random RAM fluctuation still present). I notice, however, that robsmith11's example runs really slowly:

using BenchmarkTools, Distributed  # for @btime and @distributed

@btime Threads.@threads for _ in 1:10^4
    collect(1:10^6)
    ccall(:malloc_trim, Cvoid, (Cint,), 0)
end

yields

72.530 s (19982 allocations: 74.37 GiB)

whereas

@btime @sync @distributed for _ in 1:10^4
    collect(1:10^6)
end

yields

5.849 s (640 allocations: 25.53 KiB)

The tests were run with JULIA_NUM_THREADS=4 and nworkers()=4 on a 4-core Linux machine.

JeffBezanson added a commit that referenced this issue Jun 1, 2019
JeffBezanson added a commit that referenced this issue Jun 1, 2019
@JeffBezanson
Sponsor Member

Good news! Please try #32217.

@robsmith11
Contributor Author

robsmith11 commented Jun 1, 2019

I've just tried your branch, but unfortunately I don't see a big difference. After a few seconds of running my example, I see some spikes up to 46GB of memory used.

EDIT: Sorry, I just realized I cloned the wrong branch... let me retest.

EDIT2: Awesome! Now that I'm actually using your branch, memory stays under 512MB. Looks like that fixed it. :)

@dominikkiese

Also works for me, both for the above example and in my actual code. Many thanks!

@JeffBezanson
Sponsor Member

Great, that confirms this is a duplicate of #27173.

@StefanKarpinski
Sponsor Member

@JeffBezanson, did you mean to link to #27173? That issue is still open and is about a data race, not a memory leak.

@JeffBezanson
Sponsor Member

It's not actually a memory leak; it's growing memory use due to incorrect updating of the GC counters.

@StefanKarpinski
Sponsor Member

Perhaps I'm a bit slow on the uptake today, but how is this fixed but #27173 isn't?

@JeffBezanson
Sponsor Member

It's not fixed yet; I closed this as a duplicate of it.
