
Memory leak #83

Closed
bodono opened this issue Jun 4, 2015 · 18 comments

@bodono

bodono commented Jun 4, 2015

Both ECOS and SCS exhibit memory leaks and crashes when running the script here: jump-dev/SCS.jl#24. Creating a problem by hand and calling SCS.jl in a for-loop does not exhibit the same memory growth (although it might be in a code path I'm not testing), so my best guess is that the leak is somewhere in Convex.jl (I know Julia is garbage collected, but memory leaks can still occur).

@madeleineudell
Contributor

@bodono, thanks for finding this. But I'm not entirely sure how to go about looking for the problem in a garbage collected language...

@mlubin
Member

mlubin commented Jun 8, 2015

Module-level variables could do it.

@pluskid

pluskid commented Dec 1, 2015

I need to run several thousand small linear programs in an iterative algorithm. The memory quickly exploded (with various solvers, including Clp, ECOS, GLPK, etc.). Currently I run a few hundred iterations, save the results to a file, exit Julia, start a new session, load the saved results, and continue running.

@mlubin
Member

mlubin commented Dec 1, 2015

@pluskid, any reason you haven't tried JuMP?

@pluskid

pluskid commented Dec 1, 2015

@mlubin Thanks! No particular reason. I just arrived at juliaopt.org, found JuMP.jl and Convex.jl, and picked the second one based on the first impression that Convex.jl is more dedicated to convex problems and the stereotype that dedicated packages typically run faster.

I will try JuMP later. :)

@daviddelaat
Contributor

The global variables are indeed the problem. I submitted a pull request (#123) with a temporary workaround. With this merged, you have to call Convex.clearmemory() every now and then.

@madeleineudell
Contributor

David, thanks for the PR! And for confirming that the global variables are at fault.

Any ideas on the best permanent solution? The global cache of variables is used to avoid extra work when variables and expressions are reused, even on different problems. The problem is that even when all other pointers to the variable or expression have evaporated, the pointer in this global cache remains, preventing garbage collection.

Is there any way to check whether the only remaining pointer to the Julia object (variable or expression) is the one in this dictionary, at which point we could delete the entry as well to pave the way for garbage collection?

@mlubin
Member

mlubin commented Feb 10, 2016

Try using a WeakKeyDict: https://github.com/JuliaLang/julia/blob/464904e17256e0e3fdb9f50d0d401abf3d8547d1/base/dict.jl#L823. I've confirmed with the Julia overlords that it's kosher even though it's undocumented.
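
For reference, a minimal sketch of WeakKeyDict semantics (an illustration only, not Convex.jl code; the Tag type is made up for the example). A WeakKeyDict holds its keys only weakly, so an entry can go away once nothing else references the key:

mutable struct Tag            # keys must be mutable, heap-allocated objects
    id::Int
end

d = WeakKeyDict{Tag, String}()
k = Tag(1)
d[k] = "payload"
@show length(d)               # 1 while `k` is still strongly referenced

k = nothing                   # drop the last strong reference to the key
GC.gc()                       # after collection the entry can be removed from the dict
@show length(d)               # may now be 0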

@madeleineudell
Contributor

Cool, trying it now.

David, I'm surprised you needed to include var_to_ranges in the set of dictionaries to clear; it shouldn't have been global in the first place. Did you observe problems when you didn't reset that dictionary?


@madeleineudell
Contributor

Trying:

id_to_variables = WeakKeyDict{UInt64, Variable}()
conic_constr_to_constr = WeakKeyDict{ConicConstr, Constraint}()

I see a LoadError: objects of type UInt64 cannot be finalized.

Does anyone know why UInt64s (which we use as Variable and Constraint ids) can't be finalized? Is there another kind of key we could use that can be finalized, or can I define a finalizer?

Full error:

ERROR: LoadError: LoadError: objects of type UInt64 cannot be finalized
 in finalizer at /Applications/Julia-0.4.0.app/Contents/Resources/julia/lib/julia/sys.dylib
 in setindex! at dict.jl:821
 in call at /Users/madeleine/.julia/v0.4/Convex/src/variable.jl:21
 in call at /Users/madeleine/.julia/v0.4/Convex/src/variable.jl:25
 in anonymous at /Users/madeleine/.julia/v0.4/Convex/test/test_utilities.jl:7
 in context at /Users/madeleine/.julia/v0.4/FactCheck/src/FactCheck.jl:474
 in anonymous at /Users/madeleine/.julia/v0.4/Convex/test/test_utilities.jl:5
 in facts at /Users/madeleine/.julia/v0.4/FactCheck/src/FactCheck.jl:448
 in include at /Applications/Julia-0.4.0.app/Contents/Resources/julia/lib/julia/sys.dylib
 in include_from_node1 at /Applications/Julia-0.4.0.app/Contents/Resources/julia/lib/julia/sys.dylib
 [inlined code] from /Users/madeleine/.julia/v0.4/Convex/test/runtests_single_solver.jl:21
 in anonymous at no file:20
 in include at /Applications/Julia-0.4.0.app/Contents/Resources/julia/lib/julia/sys.dylib
 in include_from_node1 at /Applications/Julia-0.4.0.app/Contents/Resources/julia/lib/julia/sys.dylib
 in process_options at /Applications/Julia-0.4.0.app/Contents/Resources/julia/lib/julia/sys.dylib
 in _start at /Applications/Julia-0.4.0.app/Contents/Resources/julia/lib/julia/sys.dylib
while loading /Users/madeleine/.julia/v0.4/Convex/test/test_utilities.jl, in expression starting on line 4
while loading /Users/madeleine/.julia/v0.4/Convex/test/runtests_single_solver.jl, in expression starting on line 19
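
The error comes from how WeakKeyDict works: it attaches a finalizer to every key so the entry can be dropped when the key is collected, and finalizers can only be attached to mutable, heap-allocated objects. UInt64 is an immutable bits type with no object identity, so the very first insertion fails. A tiny illustration (not Convex.jl code):

d = WeakKeyDict{UInt64, String}()
d[UInt64(1)] = "x"    # ERROR: objects of type UInt64 cannot be finalized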

@mlubin
Member

mlubin commented Feb 10, 2016

Oh, I think WeakKeyDict does the wrong thing here: it keeps weak references to the keys, but we want weak references to the values. It should just need a little tweaking of the code to accomplish that.
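
One way to hold the values weakly instead is to keep the plain UInt64 keys and wrap each value in a WeakRef. This is only a sketch (the names id_to_variables_weak, remember, and recall are made up for illustration; it is not the code Convex.jl ended up using):

const id_to_variables_weak = Dict{UInt64, WeakRef}()

# Storing a WeakRef does not keep `var` alive on its own.
remember(id::UInt64, var) = (id_to_variables_weak[id] = WeakRef(var))

function recall(id::UInt64)
    ref = get(id_to_variables_weak, id, nothing)
    ref === nothing && return nothing
    val = ref.value                                        # `nothing` once the object has been collected
    val === nothing && delete!(id_to_variables_weak, id)   # drop dead entries lazily
    return val
end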

@mouryarahul

Is the problem resolved? The memory usage keeps on exploding for the following piece of script:

# Array to store Result
results = zeros(nTrials, length(nMeasurements))
for i = 1:length(nMeasurements)
    M = nMeasurements[i]  # Numbers of measurement

    # Define the convex problem
    ω = Convex.Variable(N)  # define variables
    objective = Convex.norm(ω, 1)

    for j = 1:nTrials
        # Sample the Original Sensing matrix
        q = Random.randperm(rng, N)
        Φ = A[q[1:M], :]
        # Generate new measurements
        y = Φ*x

        # Redefine the convex problem
        constraint = (Φ*Ψ*ω - y == 0.0)  # define constraint
        problem = Convex.minimize(objective, constraint)  # define problem

        # Solve the problem
        Convex.solve!(problem, solver_ecos, warmstart=false)
        sol = Ψ*ω.value
        # Normalized Reconstruction Error
        ne = LinearAlgebra.norm(x - sol, 2)/norm(x, 2)
        println("Problem Status: ", problem.status, " for Trial No: $j and Measurement: $M with NE = $ne")
        results[j, i] = ne
    end
    GC.gc()
    Convex.clearmemory()
end
Any workaround?
Thanks!

@ericphanson
Collaborator

Can you include the values of N, etc., so I can copy-paste the code and try to reproduce? (By the way, I don't think the code was quoted correctly in your post.) Also, what version of Convex and Julia are you using?

By the way, there's another issue talking about memory usage here (with likely a different underlying cause): #254; maybe that's related? I ask because I think Convex.clearmemory() should be working ok for what it is supposed to do: clear out the two global variables that Convex uses which could get large after many runs. (But maybe it's not!)

@scottstanie

I'm seeing a similar problem of steadily growing memory usage when repeatedly optimizing in a loop, even when I include Convex.clearmemory(). Adding GC.gc() as well slows the memory growth, but it still eventually crashes.

I'll see if I can post a minimal example to try, but the full program I'm running that encountered this grew over the course of 8-9 hours.

An interesting detail is that the solver also slows down as the memory grows, even before crashing; not sure if that gives a hint as to where it's happening. If anyone can suggest how I might debug this, I'm open to it (I tried profiling once, but it said the profile stack kept filling up).

@ericphanson
Collaborator

ericphanson commented Aug 16, 2019

Thanks for the report. Convex uses two global caches, which are dictionaries, and what Convex.clearmemory() does is:

function clearmemory()
    global id_to_variables
    empty!(id_to_variables)
    global conic_constr_to_constr 
    empty!(conic_constr_to_constr)
    GC.gc()
end

I think the problem is that dictionaries don't actually free all their memory when they are emptied; their values are just set to undef. You can see this if you do dump(D), where D is the dictionary of interest. You can track the memory usage of these two globals via

Base.summarysize(Convex.id_to_variables)

and

Base.summarysize(Convex.conic_constr_to_constr)

I'm not sure how to actually clear the memory of the dictionaries (I had thought empty! would do the trick). I think, however, this is a regression unknowingly introduced by me in #286; the old behavior was to not have const dictionaries so they could just be rebound to fresh dictionaries when you called Convex.clearmemory. The cost of non-const dictionaries is that they might be slightly slower to use on each call (i.e. slightly slower model formulation, but not slower solving of course). The difference wasn't huge though, so my inclination is to revert #286 to restore the previous behavior, which should actually free that memory.

What do you think?

Edit: sorry @mouryarahul, I think this must be what you were seeing too. When you posted, I was convinced that empty! did what I thought it did, and only recently started rethinking that.
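
A quick way to see the difference between empty! and rebinding (a standalone sketch, not Convex.jl code): grow a Dict, then compare what Base.summarysize reports after empty! with a freshly constructed Dict.

d = Dict{Int, Vector{Float64}}()
for i in 1:10_000
    d[i] = rand(100)                   # grow the dictionary and its internal tables
end
@show Base.summarysize(d)              # large: the entries plus the grown tables

empty!(d); GC.gc()
@show Base.summarysize(d)              # entries can now be collected, but the internal tables typically keep their grown capacity

d = Dict{Int, Vector{Float64}}()       # rebinding to a fresh Dict lets everything be collected
@show Base.summarysize(d)              # back to a small baseline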

@scottstanie

scottstanie commented Aug 16, 2019

Here's an example where, if you watch the mem% column for julia in htop/top, you can see it slowly creep up even with the clearing/GC.gc() calls:

using Convex
using ECOS

function testmem()
       x = Variable(10)
       A = rand(100, 10)
       z = randn(1000, 1000, 100)
       for i=1:1000, j = 1:1000
           b = z[i, j, :]
           problem = minimize(norm(A*x - b, 1))
           solve!(problem, ECOSSolver(verbose=0))
           if (i*1000 + j) % 1000 == 0
               Convex.clearmemory()
               GC.gc()
               x = Variable(10)
               println("cleared")
           end
       end
end

testmem()
julia> versioninfo()
Julia Version 1.1.0
Commit 80516ca202 (2019-01-21 21:24 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin14.5.0)
  CPU: Intel(R) Core(TM) i5-7360U CPU @ 2.30GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.1 (ORCJIT, skylake)

(v1.1) pkg> st
    Status `~/.julia/environments/v1.1/Project.toml`
...
  [f65535da] Convex v0.12.2

This was also happening on Julia 1.3 and Convex v0.12.3.

@ericphanson
Collaborator

If I modify your code to

using Convex
using ECOS

function testmem()
       x = Variable(10)
       A = rand(100, 10)
       z = randn(1000, 1000, 100)
       for i=1:1000, j = 1:1000
           b = z[i, j, :]
           problem = minimize(norm(A*x - b, 1))
           solve!(problem, ECOSSolver(verbose=0))
           if (i*1000 + j) % 1000 == 0
                @info "Before clearing" i j Base.summarysize(Convex.id_to_variables)/1e6 Base.summarysize(Convex.conic_constr_to_constr)/1e6
               Convex.clearmemory()
               GC.gc()
               x = Variable(10)
               @info "After clearing" i j Base.summarysize(Convex.id_to_variables)/1e6 Base.summarysize(Convex.conic_constr_to_constr)/1e6
           end
       end
end

testmem()

Then it seems like clearing is quite effective at reducing the memory use of those two globals: even though some memory is still used afterwards, the total memory used by those two globals stays under 60 MB (growing to about that, then resetting to close to zero). So I think you're actually seeing a different problem than the one Convex.clearmemory() solves, and reverting #286 won't help with it. I also see that the total memory used by Julia climbs over the course of the script, just as you pointed out.

I think it might actually be a problem with ECOS; if I switch to SCS, the total memory used by the script doesn't seem to grow. Can you try SCS?

If you want to try with the old behavior, do ] add Convex#revert-286-master and try your script (and free Convex to return to the current release). I'm up for reverting #286 if it really was a regression, but now I'm not so sure.

@ericphanson
Collaborator

Closed by #322
