stop using finalizers for resource management? #11207

Open
JeffBezanson opened this issue May 9, 2015 · 84 comments
Labels: design (Design of APIs or of the language itself), GC (Garbage collector), speculative (Whether the change will be implemented is speculative)

@JeffBezanson
Member

Finalizers are inefficient and unpredictable. And with the new GC, it might take much longer to get around to freeing an object, therefore tying up its resources longer. Ideally releasing external resources should not be tied to how memory management works.

We are already not far from this with the open(f) do construct. I think that construct and/or a with construct should be used. Perhaps there could be some other mechanism for registering files to close eventually.

Discussed this with @carnaval .

@JeffBezanson added the speculative label May 9, 2015
@ScottPJones
Contributor

Hmmm... I was just going to start using finalizers (but still had some questions about them to investigate).
My needs are simple: 1 pointer in Julia to a structure allocated / controlled by C, which also contains a pointer possibly allocated by my C code, or else allocated by DBMS allocator. If the object goes out of scope in Julia, and is about to be GCed, I thought the finalizer would allow me to call my C release code... It concerned me though that finalizers are apparently associated with the object, not the type.

@quinnj
Member

quinnj commented May 9, 2015

Go has the defer keyword, the usage is:

f, err := os.Open(file) // os.Open returns (*os.File, error)
if err != nil { return err }
defer f.Close()

which pushes f.Close() onto a stack of function calls that get evaluated
when the enclosing function returns.

http://blog.golang.org/defer-panic-and-recover

-Jacob


@timholy
Member

timholy commented May 9, 2015

If one is rethinking this, the machinations of CUDArt to manage GPU memory in a GC-compatible way are probably amusing fodder for thought. The arrival of finalize was a huge step forward.

NOTE: 2nd link updated to correct target.

@carlobaldassi
Member

For reference, I'll also add the case of GLPK, which bears similarities with CUDArt, see e.g. here and here.

@JeffBezanson
Member Author

See also #1037

It would be great to get rid of finalizers entirely, but that's probably not realistic. For starters, I would still allow finalizers but not use them to close files and such.

@ScottPJones you can definitely use finalizers to call your C release code.

Finalizers can be associated with a type by adding them to all instances in the constructor :)
Seriously though, I'm not sure how it would work to associate finalizers with types. For example, you can't iterate over all dead objects to see which ones might have finalizers. And how is the right type identified? That's usually done through method calls, but who calls what function, and when, to determine what to finalize? The simplest thing is to just give the GC a list of objects with finalizers attached.
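
The "add them to all instances in the constructor" approach can be sketched roughly as follows. This is only an illustration in current Julia 1.x syntax (mutable struct, finalizer(f, obj)), which differs from the 0.4-era syntax used elsewhere in this thread; Handle, release_handle, and LIVE_HANDLES are hypothetical names standing in for a real external resource.

```julia
# Hypothetical counter standing in for an external resource such as a C handle.
const LIVE_HANDLES = Ref(0)

# Called by the GC (or by an explicit finalize) once per instance.
release_handle(h) = (LIVE_HANDLES[] -= 1; nothing)

mutable struct Handle
    id::Int
    function Handle(id::Integer)
        h = new(id)
        LIVE_HANDLES[] += 1
        # Every instance registers the same function, so the finalizer
        # behaves as if it were attached to the type.
        finalizer(release_handle, h)
        return h
    end
end
```

The GC still only sees a list of individual objects with finalizers attached; the per-type behaviour comes purely from the constructor convention.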

@elextr

elextr commented May 9, 2015

@JeffBezanson it is very useful to have a mechanism to allow freeing of limited resources (like file descriptors) as soon as reasonably possible. As you say finalizers will eventually get around to it, but that doesn't prevent exhaustion in the meantime.

One question: are finalizers always run, no matter how the program exits, so that it's always possible to be sure no resource remains locked?

@ScottPJones
Contributor

@elextr yes - that sort of exhaustion has been a big issue with the sort of code that I write, where it has to stay running with minimal downtime for years...

@elextr

elextr commented May 9, 2015

@ScottPJones then it's probably best to do your resource management explicitly yourself; certainly don't rely on anything in the semantics of any language unless it is specified and guaranteed.

Specifically, the semantics of the Julia GC, and hence of finalizers, are not guaranteed. The GC happens to have recently changed to a generational one in 0.4 (it is not generational in 0.3), and that may change again in 0.4/0.5 when threading lands, for example. All you can know about a finalizer is that the object it relates to is no longer in use when the finalizer runs, but my reading of this suggests that it may not be run for bitstypes, hence my question above.

@aviks
Member

aviks commented May 9, 2015

Another use case is the interaction of the Java and Julia GCs in JavaCall. Objects retrieved from Java into Julia need to be explicitly dereferenced in Java when they are no longer used within Julia. This is achieved via finalizers, which works fine, except that the JVM can be under greater memory pressure than Julia. In that case, the JVM can run out of memory before Julia decides that the GC needs to run.

@timholy
Member

timholy commented May 9, 2015

@ScottPJones, I hear you. In several places like HDF5 and CUDArt, the key was to write code like

open(filename) do file
    # do stuff
end

which guarantees that file will be closed (immediately upon completion) even if stuff has an error in it. That construct currently has some performance overhead (anonymous functions), but in most cases is worth it. You can manually use try...finally in cases where you can't tolerate the overhead.
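
For comparison, here is a small sketch of the two forms side by side: the do-block and a hand-written try...finally that does roughly the same thing. The scratch file created with tempname() is only there to make the example self-contained.

```julia
path = tempname()              # scratch file for the demonstration
write(path, "hello")

# do-block form: the file is closed when the block exits, even on error.
contents = open(path) do file
    read(file, String)
end

# Roughly equivalent manual form using try...finally.
file = open(path)
contents2 = try
    read(file, String)
finally
    close(file)                # runs whether or not read threw
end

rm(path)
```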

@ScottPJones
Contributor

@elextr I should have been clearer... I'm not planning on relying on the finalizers at all.
My APIs are identical (with some name changes, can't have ! in C function names), in C11, C++11,
Python, Java, and Julia...
I just want to prevent memory leaks, esp. when people are playing around / prototyping stuff in the REPL.
For example, something like the following:

myObj = DA.PackedData(1000) # creates a packed data buffer with initial size at least 1000 bytes.
push!(myObj, "Encode a string")
push!(myObj, 5.2332) # encode an IEEE binary floating point number
save!(myDBMS, myObj) # write packed record out as a row
release(myObj) # Release the underlying C buffer object, 0 out the pointer in the Julia myObj object...

What happened many times, in the REPL, is I accidentally set myObj to something else before calling release... so I lost memory each time...
Using the finalizer is just to catch stupid things like that...

@timholy That's good to know, but is that sort of syntax only for files? (sorry, my newbieness with Julia is showing again!)

@timholy
Member

timholy commented May 9, 2015

@ScottPJones, it's a standard julia convention, see http://docs.julialang.org/en/latest/manual/functions/#do-block-syntax-for-function-arguments. You have to write a version of your function that takes another function as the first argument (see, e.g., the methods defined for open). Internally, it's just try...finally.
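
A minimal sketch of such a function-first method for a non-file resource, assuming current Julia 1.x syntax. Connection, connect_db, and disconnect! are hypothetical names invented for this illustration, not an existing API.

```julia
# Hypothetical resource type standing in for, e.g., a database connection.
mutable struct Connection
    name::String
    isopen::Bool
end

connect_db(name::AbstractString) = Connection(name, true)
disconnect!(c::Connection) = (c.isopen = false; nothing)

# Function-first method: this is what makes
# `connect_db(name) do conn ... end` work.
function connect_db(f::Function, name::AbstractString)
    conn = connect_db(name)
    try
        return f(conn)
    finally
        disconnect!(conn)      # always runs, even if f throws
    end
end

# Usage: the block receives the connection and its value is returned.
status = connect_db("test") do conn
    conn.isopen ? "connected to $(conn.name)" : "closed"
end
```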

@tknopp
Contributor

tknopp commented May 9, 2015

@ScottPJones No, the do syntax is not restricted to files; see http://julia.readthedocs.org/en/latest/manual/functions/#do-block-syntax-for-function-arguments

I think this is the standard way to do it in Julia and, as @timholy said, it is used in various places in Julia land. Gtk.jl also uses it in a few places.

Where finalizers are important is when the object's lifetime is not tied to a local scope. In Gtk.jl, for instance, we have a situation where they are really needed.

@tknopp
Contributor

tknopp commented May 9, 2015

oh Tim is faster, sorry.

@ScottPJones
Contributor

Thanks @tknopp & @timholy! Sorry for the noise, I really am trying to memorize the manual, but Julia is such a large language!

@timholy
Member

timholy commented May 9, 2015

It definitely takes a while, no apologies needed.

@tknopp
Contributor

tknopp commented May 9, 2015

@JeffBezanson: What is the actual proposal of this issue? Isn't the do syntax already used consistently for files? I think finalizers are useful when the scope is not local.

@carnaval
Contributor

carnaval commented May 9, 2015

We probably can't remove finalizers altogether because then we would be leaking resources. I think this issue is more about conventions on "good practice for resource management", since the biggest problem (besides performance) is that the gc is very lazy: it only works under pressure, that is, memory pressure. It has no way to know, e.g., how many file descriptors are open in the program, so if your handle object is small, the gc will be completely fine keeping it around for a long time while you exhaust your open fd limit.

I don't have any good idea about this by the way...

@wildart
Member

wildart commented May 9, 2015

I have found finalizers unreliable. When interfacing with C code, I would really prefer something like Go's defer rather than using a finalizer to release resources. I opt for manual resource management, even though it multiplies the amount of code to be written several times over.

@JeffBezanson
Member Author

@tknopp good question. My proposal would be

  • Have a standard "release resources" function, maybe close, maybe finalize, but the same for every type with this issue.
  • Use with instead of type-specific functions like open for this.
  • Document that this should be used instead of relying on finalizers, and use it ourselves everywhere we can.
  • Make a concerted effort to remove use of finalizers, e.g. for BigInt
  • Don't add finalizers to I/O-related objects by default. Instead maybe the REPL could add finalizers for interactive objects, to call the standard close function.

The last item sounds drastic, but as it is finalizers might not be invoked for a very long time, and unpredictably. You could still use finalizers as an escape hatch. If you're not sure how to handle releasing some object, you can just call finalizer(x) on it any time.
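
One way to read the first two bullet points is a single generic function that pairs any resource with the one standard release function. The sketch below assumes close is that function; with is a hypothetical name taken from the proposal, not something Base provided at the time of this discussion.

```julia
# Hypothetical generic `with`: run `f` on a resource, then call the
# standard release function (assumed here to be `close`) however `f` exits.
function with(f::Function, resource)
    try
        return f(resource)
    finally
        close(resource)
    end
end

# Usage with a file; any type that defines `close` would work the same way.
path = tempname()
write(path, "some data")
io_handle = open(path)
text = with(io_handle) do io
    read(io, String)
end
rm(path)
```

Under this scheme a new resource type only needs a close method; it does not need its own open-style function that accepts a block.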

@tknopp
Contributor

tknopp commented May 9, 2015

Ok. Is there an issue describing what with is and how it differs from the do syntax?

@carlobaldassi
Member

I'll just add another small issue about using finalizers with IO objects which I very recently discovered: on Windows, trying to call rm on a file with an open descriptor fails. This made the FastaIO tests fail, because I was relying on finalize to close the file after I finished reading it, and I was deleting it after the tests. I never noticed the bug since on Linux that works fine.
So this is probably not a very common situation, but — in association with the unpredictability of the GC — may lead to OS-specific, non-deterministic bugs.

@elextr

elextr commented May 9, 2015

@JeffBezanson how do you propose to handle objects whose lifetime exceeds the scope of the with, e.g. ones returned from the function?

@JeffBezanson
Member Author

If an object lifetime exceeds the local scope, you can't use with. The only options I see in that case are (1) somebody downstream uses with, (2) you add a finalizer to the object before returning it.

@StefanKarpinski
Member

Another idea is to have some types opt into reference counting and finalize them when their counts get to zero. It's not entirely clear to me how to make a mix of refcounted and non-refcounted objects work, however.

@jakebolewski
Member

watch out, may be flayed by mentioning reference counting :-)

@carnaval
Contributor

the problem with mixed refcount is that a refcounted object can still be kept alive by a non refcounted one (worst case : the object keeping it alive is in oldgen). Then you don't get the "immediate finalization" property.

@carnaval
Contributor

To alleviate the late finalization problem we could also teach the gc about other kinds of resources so that they can be taken into account in the collection heuristics. So, e.g., you could register a "file descriptor" or "GPU memory" resource, and then explicitly say: I allocated X of this, and running this finalizer will get me Y of this back.

Painful to implement though. And it can only make gc overhead worse (by collecting more often).
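
A user-level approximation of this idea, as a rough sketch only: the real proposal would live inside the collector's heuristics, whereas this version simply forces a collection when a hand-maintained budget runs out. ResourceBudget, acquire!, release!, and TrackedHandle are hypothetical names, and the code assumes current Julia 1.x syntax (GC.gc(), finalizer(f, obj)).

```julia
# Hypothetical accounting for a scarce non-memory resource.
mutable struct ResourceBudget
    kind::String
    in_use::Int
    limit::Int
end

function acquire!(b::ResourceBudget, n::Integer = 1)
    # Over budget: force a collection so unreachable handles can be finalized.
    b.in_use + n > b.limit && GC.gc()
    b.in_use + n > b.limit && error("no $(b.kind) left even after collecting")
    b.in_use += n
    return b
end

release!(b::ResourceBudget, n::Integer = 1) = (b.in_use -= n; b)

# A handle that charges the budget on creation and refunds it when finalized.
mutable struct TrackedHandle
    budget::ResourceBudget
    function TrackedHandle(budget::ResourceBudget)
        acquire!(budget)
        h = new(budget)
        finalizer(x -> release!(x.budget), h)
        return h
    end
end
```

As noted above, the cost is extra collections: each forced GC.gc() is a full collection paid for on behalf of a resource the collector otherwise knows nothing about.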

@StefanKarpinski
Member

> the problem with mixed refcount is that a refcounted object can still be kept alive by a non refcounted one (worst case : the object keeping it alive is in oldgen). Then you don't get the "immediate finalization" property.

Yes, in such a scheme every reference that could transitively reach anything refcounted would need to maintain a refcount. That includes most abstract slots, and slots in data structures that can refer to refcounted objects. But that still excludes most things we care about the performance of.
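
To make the trade-off concrete, here is a sketch of what opt-in counting looks like when done by hand rather than by the compiler. It is not the scheme being discussed, which would maintain counts automatically on every relevant slot; RefCounted, retain!, and release! are hypothetical names, in current Julia 1.x syntax.

```julia
# Hypothetical manually counted wrapper around a resource.
mutable struct RefCounted{T}
    value::T
    count::Int
    cleanup::Function
end

# A new wrapper starts with one owner.
RefCounted(value, cleanup::Function) = RefCounted(value, 1, cleanup)

retain!(r::RefCounted) = (r.count += 1; r)

function release!(r::RefCounted)
    r.count > 0 || error("released more times than retained")
    r.count -= 1
    # Cleanup runs immediately when the last owner lets go.
    r.count == 0 && r.cleanup(r.value)
    return r
end
```

Any holder that keeps the wrapper without calling retain! and release! defeats the count, which is the manual analogue of a refcounted object being kept alive by a non-refcounted one.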

@aviks
Member

aviks commented Jun 4, 2015

Ah, ok, thanks... I misunderstood.

Yes, it should be sufficient to have finalisers associated with types. Currently, every object gets the same finaliser function. Of course, the type parameters and fields will need to be available to the finaliser.

@quinnj
Member

quinnj commented Jun 4, 2015

I wonder if the mmap and WeakKeyDict cases call for something like

finalize(a) do f
   # code to finalize `a` which is a type not declared with a `finalize` method
end

This wouldn't actually do the finalizing, just "move" a to the finalize pool of objects and the function argument would be run as the finalize method whenever that happens, either manually, from a with block, or when the object was destroyed.

Not sure how feasible "moving an object to the finalization pool of objects" would be though....
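
For readers on current Julia (1.x), a similar shape can be written with the function-first finalizer(f, obj) signature, which accepts a do-block, while finalize(obj) runs the registered finalizers immediately. The sketch below is not the exact semantics proposed above, and Mapped is a hypothetical stand-in for an mmap-like object.

```julia
# Hypothetical stand-in for an object that owns some external mapping.
mutable struct Mapped
    data::Vector{UInt8}
    released::Bool
end

a = Mapped(zeros(UInt8, 16), false)

# Register cleanup for this one instance; nothing runs yet.
finalizer(a) do obj
    obj.released = true
end

# Later: run the registered finalizers now instead of waiting for the GC.
finalize(a)
```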

@amitmurthy
Contributor

Is #10960 then an artifact of the new gc? That could explain memory leaks with shared and distributed arrays. An ability to explicitly "free" remote objects will be quite useful, especially in cases where people are using distributed arrays across multiple hosts specifically to leverage every bit of memory available.

@ScottPJones
Contributor

@quinnj Carrying the discussion from #11280 over here, as requested...
You said:

> the problem with being able to call your own finalize! is you then need someway to tell if an object has been finalized or not.

That's precisely what I said I'd done, I have a pointer to something that needs to be finalized, so I simply set it to zero (C_NULL) in finalize!. If you don't have a pointer, a flag can be used. It is an extra check on each reference, but you stop having segfaults or problems with things outside of Julia being released.
It was the only way I could think of currently to make sure something can be finalized quickly most of the time, and still prevent memory (or other resource) leakage when things get GCed.
Do you have any better suggestions to handle that?
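
The pattern described here (null the pointer on release, check it on each access, keep the finalizer as a safety net) might look roughly like this in current Julia 1.x syntax. PackedBuffer, release!, isreleased, and checked_pointer are hypothetical names, and Libc.malloc / Libc.free stand in for the real C allocation and release calls.

```julia
# Hypothetical wrapper around a C-allocated buffer.
mutable struct PackedBuffer
    ptr::Ptr{Cvoid}
    function PackedBuffer(nbytes::Integer)
        ptr = Libc.malloc(nbytes)
        ptr == C_NULL && throw(OutOfMemoryError())
        buf = new(ptr)
        finalizer(release!, buf)   # safety net if release! is never called
        return buf
    end
end

isreleased(buf::PackedBuffer) = buf.ptr == C_NULL

# Safe to call more than once, and safe to run again from the GC.
function release!(buf::PackedBuffer)
    if !isreleased(buf)
        Libc.free(buf.ptr)
        buf.ptr = C_NULL           # marks the object as already finalized
    end
    return nothing
end

# Accessors check the pointer first, which is the extra per-reference cost.
function checked_pointer(buf::PackedBuffer)
    isreleased(buf) && error("buffer already released")
    return buf.ptr
end
```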

@amitmurthy
Contributor

Just noticed this when there are multiple finalizers defined for an object.

julia> type Foo
           v
       end

julia> f=Foo(0)
Foo(0)

julia> Foo(0)
Foo(0)

julia> for i in 1:10
           finalizer(f, x-> @schedule print("FINALIZED $i \n") )
       end

julia> f=nothing

julia> for i in 1:10
           print("calling gc for the $i th time\n");
           gc()
       end
calling gc for the 1 th time
calling gc for the 2 th time
FINALIZED 10 
FINALIZED 9 
calling gc for the 3 th time
FINALIZED 1 
FINALIZED 2 
calling gc for the 4 th time
FINALIZED 8 
calling gc for the 5 th time
FINALIZED 7 
calling gc for the 6 th time
FINALIZED 6 
calling gc for the 7 th time
FINALIZED 5 
calling gc for the 8 th time
FINALIZED 4 
calling gc for the 9 th time
FINALIZED 3 
calling gc for the 10 th time

Found it a little odd that all the finalizers are not executed together at the first gc itself.

@ScottPJones
Contributor

This wouldn't happen if @timholy's idea (seconded by @quinnj [and myself]) of using a tag bit to say whether the finalizer had been run for an object were adopted.
(or I guess that is a different object each time... never mind!)

@yuyichao
Contributor

@amitmurthy This is somewhat related to the (sub-)issue I noticed in #11814 (comment). My guess is that running too many finalizers at the same time would cause too long a pause, but @carnaval should know for sure.

@yuyichao
Contributor

FWIW, the issue above in #11207 (comment) is solved by #13995 .

@amitmurthy
Contributor

Cool. And regarding the topic of this issue - It is not just about files, I don't think we have a choice but to use finalizers for remote references. We can document that users can manually call finalize for better control on when remote resources get released, else it will only happen when gc eventually gets around to it.

@stevengj
Member

Isn't this issue essentially a duplicate of #7721?
