Skip to content

This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How to deal with Garbage Collection #363

Closed
jasonwilliams opened this issue May 2, 2020 · 20 comments
Closed

How to deal with Garbage Collection #363

jasonwilliams opened this issue May 2, 2020 · 20 comments
Labels
help wanted Extra attention is needed memory PRs and Issues related to the memory management or memory footprint. performance Performance related changes and issues

Comments

@jasonwilliams
Copy link
Member

Current state of play

We currently use GC to wrap objects which need to live as long as something is referencing it. These are all JSValues. To make things easier we wrap all JS Values in a GC and give that the type Value.

Why not use RC?

Javascript has a lot of circular references, we could end up with memory leaks because objects will always have a reference count of 1 or more.
Constructors have a property to the prototype, and the prototype object has a property pointing back to the constructor.

More info:
https://doc.rust-lang.org/book/ch15-06-reference-cycles.html

Environments

Environments also need to be wrapped in a Gc because multiple function objects can reference the same environment, any function declaration references the outer env. So declaring more than 1 function in the same scope means they'll both have the same parent.
If a closure is returned the outer function environment needs to stay alive throughout the duration of the returned function.

Blank Diagram (1)

@Razican has some ideas on how to improve Environment GC.

@HalidOdat HalidOdat added help wanted Extra attention is needed memory PRs and Issues related to the memory management or memory footprint. performance Performance related changes and issues labels May 2, 2020
@Razican
Copy link
Member

Razican commented May 2, 2020

Yep, so around 4% of our instructions in the code are due to us using a garbage collector. The GC will once in a while for through all the garbage collected objects and decide if they should be freed or not, which takes long: for structures/enums with members, it has to go through all its mebers one by one.

This can be alleviated firs by only deriving Trace in types where a GC'd object might leave, so all Copy types should not be derived, but we should implement an empty trace on them, for example. This could be made easier once rust-gc implements Manishearth/rust-gc#87.

Then, environments have a clear hierarchy, so we don't really need a GC. Once the parent environment goes out of the scope, all inner environments no longer need to exist. If we use a GC, they will at some point be freed by the GC, but we don't need to wait for that, this can be reasoned at compile time by Rust by using Rc (or Arc if we go multithreaded at one point).

In a hierarchical tree, the Rust guide for Rc gives us a nice usage:

A cycle between Rc pointers will never be deallocated. For this reason, Weak is used to break cycles. For example, a tree could have strong Rc pointers from parent nodes to children, and Weak pointers from children back to their parents.

So, we can remove our GC'd environments in favour of a RC'd environment. This would make it easier to implement the string interner (#279) too, since Rust now expects circular references in environments, which would make lifetimes be 'static. Not nice for passing around a reference to an interner.

@jasonwilliams
Copy link
Member Author

Then, environments have a clear hierarchy, so we don't really need a GC. Once the parent environment goes out of the scope, all inner environments no longer need to exist.

Using the diagram above, its possible for outer() to return inner2 as a closure.
In which case outer will go out of scope whilst inner2 is still in-play, have we thought about this?

@Razican
Copy link
Member

Razican commented May 2, 2020

Using the diagram above, its possible for outer() to return inner2 as a closure.
In which case outer will go out of scope whilst inner2 is still in-play, have we thought about this?

In that case, would the parent environment of the function change? or do we still need the information of the previous environment?

@Razican
Copy link
Member

Razican commented May 3, 2020

I implemented Rc'd environments in https://github.com/jasonwilliams/boa/tree/rc_env.This was working perfectly, but when I tried to re-base with the new function objects, this gives issues with the environment in Function.

It seems this is the only object that contains an environment. One small question about this. Is this the inner environment or the outer environment?

To implement Trace on it, we need for the environment to implement Trace, but this is not possible with Rc or Weak unless rust-gc implements it for them, and I don't think it's possible to do it.

How is this environment used?

@jasonwilliams
Copy link
Member Author

jasonwilliams commented May 3, 2020

It seems this is the only object that contains an environment. One small question about this. Is this the inner environment or the outer environment?

From what I understand (please take a second look at the spec as I may be wrong) Function’s environment property is a reference to the outer environment.

The inner environment doesn’t exist until the function is actually called. When the inner function is created, it’s parent is set to function[[Environment]].

This allows callbacks to happen at a later point whilst stil maintaining lexical scope, however it also means parent environments could be referenced long after theyve gone out of scope. (Diagram above: outer finishes but inner1 is a callback somewhere)

Because functions are “run-to-completion” there’s no need to store the inner environment. However with generators I think that changes.

@Razican
Copy link
Member

Razican commented May 9, 2020

Looking into this, I noticed we have the InternalStateCell type, that implements Trace for a Gc.

That Gc can hold Any type, which I think makes it unsound :/ maybe @Manishearth can give us some hints about this.

Do we really need it to be of Any type? Can't we at least reduce its scope a bit?

In any case, implementing Trace for Rc would be very nice, since we could just implement Trace for a Rc pointer to an environment in a Function object.

@Manishearth
Copy link

You should hold a Trace instead of an Any

@Razican
Copy link
Member

Razican commented May 9, 2020

You should hold a Trace instead of an Any

Would this require a special Trace implementation for the InternalStateCell, or would an empty trace be enough?

In any case, this brings the extra problem, since we can no longer use downcast_ref/mut() on state, since they are not Any. I tried implementing it in #387 but I'm having issues.

@Manishearth
Copy link

You can have a trait that inherits from Any and Trace and make a trait object of it

@Razican
Copy link
Member

Razican commented May 10, 2020

You can have a trait that inherits from Any and Trace and make a trait object of it

It seems that this is not enough, or at least I'm not able to make it work. I get the errors listed in https://github.com/jasonwilliams/boa/pull/387/files#r422625290

@Manishearth
Copy link

Right, you can't cast between trait objects directly. But in this case what you need is for that trait to inherit from Trace as well.

@Manishearth
Copy link

To fix those errors you just need to cast to InternalState, not Any. If that's not enough you might have to <_ as Any>::downcast_ref(&thing)

@sphinxc0re
Copy link
Contributor

I don't know if it applies here, but did you take a look at this: https://github.com/kyren/gc-arena
It seems like a promising technology.

@Manishearth
Copy link

Trait object issues are going to occur in any type that is of the form Gc<T> where T: SomeTraceTrait. The issues boa is hitting are fundamental Rust issues, not Gc issues.

@sphinxc0re
Copy link
Contributor

Okay, but how about https://github.com/Others/shredder ? It looks like it supports cyclic structures as well

@jasonwilliams
Copy link
Member Author

@sphinxc0re shredder looks very interesting and i would be happy for us to try it out

@Manishearth
Copy link

Shredder forces concurrency, which a JS runtime will likely not want since JS objects are single threaded

@sphinxc0re
Copy link
Contributor

There is no information on "forcing" concurrency. The README states: ready for fearless concurrency: works in multi-threaded contexts. "Works in" sounds to me like it's optional. Could you elaborate where you found that information?

@jasonwilliams
Copy link
Member Author

I think @Manishearth is referring to the fact that Shredder is built on top of Arc where as rust-gc is built on top of Rc. There's pros and cons to both.

If you don't need multithreading you're still using Arc under the hood, which will be slower than RC because it generates atomic instructions for access. Its not easy to know how much time is lost until you can measure the before and after.

Like wise if you stick with RC, it makes it difficult to move GC'd objects across threads (Which Boa may need to do at some point).

Shredder forces concurrency, which a JS runtime will likely not want since JS objects are single threaded

I don't know if this is strictly true, from what i understand JS Objects can move across realms, and each realm can be backed by a thread.
http://speakingjs.com/es5/ch17.html

@Manishearth
Copy link

Manishearth commented Jun 14, 2020

it makes it difficult to move GC'd objects across threads (Which Boa may need to do at some point)

Not if you want to be spec-compliant 😄

each realm can be backed by a thread.

This is incorrect. The javascript execution model is single-threaded; all realms which may share objects (not including SharedArrayBuffer, which I'll get to) must have their event loops run on the same thread.

The specification calls this an "agent", annoyingly I cannot find any text in the specification that explains this clearly, rather, the spec doesn't really work without this. I might file an issue to get this clarified. Regardless, the HTML specification explains this a bit clearer: objects that can share objects must be in the same agent, and thus, the same event loop.

It's pretty clear why the specification is single-threaded from one observation: aside from SharedArrayBuffer and atomic operations (only used within SharedArrayBuffer), there are no locks in spec text anywhere. There are multiple algorithms which mutate state, but none of them have locks between the checks where they see if the state is allowed to be mutated or not, and when they mutate the state. For example the indexed setter algorithm for objects like arrays must check if the buffer is detached before writing, but if objects could be shared across threads, it would be very easy for an array buffer to get detached (which nulls out the internal buffer!) between these two steps by a different thread.

You can have multiple agents running at once; this is what happens when you use a Worker. Workers typically have their own thread/event loop, and, crucially, they do not share garbage collected objects with a different thread. Instead, objects are sent to them via postMessage(), which uses a "structured serialize" algorithm that basically performs a deep copy, with some exceptions. The exceptions are that regular array buffers have their buffers "detached" and moved over to the other thread (preventing sharing of the buffer), and shared array buffers have new SharedArrayBuffer objects created on the other side, which share the internal buffer. Since this is a new object, it has independent GC semantics, and the internal buffer can get collected when all of its parent SharedArrayBuffers are collected. In other words, the "shared" part of SharedArrayBuffer is a job for an Arc.

You can, of course, choose to allow an agent to use multiple threads for efficiency and put in some painstaking effort to ensure that all the single threaded happens-before relationships are still maintained for any data dependency. This would be an immense amount of work and multithreaded garbage collection is the least of your worries. JavaScript does not like being put on multiple threads.

@boa-dev boa-dev locked and limited conversation to collaborators Jan 31, 2022
@Razican Razican converted this issue into discussion #1812 Jan 31, 2022

This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →

Labels
help wanted Extra attention is needed memory PRs and Issues related to the memory management or memory footprint. performance Performance related changes and issues
Projects
None yet
Development

No branches or pull requests

5 participants