src: add Realm document in the src README.md #47932

Merged
merged 3 commits into from
Jun 1, 2023
Changes from 2 commits
70 changes: 53 additions & 17 deletions src/README.md
@@ -96,6 +96,7 @@ Typical ways of accessing the current `Isolate` in the Node.js code are:
using `args.GetIsolate()`.
* Given a [`Context`][], using `context->GetIsolate()`.
* Given a [`Environment`][], using `env->isolate()`.
+* Given a [`Realm`][], using `realm->isolate()`.

### V8 JavaScript values

@@ -264,8 +265,8 @@ heap. Node.js exposes this ability through the [`vm` module][].
V8 refers to each of these global objects and their associated builtins as a
`Context`.

-Currently, in Node.js there is one main `Context` associated with an
-[`Environment`][] instance, and most Node.js features will only work inside
+Currently, in Node.js there is one main `Context` associated with a
+[`Realm`][] instance, and most Node.js features will only work inside
that context. (The only exception at the time of writing are
[`MessagePort`][] objects.) This restriction is not inherent to the design of
Node.js, and a sufficiently committed person could restructure Node.js to
@@ -276,7 +277,9 @@ Typical ways of accessing the current `Context` in the Node.js code are:

* Given an [`Isolate`][], using `isolate->GetCurrentContext()`.
* Given an [`Environment`][], using `env->context()` to get the `Environment`'s
-main context.
+principal [`Realm`][]'s context.
+* Given a [`Realm`][], using `realm->context()` to get the `Realm`'s
+context.

<a id="event-loop"></a>

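The accessor relationships touched by the hunks above can be sketched with a small self-contained model. `MockIsolate`, `MockContext`, `MockRealm`, and `MockEnvironment` are hypothetical stand-ins invented for this illustration, not the real V8 or Node.js classes, and forwarding `context()` to the principal realm is an assumption drawn from the wording of the diff rather than from the actual implementation.

```cpp
#include <cassert>

// Hypothetical stand-ins for v8::Isolate and v8::Context (not the real types).
struct MockIsolate {};

struct MockContext {
  MockIsolate* isolate;
  MockIsolate* GetIsolate() const { return isolate; }
};

class MockEnvironment;

// A realm is tied to exactly one context and belongs to one environment.
class MockRealm {
 public:
  MockRealm(MockEnvironment* env, MockContext* context)
      : env_(env), context_(context) {}

  MockEnvironment* env() const { return env_; }
  MockContext* context() const { return context_; }
  MockIsolate* isolate() const { return context_->GetIsolate(); }

 private:
  MockEnvironment* env_;
  MockContext* context_;
};

// An environment owns one principal realm; env->context() is modeled as a
// shortcut for the principal realm's context (assumption of this sketch).
class MockEnvironment {
 public:
  MockEnvironment(MockIsolate* isolate, MockContext* main_context)
      : isolate_(isolate), principal_realm_(this, main_context) {}

  MockIsolate* isolate() const { return isolate_; }
  MockRealm* principal_realm() { return &principal_realm_; }
  MockContext* context() { return principal_realm_.context(); }

 private:
  MockIsolate* isolate_;
  MockRealm principal_realm_;
};
```

In this model, `env.context()` and `env.principal_realm()->context()` return the same pointer, and both `env.isolate()` and `realm->isolate()` lead back to the single isolate.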
@@ -303,15 +306,11 @@ Currently, every `Environment` class is associated with:

* One [event loop][]
* One [`Isolate`][]
-* One main [`Context`][]
+* One principal [`Realm`][]

The `Environment` class contains a large number of different fields for
-different Node.js modules, for example a libuv timer for `setTimeout()` or
-the memory for a `Float64Array` that the `fs` module uses for storing data
-returned from a `fs.stat()` call.
-
-It also provides [cleanup hooks][] and maintains a list of [`BaseObject`][]
-instances.
+different built-in modules that can be shared across different `Realm`
+instances, for example a libuv timer for `setTimeout()`.
Member

Should they actually share timers? I think the timer callbacks are tied to specific contexts too? Though I guess the event loop could be shared.

Member

Yeah, it would actually be very helpful to spell out what properties are being shared between different Realms for the same Environment and which not. I've seen a bunch of PRs that move properties from the Environment to the individual Realms, but it's not really clear where the line is and whether we're not essentially just introducing the ability to have multiple Environments similar to #47855.

Member Author

Thanks for pointing this out! I've added a list of examples that can be shared across realms with an Environment.

Sharing the async hook info between the realms is still to be done: the async continuation has to be linked across realm boundaries for AsyncLocalStorage to propagate correctly. This is essential to allow JS object access between multiple execution "environments" on the same thread.

Member

I’m not really sure that this answers the question. How is the end state of Realm going to be different from what Environment is (or used to be before Realms were introduced)?

Member Author

@legendecas legendecas May 18, 2023

The difference between the end state of the Realm and the Environment would be:

  1. An Environment is associated with an Isolate and provides the per-isolate hooks and APIs, and per-thread states, for instance, inspector agents, profilers, and APIs like RequestInterrupt() and can_call_into_js().
  2. A Realm is associated with a Context and consists of a global object and can be extended as a principal realm or a synthetic realm. Each Environment has a single principal realm as its entry.

As realms share the Environment instance, we don't have to create inspector IO threads, or profiler connections for each realm. The async local storage needs to be propagated across async boundaries in another Realm of an Environment too.

Additionally, a Realm must be able to be created repeatedly on the same Isolate and weakly referenced to properly support the ECMAScript ShadowRealm API.

In conclusion, it is necessary to split the Realm and Environment to distinguish per-isolate states and per-context states.

Member

> 1. An Environment is associated with an Isolate and provides the per-isolate hooks and APIs,

You’re describing IsolateData here, though.

Member Author

> You’re describing IsolateData here, though.

Yes, it's true that IsolateData is also associated with an Isolate. However, as documented, IsolateData acts like a string table and contains information (e.g. startup snapshot data) about the isolate. It has its distinct properties compared to an Environment.

Compared to IsolateData, Environment at the end state contains the necessary handles (bootstrapped with the principal realm) to propagate events across realm boundaries, for instance, propagating async local storages and promise rejection events from ShadowRealm.

Member

@joyeecheung joyeecheung May 26, 2023

I think the current state we have is:

  1. There is IsolateData which is meant to hold per-isolate data
  2. There are Realms which are meant to hold per-context data
  3. There is Environment which has been a hotch-potch structure that people dump everything on since forever, so it contains both per-isolate data and per-context data (for the main context) in practice.

What we've been trying to do is to move per-isolate data in Environment to IsolateData and per-context data to Realm, it's happening but it's really not there yet.

Member Author

@addaleax do you think your question has been answered?

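The split discussed in this thread (per-thread state on the environment, per-context state on each realm, and synthetic realms that are only weakly referenced) can be illustrated with a minimal model. Everything here is hypothetical: `MockInspectorAgent`, `MockRealm`, `MockEnvironment`, `CreateSyntheticRealm()`, and `LiveSyntheticRealms()` are names invented for this sketch, and `std::weak_ptr` merely stands in for whatever weak-reference mechanism the real code uses.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical per-thread service that all realms reach through the
// environment instead of creating their own copy.
struct MockInspectorAgent {};

enum class RealmKind { kPrincipal, kSynthetic };

class MockEnvironment;

// Per-context state lives here; the realm only points back at its environment.
struct MockRealm {
  MockRealm(MockEnvironment* env, RealmKind kind) : env(env), kind(kind) {}
  MockEnvironment* env;
  RealmKind kind;
};

class MockEnvironment {
 public:
  MockEnvironment() : principal_(this, RealmKind::kPrincipal) {}

  MockInspectorAgent* inspector_agent() { return &agent_; }
  MockRealm* principal_realm() { return &principal_; }

  // The caller owns the synthetic realm; the environment tracks it weakly,
  // so the realm can go away without the environment keeping it alive.
  std::shared_ptr<MockRealm> CreateSyntheticRealm() {
    auto realm = std::make_shared<MockRealm>(this, RealmKind::kSynthetic);
    synthetic_.push_back(realm);
    return realm;
  }

  // Counts the synthetic realms that have not been destroyed yet.
  std::size_t LiveSyntheticRealms() const {
    std::size_t live = 0;
    for (const auto& weak : synthetic_) {
      if (!weak.expired()) ++live;
    }
    return live;
  }

 private:
  MockInspectorAgent agent_;                         // shared, per-thread
  MockRealm principal_;                              // exactly one, owned
  std::vector<std::weak_ptr<MockRealm>> synthetic_;  // any number, weak
};
```

In this model, every realm created from the same environment sees the same `inspector_agent()`, while dropping the last `shared_ptr` to a synthetic realm reduces `LiveSyntheticRealms()` without touching the principal realm.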

Typical ways of accessing the current `Environment` in the Node.js code are:

@@ -325,6 +324,40 @@ Typical ways of accessing the current `Environment` in the Node.js code are:
* Given an [`Isolate`][], using `Environment::GetCurrent(isolate)`. This looks
up the current [`Context`][] and then uses that.

+<a id="realm"></a>
+
+### `Realm`
+
+The `Realm` class is a container for a set of JavaScript objects and functions
+that are associated with a particular ECMAScript global environment.
+
+Every `Realm` instance is associated with a [`Context`][].
+
+A `Realm` can be a principal realm or a synthetic realm. A principal realm is
+created with an `Environment` as its principal global environment to evaluate
+scripts. A synthetic realm is created with JS APIs like `ShadowRealm`.
+
+Native bindings and built-in modules can be evaluated in either a principal
+realm or a synthetic realm.
+
+The `Realm` class contains a large number of different fields for
+different built-in modules, for example the memory for a `Float64Array` that
+the `fs` module uses for storing data returned from a `fs.stat()` call.
+
+It also provides [cleanup hooks][] and maintains a list of [`BaseObject`][]
+instances.
+
+Typical ways of accessing the current `Realm` in the Node.js code are:
+
+* Given a `FunctionCallbackInfo` for a [binding function][],
+using `Realm::GetCurrent(args)`.
+* Given a [`BaseObject`][], using `realm()` or `self->realm()`.
+* Given a [`Context`][], using `Realm::GetCurrent(context)`.
+This requires that `context` has been associated with the `Realm`
+instance, e.g. is the principal `Realm` for the `Environment`.
+* Given an [`Isolate`][], using `Realm::GetCurrent(isolate)`. This looks
+up the current [`Context`][] and then uses that.

<a id="isolate-data"></a>

### `IsolateData`
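The `Realm::GetCurrent()` lookups listed in the hunk above can be sketched as follows. This is a simplified model under stated assumptions: `MockContext`, `MockIsolate`, and `MockRealm` are hypothetical types, and storing the realm pointer in a single `realm_slot` field is an invented stand-in for however the real code associates a context with its realm.

```cpp
#include <cassert>

class MockRealm;

// Hypothetical stand-in for v8::Context with one slot for the owning realm.
struct MockContext {
  MockRealm* realm_slot = nullptr;
};

// Hypothetical stand-in for v8::Isolate that remembers the entered context.
struct MockIsolate {
  MockContext* current_context = nullptr;
  MockContext* GetCurrentContext() const { return current_context; }
};

class MockRealm {
 public:
  // Creating a realm associates its context with it.
  explicit MockRealm(MockContext* context) : context_(context) {
    context_->realm_slot = this;
  }
  ~MockRealm() { context_->realm_slot = nullptr; }
  MockRealm(const MockRealm&) = delete;
  MockRealm& operator=(const MockRealm&) = delete;

  // Lookup by context: only works if the context was associated with a realm.
  static MockRealm* GetCurrent(MockContext* context) {
    return context == nullptr ? nullptr : context->realm_slot;
  }

  // Lookup by isolate: find the current context first, then reuse the above.
  static MockRealm* GetCurrent(MockIsolate* isolate) {
    return GetCurrent(isolate->GetCurrentContext());
  }

 private:
  MockContext* context_;
};
```

A context that was never associated with a realm yields `nullptr` in this model, which mirrors the requirement that `context` has been associated with the `Realm` instance before the lookup can succeed.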
@@ -509,7 +542,7 @@ implement them. Otherwise, add the id and the class name to the
// In the HTTP parser source code file:
class BindingData : public BaseObject {
 public:
-  BindingData(Environment* env, Local<Object> obj) : BaseObject(env, obj) {}
+  BindingData(Realm* realm, Local<Object> obj) : BaseObject(realm, obj) {}

  SET_BINDING_ID(http_parser_binding_data)

@@ -525,7 +558,7 @@ static void New(const FunctionCallbackInfo<Value>& args) {
  new Parser(binding_data, args.This());
}

-// ... because the initialization function told the Environment to store the
+// ... because the initialization function told the Realm to store the
// BindingData object:
void InitializeHttpParser(Local<Object> target,
                          Local<Value> unused,
@@ -710,11 +743,13 @@ any resources owned by it, e.g. memory or libuv requests/handles.

#### Cleanup hooks

-Cleanup hooks are provided that run before the [`Environment`][]
-is destroyed. They can be added and removed through by using
+Cleanup hooks are provided that run before the [`Environment`][] or the
+[`Realm`][] is destroyed. They can be added and removed by using
`env->AddCleanupHook(callback, hint);` and
-`env->RemoveCleanupHook(callback, hint);`, where callback takes a `void* hint`
-argument.
+`env->RemoveCleanupHook(callback, hint);`, or
+`realm->AddCleanupHook(callback, hint);` and
+`realm->RemoveCleanupHook(callback, hint);` respectively, where callback takes
+a `void* hint` argument.

Inside these cleanup hooks, new asynchronous operations _may_ be started on the
event loop, although ideally that is avoided as much as possible.
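The add/remove contract described in the hunk above, where a hook is identified by its `(callback, hint)` pair, can be sketched with a small standalone queue. `MockCleanupQueue`, `RunCleanup()`, and `CountCall()` are hypothetical names for this illustration, and running the hooks most-recently-added first is an assumption of the sketch, not something stated in the diff.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

using CleanupCallback = void (*)(void* hint);

// Hypothetical stand-in for the hook storage behind AddCleanupHook().
class MockCleanupQueue {
 private:
  struct Hook {
    CleanupCallback cb;
    void* hint;
  };

 public:
  void AddCleanupHook(CleanupCallback cb, void* hint) {
    hooks_.push_back(Hook{cb, hint});
  }

  // A hook is identified by the (callback, hint) pair it was added with.
  void RemoveCleanupHook(CleanupCallback cb, void* hint) {
    hooks_.erase(std::remove_if(hooks_.begin(), hooks_.end(),
                                [&](const Hook& h) {
                                  return h.cb == cb && h.hint == hint;
                                }),
                 hooks_.end());
  }

  // Runs and drops every remaining hook, newest first (sketch assumption).
  void RunCleanup() {
    while (!hooks_.empty()) {
      Hook hook = hooks_.back();
      hooks_.pop_back();
      hook.cb(hook.hint);
    }
  }

 private:
  std::vector<Hook> hooks_;
};

// Example hook: the hint points at a counter owned by the caller.
inline void CountCall(void* hint) { ++*static_cast<int*>(hint); }
```

Popping each hook before invoking it means a callback may safely register further hooks while cleanup is running, and those are picked up by the same loop.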
@@ -776,7 +811,7 @@ need to be tied together. `BaseObject` is the main abstraction for that in
Node.js, and most classes that are associated with JavaScript objects are
subclasses of it. It is defined in [`base_object.h`][].

-Every `BaseObject` is associated with one [`Environment`][] and one
+Every `BaseObject` is associated with one [`Realm`][] and one
`v8::Object`. The `v8::Object` needs to have at least one [internal field][]
that is used for storing the pointer to the C++ object. In order to ensure this,
the V8 `SetInternalFieldCount()` function is usually used when setting up the
@@ -1050,6 +1085,7 @@ static void GetUserInfo(const FunctionCallbackInfo<Value>& args) {
[`Local`]: #local-handles
[`MakeCallback()`]: #makecallback
[`MessagePort`]: https://nodejs.org/api/worker_threads.html#worker_threads_class_messageport
+[`Realm`]: #realm
[`ReqWrap`]: #reqwrap
[`async_hooks` module]: https://nodejs.org/api/async_hooks.html
[`async_wrap.h`]: async_wrap.h