Commit

[GR-36892] [GR-38140] Move marking to exit of C calls and make native handles weak refs.

PullRequest: truffleruby/3195
aardvark179 committed Sep 15, 2022
2 parents e24dc0c + b83e569 commit 76cfbe8
Showing 29 changed files with 513 additions and 364 deletions.
1 change: 1 addition & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -56,6 +56,7 @@ Changes:
* Removed `Truffle::Interop.members_without_conversion` (use `Truffle::Interop.members` instead).
* Refactored internals of `rb_sprintf` to simplify handling of `VALUE`s in common cases (@aardvark179).
* Refactored sharing of array objects between threads using new `SharedArrayStorage` (@aardvark179).
* Marking of native structures wrapped in objects is now done on C call exit to reduce memory overhead (@aardvark179).

# 22.2.0

109 changes: 109 additions & 0 deletions doc/contributor/cext-values.md
@@ -0,0 +1,109 @@
# `VALUE`s in C extensions

## Semantics on MRI

Before we discuss the mechanisms used to represent MRI's `VALUE`
semantics we should outline what those are. A `VALUE` in a local
variable (i.e. on the stack) will keep the associated object alive as
long as that stack entry lasts (so either until the function exits, or
until that variable is no longer live). We can also wrap C structures
in Ruby objects, and when we do this we're able to specify a marking
function. This marking function is used by MRI's garbage collector to
find all the objects reachable from the structure, and allows it to
mark them in the same way it would with normal instance
variables. There are also a couple of utility methods and macros for
keeping a value alive for the duration of a function call even if it
is no longer being held in a variable, and for globally preserving a
value held in a static variable.

Because `VALUE`s are essentially tagged pointers on MRI there are also
some semantics that may be obvious but are worth stating anyway:

* Any two `VALUE`s associated with the same object will be
identical. In other words as long as an object is alive its `VALUE`
will remain constant.
* A `VALUE` for a live object can reuse the same tagged pointer that
was previously used for a now dead object.

## Emulating the semantics in TruffleRuby

Emulating these semantics on TruffleRuby is non-trivial. Although we
are running under a garbage collector it doesn't know that a `VALUE`
maps to an object, and neither does it have any mechanism for
specifying a custom mark function to be used with particular
objects. As long as `VALUE`s can remain as `ValueWrapper` objects then
we don't need to do much. Ruby objects maintain a strong reference to
their associated `ValueWrapper`, and vice versa, so we only really
need to consider situations where `VALUE`s are converted into native
handles.

### Keeping objects alive on the stack

We implement an `ExtensionCallStack` object to keep track of various
bits of useful information during a call to a C extension. Each stack
entry contains a `preservedObject` and, when needed, an additional
`preservedObjects` list, which together hold all the
`ValueWrapper`s converted to native handles during the
call. When a new call is made a new `ExtensionCallStackEntry` is pushed
onto the stack, and when the call exits that entry is popped off again.
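
The push and pop behaviour described above can be sketched in plain Ruby. This is only an illustration under assumptions: the real `ExtensionCallStack` is part of TruffleRuby's Java implementation, and the method names `preserve`, `all_preserved` and `with_call` are hypothetical.

```ruby
# Hypothetical sketch; class names mirror the prose, method names are invented.
class ExtensionCallStackEntry
  def initialize
    @preserved_object = nil   # the first wrapper converted during the call
    @preserved_objects = nil  # created lazily once a second wrapper is converted
  end

  # Keep `wrapper` strongly reachable until this entry is popped.
  def preserve(wrapper)
    if @preserved_object.nil?
      @preserved_object = wrapper
    else
      (@preserved_objects ||= []) << wrapper
    end
  end

  # Every wrapper preserved so far, in conversion order.
  def all_preserved
    [@preserved_object, *@preserved_objects].compact
  end
end

class ExtensionCallStack
  def initialize
    @entries = []
  end

  def depth
    @entries.size
  end

  # Push an entry for the duration of the block (standing in for one C call)
  # and pop it on exit, even if the call raises.
  def with_call
    @entries.push(ExtensionCallStackEntry.new)
    yield @entries.last
  ensure
    @entries.pop
  end
end
```

Once the block passed to `with_call` returns, the entry is popped and nothing in the sketch references the preserved wrappers any more, which is the point at which they would become collectable again.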

### Keeping objects alive in structures

We don't have a way to run markers when doing garbage collection, but
we know we're keeping objects alive during the lifetime of a C call,
and we can record when the structure is accessed via `DATA_PTR` (which
should be required for the internal state of that structure to be
mutated). To do this we keep a list of objects to be marked in a
similar manner to the objects that should be kept alive, and when we
exit the C call we'll call those markers.

### Running mark functions

We run markers by recording the object being marked on the extension
stack, and then calling the marker which will in turn call
`rb_gc_mark` for the individual `VALUE`s which are held by the
structure. We'll record those marked objects in a temporary array also
held on the extension stack, and then attach that to the object
wrapping the struct when the mark function has finished.
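
As a rough illustration of the two sections above, the following plain-Ruby sketch records wrappers whose data was accessed, then runs their markers on call exit. Every class and method name here is hypothetical, a lambda stands in for the C mark function, and `gc_mark` stands in for `rb_gc_mark`; it is not the code TruffleRuby runs.

```ruby
# Hypothetical sketch; all names below are illustrative only.
class WrappedStruct
  attr_reader :data, :marker
  attr_accessor :mark_list   # objects kept alive on behalf of the native struct

  def initialize(data, marker)
    @data = data
    @marker = marker         # callable standing in for the C mark function
    @mark_list = []
  end
end

class MarkingCallEntry
  def initialize
    @to_mark = []            # wrappers whose struct was accessed during this call
    @current_marks = nil     # temporary array, only set while a marker runs
  end

  # Stand-in for recording an access through DATA_PTR.
  def mark_on_call_exit(wrapper)
    @to_mark << wrapper unless @to_mark.any? { |w| w.equal?(wrapper) }
  end

  # Stand-in for rb_gc_mark: collects one object reachable from the struct.
  def gc_mark(object)
    raise "gc_mark called outside a marker" if @current_marks.nil?
    @current_marks << object
  end

  # Run every pending marker and attach the collected objects to its wrapper.
  def run_markers_on_exit
    @to_mark.each do |wrapper|
      @current_marks = []
      wrapper.marker.call(wrapper.data, self)
      wrapper.mark_list = @current_marks
    end
  ensure
    @current_marks = nil
    @to_mark.clear
  end
end
```

Replacing the wrapper's mark list wholesale on each run means objects the struct no longer references stop being kept alive by it after the next call exit.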


## Managing the conversion of `VALUE`s to and from native handles

When converted to native, the `ValueWrapper` takes the following long values.

| Represented Value | Handle Bits | Comments |
|-------------------|-------------------------------------|----------|
| false | 00000000 00000000 00000000 00000000 | |
| true | 00000000 00000000 00000000 00000010 | |
| nil | 00000000 00000000 00000000 00000100 | |
| undefined | 00000000 00000000 00000000 00000110 | |
| Integer | xxxxxxxx xxxxxxxx xxxxxxxx xxxxxxx1 | Lowest mask bit set, small longs only, convert to long using >> 1 |
| Object | xxxxxxxx xxxxxxxx xxxxxxxx xxxxx000 | No mask bits set and does not equal 0, value is index into handle map |

The built-in objects `true`, `false`, `nil`, and `undefined` are
handled specially, and integers are relatively easy because there is a
well defined mapping from the native representation to the integer and
vice versa, but to manage objects we need to do a little more work.
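
The tagging scheme in the table above can be expressed as a small decoder. This is a sketch in plain Ruby: the constants are taken from the table, while the method names `long_to_handle` and `decode_handle` are hypothetical. Ruby integers are arbitrary precision, so the 64-bit range check implied by "small longs only" is not modelled.

```ruby
# Constants follow the table above; the method names are hypothetical.
FALSE_HANDLE = 0b000
TRUE_HANDLE  = 0b010
NIL_HANDLE   = 0b100
UNDEF_HANDLE = 0b110
LONG_TAG     = 0b001
TAG_MASK     = 0b111

# Encode a small integer: shift left by one and set the lowest bit.
def long_to_handle(n)
  (n << 1) | LONG_TAG
end

# Classify a native handle and decode it where the table defines a mapping.
def decode_handle(handle)
  case handle
  when FALSE_HANDLE then [:false, false]
  when TRUE_HANDLE  then [:true, true]
  when NIL_HANDLE   then [:nil, nil]
  when UNDEF_HANDLE then [:undefined, nil]
  else
    if (handle & LONG_TAG) == LONG_TAG
      [:integer, handle >> 1]    # arithmetic shift recovers the integer
    elsif (handle & TAG_MASK).zero?
      [:object, handle]          # would be looked up in the handle map
    else
      [:invalid, nil]
    end
  end
end
```

For example, `decode_handle(long_to_handle(21))` yields `[:integer, 21]`, and any non-zero value with its three low bits clear is classified as an object handle.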

When we convert an object `VALUE` to its native representation we need
to keep the corresponding `ValueWrapper` object alive, and we need to
record that mapping from handle to `ValueWrapper` somewhere. The
mapping from `ValueWrapper` to handle must also be stable, so a symbol
or other immutable object that can outlive a context will need to
store that mapping somewhere on the `RubyLanguage` object.

We achieve all this through a combination of handle block maps and
allocators. We deal with handles in blocks of 4096, and the current
`RubyFiber` holds onto a `HandleBlockHolder` which in turn holds the
current block for mutable objects (which cannot outlive the
`RubyContext`) and immutable objects (which can outlive the
context). Each fiber will take values from those blocks until they
become exhausted. When a block is exhausted, the
`HandleBlockAllocator` held by `RubyLanguage` is responsible for
allocating new blocks and recycling old ones. These blocks of handles
however only hold weak references, because we don't want a conversion
to native to keep the `ValueWrapper` alive longer than it should.

Conversely the `HandleBlock` _must_ live for as long as there are any
reachable `ValueWrapper`s in that block, so a `ValueWrapper` keeps a
strong reference to the `HandleBlock` it is in.
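
The ownership relationships described above (weak references from block to wrapper, a strong reference from wrapper to block) can be sketched with Ruby's standard `WeakRef`. This is a simplified illustration under assumptions: it keeps a single current block rather than separate mutable and immutable ones, omits recycling, and the handle encoding `(index + 1) << 3` and the class `ValueWrapperSketch` are invented for the example.

```ruby
require "weakref"

# Hypothetical stand-in for a ValueWrapper.
class ValueWrapperSketch
  attr_reader :object
  attr_accessor :handle_block   # strong reference that keeps the block alive

  def initialize(object)
    @object = object
  end
end

class HandleBlock
  SIZE = 4096

  def initialize(base)
    @base = base                 # global index of the first slot in this block
    @slots = Array.new(SIZE)     # holds WeakRef instances, never the wrappers
    @next = 0
  end

  def full?
    @next >= SIZE
  end

  # Store a weak reference to `wrapper` and return its native handle.
  def allocate(wrapper)
    index = @base + @next
    @slots[@next] = WeakRef.new(wrapper)
    @next += 1
    wrapper.handle_block = self
    (index + 1) << 3             # assumed encoding: low three bits clear, never 0
  end

  # Resolve a handle, returning nil if it is unknown or the wrapper was collected.
  def resolve(handle)
    slot = (handle >> 3) - 1 - @base
    return nil unless slot >= 0 && slot < @next
    @slots[slot].__getobj__
  rescue WeakRef::RefError
    nil
  end
end

# Stand-in for the allocator held by RubyLanguage (recycling omitted).
class HandleBlockAllocator
  def initialize
    @next_base = 0
  end

  def next_block
    block = HandleBlock.new(@next_base)
    @next_base += HandleBlock::SIZE
    block
  end
end

# Stand-in for the per-fiber holder of the current block.
class HandleBlockHolder
  def initialize(allocator)
    @allocator = allocator
    @block = allocator.next_block
  end

  def to_native(wrapper)
    @block = @allocator.next_block if @block.full?
    @block.allocate(wrapper)
  end
end
```

Because the block only holds `WeakRef`s, converting a wrapper to native does not by itself extend the wrapper's lifetime, while any live wrapper keeps its block reachable through `handle_block`.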
14 changes: 3 additions & 11 deletions doc/contributor/cexts.md
@@ -125,18 +125,10 @@ not a `VALUE`.

See [polyglot.h](https://github.com/oracle/graal/blob/master/sulong/projects/com.oracle.truffle.llvm.libraries.graalvm.llvm/include/graalvm/llvm/polyglot.h) for documentation regarding the `polyglot_*` methods.

##### Native conversion

##### ValueWrapper Long Representation
When converted to native, the `ValueWrapper` takes the following long values.

| Represented Value | Handle Bits | Comments |
|-------------------|-------------------------------------|----------|
| false | 00000000 00000000 00000000 00000000 | |
| true | 00000000 00000000 00000000 00000010 | |
| nil | 00000000 00000000 00000000 00000100 | |
| undefined | 00000000 00000000 00000000 00000110 | |
| Integer | xxxxxxxx xxxxxxxx xxxxxxxx xxxxxxx1 | Lowest mask bit set, small longs only, convert to long using >> 1 |
| Object | xxxxxxxx xxxxxxxx xxxxxxxx xxxxx000 | No mask bits set and does not equal 0, value is index into handle map |
See [cext-values.md](cext-values.md) for documentation of the
conversion and management of native handles.

### String pointers

2 changes: 1 addition & 1 deletion lib/cext/ABI_check.txt
@@ -1 +1 @@
-9
+10
29 changes: 21 additions & 8 deletions lib/truffle/truffle/cext.rb
@@ -722,6 +722,10 @@ def rb_thread_alone
Thread.list.count == 1 ? 1 : 0
end

def rb_intern(str)
Primitive.string_to_symbol(str, true)
end

def rb_int_positive_pow(a, b)
a ** b
end
@@ -1456,6 +1460,12 @@ def rb_set_end_proc(func, data)
at_exit { Primitive.call_with_c_mutex_and_frame(func, [data], Primitive.caller_special_variables_if_available, nil) }
end

def define_marker(object, marker)
data_holder = Primitive.object_hidden_var_get object, DATA_HOLDER
Primitive.data_holder_set_marker(data_holder, marker)
Primitive.cext_mark_object_on_call_exit(object) unless Truffle::Interop.null?(marker)
end

def rb_data_object_wrap(ruby_class, data, mark, free)
ruby_class = Object unless ruby_class
object = ruby_class.__send__(:__layout_allocate__)
@@ -1464,7 +1474,7 @@ def rb_data_object_wrap(ruby_class, data, mark, free)

Primitive.object_space_define_data_finalizer object, free, data_holder unless Truffle::Interop.null?(free)

-define_marker object, data_marker(mark, data_holder) unless Truffle::Interop.null?(mark)
+define_marker object, mark

object
end
@@ -1479,19 +1489,22 @@ def rb_data_typed_object_wrap(ruby_class, data, data_type, mark, free, size)

Primitive.object_space_define_data_finalizer object, free, data_holder unless Truffle::Interop.null?(free)

-define_marker object, data_marker(mark, data_holder) unless Truffle::Interop.null?(mark)
+define_marker object, mark

object
end

def data_marker(mark, data_holder)
raise unless mark.respond_to?(:call)
proc { |obj|
def run_marker(obj)
Primitive.array_mark_store(obj) if Primitive.array_store_native?(obj)

data_holder = Primitive.object_hidden_var_get obj, DATA_HOLDER
mark = Primitive.data_holder_get_marker(data_holder)
unless Truffle::Interop.null?(mark)
create_mark_list(obj)
data = Primitive.data_holder_get_data(data_holder)
# This call is done without pushing a new frame as the marking service manages frames itself.
Primitive.call_with_c_mutex(mark, [data]) unless Truffle::Interop.null?(data)
mark.call(data) unless Truffle::Interop.null?(data)
set_mark_list_on_object(obj)
}
end
end

def data_sizer(sizer, data_holder)
3 changes: 3 additions & 0 deletions lib/truffle/truffle/cext_structs.rb
@@ -29,6 +29,7 @@ def RDATA_PTR(object)
raise TypeError, "wrong argument type #{object.class} (expected T_DATA)"
end

Primitive.cext_mark_object_on_call_exit(object) unless Truffle::Interop.null?(Primitive.data_holder_get_marker(data_holder))
Primitive.data_holder_get_data(data_holder)
end

@@ -68,6 +69,7 @@ def polyglot_members(internal)
def polyglot_read_member(name)
case name
when 'data'
Primitive.cext_mark_object_on_call_exit(@object) unless Truffle::Interop.null?(Primitive.data_holder_get_marker(@data_holder))
Primitive.data_holder_get_data(@data_holder)
when 'type'
type
@@ -294,6 +296,7 @@ def polyglot_pointer?
end

def polyglot_as_pointer
Primitive.cext_mark_object_on_call_exit(@array)
Primitive.array_store_address(@array)
end

2 changes: 1 addition & 1 deletion src/main/c/cext/string.c
@@ -116,7 +116,7 @@ VALUE rb_str_inspect(VALUE string) {
}

ID rb_intern_str(VALUE string) {
-return SYM2ID(RUBY_INVOKE(string, "intern"));
+return SYM2ID(RUBY_CEXT_INVOKE("rb_intern", string));
}

VALUE rb_str_cat(VALUE string, const char *to_concat, long length) {
4 changes: 2 additions & 2 deletions src/main/c/cext/symbol.c
@@ -24,11 +24,11 @@ ID rb_intern(const char *string) {
}

ID rb_intern2(const char *string, long length) {
-return SYM2ID(RUBY_INVOKE(rb_tr_temporary_native_string(string, length, rb_ascii8bit_encoding()), "intern"));
+return SYM2ID(RUBY_CEXT_INVOKE("rb_intern", rb_tr_temporary_native_string(string, length, rb_ascii8bit_encoding())));
}

ID rb_intern3(const char *name, long len, rb_encoding *enc) {
-return SYM2ID(RUBY_INVOKE(rb_tr_temporary_native_string(name, len, enc), "intern"));
+return SYM2ID(RUBY_CEXT_INVOKE("rb_intern", rb_tr_temporary_native_string(name, len, enc)));
}

VALUE rb_sym2str(VALUE string) {
2 changes: 1 addition & 1 deletion src/main/java/org/truffleruby/RubyContext.java
@@ -189,7 +189,7 @@ public RubyContext(RubyLanguage language, TruffleLanguage.Env env) {
featureLoader = new FeatureLoader(this, language);
referenceProcessor = new ReferenceProcessor(this);
finalizationService = new FinalizationService(referenceProcessor);
-markingService = new MarkingService(referenceProcessor);
+markingService = new MarkingService();
dataObjectFinalizationService = new DataObjectFinalizationService(language, referenceProcessor);

// We need to construct this at runtime
7 changes: 6 additions & 1 deletion src/main/java/org/truffleruby/RubyLanguage.java
@@ -383,7 +383,12 @@ public RubySymbol getSymbol(String string) {

@TruffleBoundary
public RubySymbol getSymbol(AbstractTruffleString name, RubyEncoding encoding) {
-return symbolTable.getSymbol(name, encoding);
+return symbolTable.getSymbol(name, encoding, false);
}

@TruffleBoundary
public RubySymbol getSymbol(AbstractTruffleString name, RubyEncoding encoding, boolean preserveSymbol) {
return symbolTable.getSymbol(name, encoding, preserveSymbol);
}

public Assumption getTracingAssumption() {