Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

binder_common: Fix uninitialized marshalling for PtrToArg<char32_t> #95317

Merged
merged 1 commit into from
Aug 28, 2024

Conversation

esainane
Copy link
Contributor

@esainane esainane commented Aug 9, 2024

C# uses longs to access many native values. With PtrToArg<m_enum> and PtrToArg<bitfield<m_enum>> this isn't a problem, as C++ code converts through a *(int64_t*) cast in assignment, so all 64-bits are initialized.

However, with PtrToArg<char32_t>, value assignment happens through an *(int *) cast, leaving 32 bits uninitialized where int is 32 bits. On platforms where int is 16 bits, there are presumably 48 bits uninitialized, though there are very few platforms where this is still the case.

The easiest way to see the practical effects of this is by looking at EventInputKey.Unicode:

    public override void _Input(InputEvent @event) {
        if (@event is InputEventKey keyEvent) {
            if (keyEvent.IsPressed() && !keyEvent.Echo) {
                var raw = keyEvent.Unicode;
                var value = raw & 0xffffffff;
                GD.Print($"Key pressed: raw: {raw}; masked: {(char) value} ({value})");
            }
        }
    }

Pressing 'a' emits the following line:

Key pressed: raw: -3617008645356650399; masked: a (97)

Examining execution flow in gdb shows this conversion going through the following line:

PtrToArg<char32_t>::encode (p_ptr=0x7ffcd5bb4b18, p_val=97 U'a') at ./core/variant/binder_common.h:221
221			*(int *)p_ptr = p_val;

Here, p_val is still 97, which is the value InputEventKey.Unicode is expected to have. After assignment, p *(int64_t *)0x7ffcd5bb4b18 displays -3617008645356650399, with only the lower 32 bits being properly assigned, and is the value we see from C#.

With this patch applied, the above testing _Input now prints:

Key pressed: raw: 97; masked: a (97)

Thank you to blujay1269 for asking about an unexpected value they saw in EventInputKey.Unicode, which prompted this investigation.

C# uses `long`s to access many native values. With `PtrToArg<m_enum>` and
`PtrToArg<bitfield<m_enum>>` this isn't a problem, as C++ code converts
through a `*(int64_t*)` cast in assignment, so all 64-bits are initialized.

However, with `PtrToArg<char32_t>`, value assignment happens through an
`*(int *)` cast, leaving 32 bits uninitialized where `int` is 32 bits. On
platforms where `int` is 16 bits, there are presumably 48 bits uninitialized,
though there are very few platforms where this is still the case.

The easiest way to see the practical effects of this is by looking at
`EventInputKey.Unicode`:

```csharp
    public override void _Input(InputEvent @event) {
        if (@event is InputEventKey keyEvent) {
            if (keyEvent.IsPressed() && !keyEvent.Echo) {
                var raw = keyEvent.Unicode;
                var value = raw & 0xffffffff;
                GD.Print($"Key pressed: raw: {raw}; masked: {(char) value} ({value})");
            }
        }
    }
```

Pressing 'a' emits the following line:
```
Key pressed: raw: -3617008645356650399; masked: a (97)
```

Examining execution flow in gdb shows this conversion going through the
following line:
```
PtrToArg<char32_t>::encode (p_ptr=0x7ffcd5bb4b18, p_val=97 U'a') at ./core/variant/binder_common.h:221
221			*(int *)p_ptr = p_val;
```

Here, `p_val` is still 97, which is the value `InputEventKey.Unicode`
is expected to have. After assignment, `p *(int64_t *)0x7ffcd5bb4b18` displays
`-3617008645356650399`, with only the lower 32 bits being properly assigned,
and is the value we see from C#.

With this patch applied, the above testing `_Input` now prints:
```
Key pressed: raw: 97; masked: a (97)
```

Thank you to blujay1269 for asking about an unexpected value they saw in
`EventInputKey.Unicode`, which prompted this investigation.
@esainane esainane requested a review from a team as a code owner August 9, 2024 05:17
@Chaosus Chaosus added this to the 4.4 milestone Aug 9, 2024
@akien-mga akien-mga changed the title binder_common: Fix uninitialized marshalling binder_common: Fix uninitialized marshalling for PtrToArg<char32_t> Aug 9, 2024
@akien-mga akien-mga requested a review from a team August 9, 2024 08:01
@raulsntos
Copy link
Member

Integer types usually have metadata associated to them so bindings can choose the correct type. For example, methods that return uint8_t in Godot use METADATA_INT_IS_UINT8. So a method like uint8_t StreamPeer::get_u8() is generated in C# with a byte return type.

By the way, godot-cpp probably has the same problem since they generate the method with a int64_t return type. cc @dsnopek

Maybe char32_t should use METADATA_INT_IS_INT32. Although this means language bindings will use int32_t which is still not entirely correct, should we add something like METADATA_INT_IS_CHAR32? That way bindings can pick the most appropriate type for a 32-bit unicode character (i.e.: System.Text.Rune in C#).

Just to be clear, changing the metadata breaks compatibility for the C# bindings because then the InputEventKey.Unicode property will change its type from long to int. However, we can change the C# bindings generator to avoid this and keep using long in the public API, but handle the value properly internally according to its metadata (so it will be handled as an int and dismiss the uninitialized bits).

@dsnopek dsnopek requested a review from a team August 10, 2024 12:55
Copy link
Contributor

@dsnopek dsnopek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, wow, good catch! The intention of the ptrcall encoding is that all integer types get encoded as int64_t, so I think your change is essentially correct.

Although, I do wonder if this should be moved to method_ptrcall.h and done with the macros there like all the other integer types? It would just be:

MAKE_PTRARGCONV(char32_t, int64_t);

If that works, I think it would be better to put it in there with the other integer types - I didn't even know this one was hiding over in binder_common.h.

@dsnopek
Copy link
Contributor

dsnopek commented Aug 10, 2024

By the way, godot-cpp probably has the same problem since they generate the method with a int64_t return type.

Yeah, I think you're right!

Maybe char32_t should use METADATA_INT_IS_INT32. Although this means language bindings will use int32_t which is still not entirely correct, should we add something like METADATA_INT_IS_CHAR32? That way bindings can pick the most appropriate type for a 32-bit unicode character (i.e.: System.Text.Rune in C#).

I agree that adding a METADATA_INT_IS_CHAR32 could make sense. However, the question after that would be if we should also have a metadata flag for char16_t and wchar_t?

However, the main thing to fix here is the issue with how the data is encoded. We can figure out how (or if) to improve the metadata in a follow-up.

@esainane
Copy link
Contributor Author

Is there anything I can do to help at this point?

@akien-mga akien-mga merged commit 622393c into godotengine:master Aug 28, 2024
18 checks passed
@akien-mga
Copy link
Member

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants