[mono] Add old design docs from mono website.
The original location of these documents is:

https://github.com/mono/website/tree/gh-pages/docs/advanced/runtime/docs

Changes made:
- renamed BITCODE.md -> bitcode.md
- removed some documents like xdebug.md which are no longer relevant
vargaz committed Mar 14, 2024
1 parent b60a541 commit 5e992b9
Showing 21 changed files with 3,627 additions and 0 deletions.
183 changes: 183 additions & 0 deletions docs/design/mono/web/aot.md

Large diffs are not rendered by default.

137 changes: 137 additions & 0 deletions docs/design/mono/web/ascii-strings.md
@@ -0,0 +1,137 @@
---
title: ASCII Mono
---

## Introduction

This is a proposal for an optimisation to `System.String`.

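For historical reasons, `System.String` stores its data as 16-bit code units. This document refers to that encoding as UCS-2, that is, UTF-16 in which each code unit is treated independently rather than as part of a surrogate pair.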

However, most strings in typical .NET applications consist solely of ASCII characters, leading to wasted space: half of the bytes in a string are likely to be null bytes!

Since strings are immutable, we can scan the character data when the string is constructed, then dynamically select an encoding, thereby saving 50% of string memory in most cases.

## Working Version

A working prototype is currently hosted here:

[https://github.com/evincarofautumn/mono/commits/feature-strings](https://github.com/evincarofautumn/mono/commits/feature-strings)

## Updating `String`

Strings currently have the following representation:

```csharp
class String {
    int length;      // number of 16-bit code units
    char firstChar;  // first code unit; the rest of the data follows in memory
}
```

Here, `&firstChar` is the starting address of the co-allocated string data. Observe first that the `length` field is a *signed* 32-bit integer (`System.Int32`) that is never negative. Changing it to an *unsigned* integer (`System.UInt32`) gives us a free bit, which we can use to tell whether the string is normal (UCS-2) or *compact* (ASCII):

```csharp
class String {
    uint taggedLength;  // length in bits 1..31, encoding tag in bit 0
    byte firstByte;     // first byte of the co-allocated character data
}
```

Here, `(taggedLength & 1) == 0` indicates the non-compact encoding, for which `(char*)&firstByte` is the start of the UCS-2 character data; `(taggedLength & 1) == 1` indicates the compact encoding, for which the ASCII character data starts at `(byte*)&firstByte`.

I use the low-order bit instead of the sign bit because it lets us get the length with a simple shift, regardless of encoding:

```csharp
public int Length {
    get {
        // Drop the tag bit; the remaining 31 bits are the length.
        return (int)(taggedLength >> 1);
    }
}
```
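
The tagging scheme can be sketched as a few pure functions. This is only an illustration of the bit layout described above: the `TaggedLength` helper class and its method names are hypothetical, and the proposal keeps this logic inside `String` itself.

```csharp
using System;

// Hypothetical helper, not part of the prototype: it only illustrates the
// layout "length in bits 1..31, encoding tag in bit 0".
public static class TaggedLength
{
    // Pack a length and an encoding flag into one 32-bit word.
    public static uint Encode (int length, bool compact)
    {
        if (length < 0)
            throw new ArgumentOutOfRangeException (nameof (length));
        return ((uint) length << 1) | (compact ? 1u : 0u);
    }

    // Tag bit set means compact (ASCII); clear means UCS-2.
    public static bool IsCompact (uint taggedLength) => (taggedLength & 1) == 1;

    // The same shift recovers the length for either encoding.
    public static int Length (uint taggedLength) => (int) (taggedLength >> 1);

    // Size in bytes of the character data: one byte per character when
    // compact, two otherwise.
    public static long ByteCount (uint taggedLength) =>
        (long) Length (taggedLength) * (IsCompact (taggedLength) ? 1 : 2);
}
```

Because the length occupies the upper 31 bits, every non-negative `int` length still fits, so the change does not reduce the maximum string length.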

## Getting There: Updating Native Code

Many places in Mono unsafely access `String` data, but they can be updated fairly easily: we can rename the fields, and use accessors that assert that a particular encoding is in use. However, we must be careful to verify that all those paths are covered by the test suite.

## Getting There: Disabling `fixed` on Strings

The following is a technique that helped us bootstrap the effort.

Every managed method that unsafely accesses `String` character data must be updated to account for whether the `String` is compact. This is tractable within `corlib`, but there is some third-party code that uses strings unsafely.

The `fixed` statement on strings calls a method `get_OffsetToStringData`, which is used to adjust the `fixed` pointer to refer to the character data, rather than the `String` object. In ASCII Mono, we can make this method throw a `NotSupportedException` with a message like

> Unsafe access to string data is not supported by this runtime.

Now we’re sure that only `corlib`-internal methods can access the `String` data, because only those methods have access to the `firstByte` field.

Once we have completed this auditing work, we are going to replace `get_OffsetToStringData` with a method that duplicates
any compact (ASCII) string into a UTF-16 string if the user happens to use `fixed` on a compact string.
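
The duplication step itself is a simple widening copy, since every ASCII byte maps to the UTF-16 code unit with the same value. A minimal sketch, assuming the compact data is available as a `byte[]` (the `CompactWidening` class is hypothetical, and the prototype works on the co-allocated string data rather than on arrays):

```csharp
using System;

// Hypothetical helper illustrating the ASCII to UTF-16 duplication.
public static class CompactWidening
{
    // Copy compact (ASCII) data into a fresh buffer of 16-bit code units.
    public static char[] Widen (byte[] compactData)
    {
        var wide = new char [compactData.Length];
        for (int i = 0; i < compactData.Length; i++)
            wide [i] = (char) compactData [i];  // zero-extend each byte
        return wide;
    }
}
```

The copy costs an allocation on every such `fixed`, which is presumably acceptable only because unsafe access from third-party code is expected to be rare.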

## Getting There: Adding `UnsafeApply` API

In order to update existing third-party code that uses strings unsafely, we need some kind of `UnsafeApply` API:

```csharp
public unsafe T UnsafeApply<T>
(Func<BytePtr, T> compact, Func<CharPtr, T> noncompact)
```

This accepts two callbacks, one for the case of the compact encoding, and one for the non-compact encoding. This isn’t ideal, because it’s neither safe nor particularly efficient (involving the allocation of delegates). But, on the bright side, that may discourage people from continuing to use unsafe code.
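
To show the shape of the two-callback design without unsafe code, here is a safe analogue that swaps the pointer wrappers for spans. Everything in it is an assumption made for illustration: `DualString`, `CompactVisitor`, `NonCompactVisitor`, and `Apply` are hypothetical names, and the proposed `UnsafeApply` passes `BytePtr`/`CharPtr` values instead.

```csharp
using System;

// Hypothetical delegate types standing in for Func<BytePtr, T> and
// Func<CharPtr, T>.
public delegate T CompactVisitor<T> (ReadOnlySpan<byte> data);
public delegate T NonCompactVisitor<T> (ReadOnlySpan<char> data);

// Hypothetical stand-in for a string that may use either encoding.
public sealed class DualString
{
    readonly byte[] compactData;  // non-null when the string is compact
    readonly char[] wideData;     // non-null when the string is non-compact

    public DualString (byte[] ascii) { compactData = ascii; }
    public DualString (char[] ucs2) { wideData = ucs2; }

    // Dispatch to exactly one callback, depending on the encoding in use.
    public T Apply<T> (CompactVisitor<T> compact, NonCompactVisitor<T> noncompact)
    {
        return compactData != null ? compact (compactData) : noncompact (wideData);
    }
}
```

A caller supplies both branches, for example `s.Apply (b => b.Length, c => c.Length * 2)` to compute the size of the character data in bytes. The delegate allocations mentioned above are visible here as the two lambdas.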

## Adding `Iterator` API

In order to simplify updating existing `corlib` code, we add a private `Iterator` API that allows iterating over `String` data regardless of encoding, so we can efficiently avoid duplicating the code for `char*` and `byte*`.

The `String.Iterator` interface would provide methods such as:

* `Iterator Advance (int offset = 1)`
* `void CopyFrom (Iterator that, int count)`
* `long Difference (Iterator that)`
* `char Get (int index = 0)`
* `void Set (char value, int index = 0)`
* `int CharSize ()`
* `IntPtr Pointer ()`

It would have two concrete implementations, `CompactIterator` and `NonCompactIterator`, returned by a new `String` method `GetIterator` like so:

```csharp
private static unsafe Iterator GetIterator (IntPtr data, bool compact)
{
    // `data` must already be pinned by the caller.
    if (compact)
        return new CompactIterator (data);
    return new NonCompactIterator (data);
}
```

This requires the character data pointer be pinned from the outside. This ensures that it’s pinned for the lifetime of the iterator, and that only `corlib` can use this API.

Phrasing the API in this way should let the JIT inline operations on concrete iterator types.
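
The following sketch shows how one algorithm can serve both encodings through such an interface. It is an array-backed stand-in written for illustration, not the prototype's code: the real iterators wrap pinned pointers, only a subset of the listed methods is shown, and `IteratorAlgorithms.IndexOf` is a hypothetical example of a consumer.

```csharp
using System;

// Array-backed stand-ins for the pointer-based iterators described above.
public abstract class Iterator
{
    public abstract Iterator Advance (int offset = 1);
    public abstract char Get (int index = 0);
    public abstract void Set (char value, int index = 0);
    public abstract int CharSize ();

    public static Iterator GetIterator (Array data, bool compact)
    {
        if (compact)
            return new CompactIterator ((byte[]) data, 0);
        return new NonCompactIterator ((char[]) data, 0);
    }
}

public sealed class CompactIterator : Iterator
{
    readonly byte[] data;
    readonly int position;

    public CompactIterator (byte[] data, int position)
    {
        this.data = data;
        this.position = position;
    }

    public override Iterator Advance (int offset = 1) => new CompactIterator (data, position + offset);
    public override char Get (int index = 0) => (char) data [position + index];
    public override void Set (char value, int index = 0)
    {
        // A compact buffer can only hold ASCII.
        if (value > 0x7F)
            throw new ArgumentOutOfRangeException (nameof (value));
        data [position + index] = (byte) value;
    }
    public override int CharSize () => 1;
}

public sealed class NonCompactIterator : Iterator
{
    readonly char[] data;
    readonly int position;

    public NonCompactIterator (char[] data, int position)
    {
        this.data = data;
        this.position = position;
    }

    public override Iterator Advance (int offset = 1) => new NonCompactIterator (data, position + offset);
    public override char Get (int index = 0) => data [position + index];
    public override void Set (char value, int index = 0) => data [position + index] = value;
    public override int CharSize () => 2;
}

public static class IteratorAlgorithms
{
    // Written once, usable with either encoding.
    public static int IndexOf (Iterator it, int length, char needle)
    {
        for (int i = 0; i < length; i++)
            if (it.Get (i) == needle)
                return i;
        return -1;
    }
}
```

This sketch dispatches through virtual calls on the abstract base type. Code that holds a sealed concrete iterator type directly gives the JIT a better chance to inline, which is the point made above.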

## Updating `StringBuilder`

`StringBuilder` is a linked list of mutable character arrays that can be frozen into a single `String` using the `ToString` method.

We add an additional Boolean to each chunk, indicating whether it’s compact (the default) or non-compact. When inserting non-ASCII characters into an ASCII chunk, the chunk degrades to UCS-2.

If all chunks of a `StringBuilder` are compact, as they are most of the time, then the result of `ToString` is compact.
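
The degrade-on-demand behavior of a chunk might look like the sketch below. The `Chunk` class and its members are hypothetical and much simpler than a real `StringBuilder` chunk; the sketch only illustrates the flag and the one-way switch from compact to UCS-2.

```csharp
using System;

// Hypothetical chunk type; the real StringBuilder chunk layout differs.
public sealed class Chunk
{
    byte[] compactChars;  // used while IsCompact is true
    char[] wideChars;     // used after the chunk degrades
    int length;

    public bool IsCompact { get; private set; } = true;
    public int Length => length;
    int Capacity => IsCompact ? compactChars.Length : wideChars.Length;

    public Chunk (int capacity)
    {
        compactChars = new byte [capacity];
    }

    public void Append (char c)
    {
        if (length == Capacity)
            throw new InvalidOperationException ("chunk is full");
        // The first non-ASCII character forces the whole chunk to UCS-2.
        if (IsCompact && c > 0x7F)
            Degrade ();
        if (IsCompact)
            compactChars [length++] = (byte) c;
        else
            wideChars [length++] = c;
    }

    void Degrade ()
    {
        wideChars = new char [compactChars.Length];
        for (int i = 0; i < length; i++)
            wideChars [i] = (char) compactChars [i];
        compactChars = null;
        IsCompact = false;
    }

    public char this [int index] =>
        IsCompact ? (char) compactChars [index] : wideChars [index];
}
```

With a flag per chunk, `ToString` can decide the encoding of the result by checking the flags, without rescanning the character data.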

## Scanning Character Data

At first blush it may seem very costly to scan every string. However, each string should only be scanned at most once, and the longer the string, the bigger the memory savings when it (probably) turns out to be compact-representable.

Moreover, we can avoid scanning strings if we know ahead of time what the encoding should be; for example, concatenating two compact strings always yields a compact string.

Scanning UCS-2 data for compact-representability is as simple as testing every character with the mask `(c & 0xFF80) == 0`, which is trivially unrollable and vectorizable. Likewise, we can scan UTF-8 data with the mask `(c & 0x80) == 0`.
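
As a concrete illustration of the two masks, here is a scalar scanner with a small manual unroll. It is a sketch, not the prototype's vectorized scanner, and the `CompactScanner` class and its method names are hypothetical.

```csharp
using System;

// Hypothetical scanner; the prototype uses a vectorized implementation.
public static class CompactScanner
{
    // True if every 16-bit code unit is in the ASCII range.
    public static bool IsCompactRepresentable (ReadOnlySpan<char> chars)
    {
        int i = 0;
        // Unrolled by four: OR the code units together, test the mask once.
        for (; i + 4 <= chars.Length; i += 4) {
            int bits = chars [i] | chars [i + 1] | chars [i + 2] | chars [i + 3];
            if ((bits & 0xFF80) != 0)
                return false;
        }
        // Remaining zero to three code units.
        for (; i < chars.Length; i++)
            if ((chars [i] & 0xFF80) != 0)
                return false;
        return true;
    }

    // True if every UTF-8 byte is a single-byte (ASCII) sequence.
    public static bool IsAscii (ReadOnlySpan<byte> utf8)
    {
        foreach (byte b in utf8)
            if ((b & 0x80) != 0)
                return false;
        return true;
    }
}
```

ORing several code units before testing works because any bit set above bit 6 in one of them survives the OR, which is also what makes the check easy to vectorize.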

## Real-world Testing

I’ve implemented a fairly stable prototype of this feature in Mono. It includes the stated changes to `String` and `StringBuilder`, as well as a fast vectorized scanner. It can build `corlib` and run the Mono and `corlib` test suites. With some effort, and patches to third-party libraries, it can run Xamarin Studio. For a large project using Roslyn code analysis, this leads to a ~10% savings in memory usage, with a small speed overhead.

## Next Steps

* Deduplicate code by using the iterator API.
* Avoid allocating intermediate `char[]` arrays by using the iterator API.
* Upstream changes to third-party libraries.
* Get feedback and harden code for correctness, safety, and security.
133 changes: 133 additions & 0 deletions docs/design/mono/web/atomics-memory-model.md
@@ -0,0 +1,133 @@
---
title: Atomics and Memory Model
---

## Introduction

This document describes the semantics of atomic operations and the managed memory model in C#, CIL, and the BCL.

The information here is based on the Ecma 334 and 335 specifications, MSDN documentation for the relevant BCL methods
and equivalent Win32 functions, and the source code of CoreCLR and CoreFX.

It is assumed that the reader understands basic concepts of memory models: different memory barrier kinds, acquire and
release semantics, the meaning of atomicity, and so on.

The actual implementation of these operations in Mono is described at the end.

## Semantics

### Atomicity in the CLI

Any load or store of a value whose size is smaller than or equal to `IntPtr.Size` shall be atomic, but does not imply a
barrier of any kind. Operations on 64-bit quantities are only atomic on 64-bit systems.

The source/destination address of a load/store operation must be properly aligned for the data type for the above
guarantees to hold.

If a load or store to an address happens at the same time as another load or store to that address but of a different
size, all bets are off and no atomicity is guaranteed.

These rules apply to high-level languages like C# and F# as they target the CLI.

### `volatile.` prefix opcode in CIL

When the `volatile.` prefix opcode is used in CIL, it imposes acquire/release semantics on the next non-prefix opcode.
For loads, it results in acquire semantics. For stores, it results in release semantics.

This prefix opcode has no effect on atomicity beyond the standard rules of the CLI.

### `volatile` keyword in C\#

The `volatile` keyword in C# compiles down to CIL loads and stores prefixed with the `volatile.` opcode.

C#'s `volatile` cannot be applied to 64-bit quantities because regular loads and stores in CIL do not guarantee
atomicity for 64-bit quantities on 32-bit systems, and the `Volatile` class did not exist when the `volatile` keyword
was designed. Today, `volatile` on 64-bit quantities could conceivably be compiled down to `Volatile.Read` and
`Volatile.Write` calls.
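
A small example of how these rules look from C#. The `Mailbox` class is hypothetical; it uses a `volatile` flag for a 32-bit-or-smaller field and falls back to the `Volatile` class for a 64-bit field, where the keyword is rejected by the compiler.

```csharp
using System;
using System.Threading;

// Hypothetical example type.
public sealed class Mailbox
{
    int payload;          // plain field, published through `ready`
    volatile bool ready;  // loads are acquire, stores are release
    long total;           // `volatile long` would not compile

    public void Publish (int value)
    {
        payload = value;  // may not move below the release store
        ready = true;     // store emitted with the `volatile.` prefix
    }

    public bool TryRead (out int value)
    {
        if (ready) {          // load emitted with the `volatile.` prefix
            value = payload;  // may not move above the acquire load
            return true;
        }
        value = 0;
        return false;
    }

    // 64-bit quantity: use the Volatile class instead of the keyword.
    public long Total {
        get => Volatile.Read (ref total);
        set => Volatile.Write (ref total, value);
    }
}
```

A reader that sees `ready == true` is guaranteed to see the `payload` written before it, which is the usual reason to reach for acquire/release semantics.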

### `Thread` class

The `VolatileRead` and `VolatileWrite` methods perform loads and stores with acquire and release semantics,
respectively. They guarantee absolutely nothing about atomicity beyond the standard rules of the CLI. In effect, this
means that the 64-bit overloads of these methods are not atomic on 32-bit systems.

There is a quirk in the .NET implementation where these methods actually use the `MemoryBarrier` method to insert a
barrier. This is stronger than a simple acquire or release barrier. We do the same for compatibility.
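
Expressed in C#, the quirk amounts to something like the following. This is a sketch of the described behavior, assuming the barrier is placed after the load and before the store; `FullBarrierVolatile` is a hypothetical name, and only the `int` overloads are shown.

```csharp
using System.Threading;

// Hypothetical re-creation of the behavior described above.
public static class FullBarrierVolatile
{
    public static int VolatileRead (ref int address)
    {
        int value = address;
        Thread.MemoryBarrier ();  // full fence, stronger than acquire
        return value;
    }

    public static void VolatileWrite (ref int address, int value)
    {
        Thread.MemoryBarrier ();  // full fence, stronger than release
        address = value;
    }
}
```

The load and the store themselves are ordinary, so atomicity still follows the standard CLI rules.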

The MSDN documentation incorrectly states that the C# compiler emits calls to `VolatileRead` and `VolatileWrite` when
using the `volatile` keyword.

The `MemoryBarrier` method inserts a full sequential consistency barrier.

### `Volatile` class

The methods on the `Volatile` class are all atomic regardless of system bitness, and result in acquire and release
barriers for loads and stores respectively.

The 64-bit methods on this class are not atomic with respect to loads or stores made by means other than the
methods on this class and the `Interlocked` class. This is because such 64-bit operations may need to be implemented
with a lock on 32-bit systems.

The MSDN documentation incorrectly states that the C# compiler emits calls to this class's methods when using the
`volatile` keyword.

### `Interlocked` class

The methods on the `Interlocked` class are all atomic regardless of system bitness, and all have sequential consistency
semantics.

The 64-bit methods on this class are not atomic with respect to loads or stores made by means other than the
methods on this class and the `Volatile` class. This is because such 64-bit operations may need to be implemented with a
lock on 32-bit systems.
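
In practice this means a 64-bit location that is updated with `Interlocked` should also be read with `Interlocked` (or `Volatile`), never with a plain load. A minimal sketch of a compare-and-swap loop that follows this rule; `InterlockedExamples.UpdateMax` is a hypothetical helper, not a BCL method.

```csharp
using System.Threading;

// Hypothetical helper built only from Interlocked operations.
public static class InterlockedExamples
{
    // Atomically raise `target` to at least `candidate`.
    // Returns the maximum observed or installed by this call.
    public static long UpdateMax (ref long target, long candidate)
    {
        long observed = Interlocked.Read (ref target);
        while (candidate > observed) {
            long previous = Interlocked.CompareExchange (ref target, candidate, observed);
            if (previous == observed)
                return candidate;  // our value was installed
            observed = previous;   // lost a race; retry against the new value
        }
        return observed;
    }
}
```

Using `Interlocked.Read` for the initial load keeps the loop correct on 32-bit systems, where a plain 64-bit load could observe a torn value.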

The `MemoryBarrier` method is just an alias for `Thread.MemoryBarrier`.

## Implementation

### CLI rules

When we see a CIL opcode prefixed with `volatile.`, we insert a `memory_barrier` IR opcode before or after the IR
opcodes that make up the operation. This `memory_barrier` opcode is flagged with the appropriate barrier kind
(`MONO_MEMORY_BARRIER_ACQ` or `MONO_MEMORY_BARRIER_REL`). `memory_barrier` opcodes are never reordered, and impose
the necessary reordering restrictions on the surrounding IR opcodes as well.

We expect all targets to support a `memory_barrier` opcode.

### `Thread`, `Volatile`, and `Interlocked` methods

The unoptimized behavior for these methods is to perform an icall into the runtime, where they are implemented in C
code, usually through C compiler intrinsics or, in the case of the 64-bit `Volatile` and `Interlocked` methods on a
32-bit system, with a lock.

We only use the icalls on targets where, for whatever reason, we can't replace calls to these methods with IR opcodes.

### Intrinsics

On most targets, we replace calls to the BCL methods with IR opcodes.

#### `Thread` methods

Calls to `MemoryBarrier` (and the alias on `Interlocked`) are replaced with the `memory_barrier` IR opcode with the
`MONO_MEMORY_BARRIER_SEQ` kind.

Calls to `VolatileRead` and `VolatileWrite` are replaced with regular `load*_membase` and `store*_membase` IR opcodes
coupled with a `memory_barrier` IR opcode with either `MONO_MEMORY_BARRIER_ACQ` or `MONO_MEMORY_BARRIER_REL`.

#### `Volatile` methods

Calls to `Read` and `Write` are replaced with `atomic_load_*` and `atomic_store_*` IR opcodes flagged with
`MONO_MEMORY_BARRIER_ACQ` or `MONO_MEMORY_BARRIER_REL`. These opcodes imply a memory barrier by themselves; as such,
like the `memory_barrier` IR opcode, they cannot be reordered and they impose reordering restrictions on surrounding
opcodes.

#### `Interlocked` methods

Calls to `Read` are replaced with the `atomic_load_i8` IR opcode flagged with `MONO_MEMORY_BARRIER_SEQ`.

Calls to `Increment` and `Decrement` are replaced with the `atomic_add_i4` and `atomic_add_i8` IR opcodes.

Calls to `Exchange` are replaced with the `atomic_exchange_i4` and `atomic_exchange_i8` IR opcodes.

Calls to `CompareExchange` are replaced with the `atomic_cas_i4` and `atomic_cas_i8` IR opcodes.

The `atomic_add_*`, `atomic_exchange_*`, and `atomic_cas_*` IR opcodes all imply `MONO_MEMORY_BARRIER_SEQ` barriers
(despite not explicitly being flagged) and behave as such in the IR with respect to reordering restrictions.