-
Notifications
You must be signed in to change notification settings - Fork 23
New Allocator Design
Rho has been using the BDWGC library for its allocation support, but BDWGC is not ideal for the allocation patterns in Rho. Typically Rho allocates many small objects, and needs to do many pointer lookups.
BDWGC does all its pointer lookups in a hash table. A custom allocator for Rho could take advantage of the fact that most objects are quite small, and store those objects in an arena for very fast pointer lookup.
Requirements:
- Identify pointers to allocated blocks (stack scanning).
- Iterate over allocated blocks (sweep phase in GC).
Goals:
- Efficient pointer identification.
- Efficient allocation of common sizes (48 byte objects are very common in scalar code).
- Reuse recently-freed blocks.
Future work:
- Make the allocator thread safe.
- Align large blocks so that each block has a unique hash key.
Allocation sizes are divided into three categories: small (<= 256 bytes), medium (<= 2^17 bytes), and large (> 2^17 bytes). Allocations for each size category are handled differently. Small allocations are rounded up to 8 byte sizes, medium and large allocations are rounded up to the nearest power of two.
Small allocations are allocated from a fixed size arena. The arena is divided into superblocks, where each superblock has a bitset indicating which blocks are currently in use by the application.
All small object superblocks have the same size (though they may store different-sized objects) and are aligned on addresses that are even multiples of the common superblock size. This makes pointer identification very simple and fast, since the superblock header can be found by pointer arithmetic.
When the small object arena runs out of space, the allocator falls back on using the medium size allocation routine, with a minimum size of 64 bytes.
Medium sized allocations are organized into superblocks, but the superblocks are outside the small object arena. They are kept separate from the small object superblocks to reduce internal fragmentation from having to make all superblocks the same size. It is still useful to have superblocks for medium size allocations since this greatly reduces the number of entries in the separate allocation hash table.
Large allocations are allocated individually and the only metadata are the entries stored in the separate allocation hash table.
Pointer lookup is done differently for small objects, and medium/large objects. Small object identification relies on pointer manipulation. It first finds the superblock a pointer is inside of, then the block index the pointer is inside, las it checks if that block is currently allocated.
The bounds of the small object arena are initialized on startup when it is allocated. The pointer to the first and one-past the last allocated superblock in the arena are updated as more and more superblocks are allocated from the arena.
The following algorithm is used to identify small objects:
- Ensure the pointer is inside the interval [Superblock Start, Superblock End).
- Find superblock header by masking away the low bits (256K superblock size -> 18 bits).
- Ensure the pointer is not inside the superblock header. Superblock headers have fixed size = 1040 bytes.
- Calculate the block index for the superblock.
- Check the bitset in the header to see if the block at the calculated index is currently in use.
Here is the algorithm in pseudocode:
void* lookup(void* pointer) {
if (pointer >= superblock_start && pointer < superblock_end) {
header = pointer & ~((1 << 18) - 1)
if (pointer >= header + 1040) {
block_index = (pointer - (header + 1040)) / header->block_size
bitset = block_index / 64
if (header->free[bitset] & (1 << (index % 64))) {
return header + 1040 + block_index * header->block_size
}
}
}
return null
}
Medium object superblocks and and large allocations are tracked in an open addressed hash table, using quadratic probing.
Pointer lookup works by doing a lookup in the hash table. If a matching large allocation is found, the pointer to the allocation is returned. If a matching superblock is found, we continue similarly to the small object pointer lookup routine above.
The allocation hash table is described in more detail below.
A hash table is used to track medium and large allocations. The name comes from the fact that the objects it tracks are not densely packed as in the small object arena. The hash table uses quadratic probing and is growable.
The following diagram illustrates the layout of the hash table:
To calculate pointer hashes, the lowest 19 bits are shifted away. Each allocation may still span multiple hash values and so to identify all internal pointers it is often necessary to store multiple entries for an allocation in the hash table:
Even when a single allocation has size 2^19 it may be necessary to store two hash table entries for the object:
Special care has to be taken when an allocation is taken out of the hash table (superblocks are not taken out, only large allocations), to ensure that all the hash entries for the allocation are removed.
When iterating over live objects we’d like to only visit each allocation once. Given that multiple hash entries are used for each allocation we need some way of knowing which entry is first. This can either be done by checking the allocation start address against the hash entry, or by stealing one bit from the hash key to indicate which key is the first key for an allocation.
Whenever an object is returned from the application the allocator does not free it, instead the object is linked up in a free list. Allocations are reused as free list nodes, so no extra memory is needed other than the head pointers to each free list.
There are different free lists for each allocation size. Since large allocations are rounded up to the nearest power of two there is a flat 64-entry array for tracking large object freelists.
Two sizes of superblocks are used, 2^18 bytes for small objects (>= 32 bytes, <= 256 bytes) and 2^19 bytes for medium objects (> 256 bytes, <= 2^17 bytes). All superblocks have 1152 bytes reserved for the header. The header contains the following members:
Name Type Size (bytes)
block_size u32 32
next_untouched u32 32
free u64[] 1024 (128 * u64)
-unused - 64
The number of bits in the bitset is 1024 * 8 = 8192, which is sufficient to map all blocks with the smallest block size for each superblock size:
Superblock size (bytes) Smallest allowed object size (bytes) Max number of objects
2^18 32 (2^18-1152)/32 = 8156
2^19 64 (2^19-1152)/64 = 8174
The following equation gives an upper bound on header size based on superblock size and block size:
header_size = 128 + 8 * ceil( ceil(sb_size / block_size) / 64 )
Using integer arithmetic this becomes:
header_size = 128 + 8 * ( ( (sb_size + block_size - 1) / block_size + 63 ) / 64 )
To compute the optimal header size just substitute sb_size for (sb_size - header_size) in the above equation and do a fixed-point iteration. For the two superblock/blocksize combinations we use 1152 is the initial and optimal solution.
When using a custom allocator with Address Sanitizer we have to make sure to poison memory that is reserved by the allocator but not in use by Rho. A few extra bytes of poisoned memory is also used to pad each allocation, making it possible to detect some off by one and out-of-bounds errors.
The allocator poisons all objects that are currently not in use by the application. The BDWGC allocations were always unpoisoned when handed back to the allocator so that BDWGC could manage its metadata as needed (linking objects to free lists, etc.).
Quarantine freelists are used to ensure that objects are not directly reused after being freed when running Address Sanitizer. This helps to find use-after-free errors.
The poisoning code and quarantine code has moved into the allocator, instead of being inside GCNode.cpp
.