JIT: Support object stack allocation #11192
Comments
Why is this needed?
We'll have to allocate space for ObjHeader before the stack-allocated object unless we can prove that it won't be needed for synchronization or hash code storage.
I think using synchronization, etc. should prohibit stack allocation, for the initial iteration at least. Otherwise, you would also need a helper call to clear these objects from the synchronization tables and teach the synchronization tables to allow references to objects that do not live on the GC heap.
How would the fields of a stack-allocated object be accessed? Would existing FIELD/IND nodes be replaced with LCL_FLD/LCL_VAR nodes? Are write barriers still needed if the object is allocated on the stack?
What would be the point of trying to synchronize on a stack-allocated object? If it's considered for stack allocation, then it doesn't escape, which means no other thread can attempt to synchronize on it, so any attempt to do so should be a no-op. The hash code could be derived from its stack address, as the object won't move until it's deallocated.
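To make the non-escaping case concrete, here is a minimal sketch. The `Counter` type and `SumLocal` method are hypothetical names, not from the JIT work. The object is never stored in a field, returned, or passed to another method, so the `lock` can never be contended. Whether a given JIT build actually elides the lock or stack-allocates the object is not claimed here.

```csharp
using System;

public static class LockElisionSketch
{
    // Hypothetical type used only for illustration.
    private sealed class Counter
    {
        public int Value;
    }

    // The Counter never leaves this method, so no other thread can
    // ever observe it or try to take the same lock.
    public static int SumLocal(int n)
    {
        var counter = new Counter();
        for (int i = 0; i < n; i++)
        {
            lock (counter) // uncontended by construction
            {
                counter.Value += i;
            }
        }
        return counter.Value;
    }
}
```

Calling `LockElisionSketch.SumLocal(5)` sums 0 through 4.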
We'll need to detect calls to RuntimeHelpers.GetHashCode on the stack-allocated object. They will be problematic if we don't allocate ObjHeader before the object.
A stack-allocated object will be treated like a struct, and its fields may be promoted. No write barriers are needed if we are assigning to a field of a stack-allocated object.
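As a rough source-level picture of what "treated like a struct with promoted fields" means, the two methods below compute the same thing. The names `Pair`, `WithObject` and `WithPromotedFields` are made up for this sketch. The second method is only a hand-written analogy for what promotion could turn the first into; it is not actual JIT output.

```csharp
public static class PromotionSketch
{
    // Hypothetical class whose instance never escapes WithObject.
    private sealed class Pair
    {
        public int A;
        public int B;
    }

    // Source as written: allocates a Pair and works on its fields.
    public static int WithObject(int a, int b)
    {
        var p = new Pair { A = a, B = b };
        p.A += p.B;
        return p.A * 2;
    }

    // Analogy for the promoted form: each field becomes an
    // independent local that can live in a register.
    public static int WithPromotedFields(int a, int b)
    {
        int pA = a;
        int pB = b;
        pA += pB;
        return pA * 2;
    }
}
```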
We will have cases where we are assigning to a field of an object that may live on the stack or on the heap (e.g., at joins or when passing a stack-allocated object to another method). We'll have to detect these cases and use checked write barriers.
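A sketch of the "passed to another method" situation, with hypothetical names (`Node`, `SetPayload`, `LocalNode`, `SharedNode`). The callee is compiled once and cannot tell from its own code whether `node` lives on the stack or on the GC heap, which is the case where a checked barrier would be needed. How the JIT classifies these particular call sites is an assumption and is not shown here.

```csharp
using System.Runtime.CompilerServices;

public static class CheckedBarrierSketch
{
    // Hypothetical class with a GC reference field.
    private sealed class Node
    {
        public object Payload;
    }

    private static Node s_shared;

    // Stores a reference into a field of an object whose location
    // (stack or heap) is unknown inside this method.
    [MethodImpl(MethodImplOptions.NoInlining)]
    private static void SetPayload(Node node, object value)
    {
        node.Payload = value;
    }

    // The node is only used locally and as a call argument.
    public static object LocalNode(object value)
    {
        var node = new Node();
        SetPayload(node, value);
        return node.Payload;
    }

    // The node is published through a static field, so it must be
    // on the GC heap when SetPayload writes to it.
    public static object SharedNode(object value)
    {
        var node = new Node();
        s_shared = node;
        SetPayload(node, value);
        return s_shared.Payload;
    }
}
```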
I suppose calls to RuntimeHelpers.GetHashCode will look like calls to native code so we will consider the object escaping and won't stack-allocate, so we don't need ObjHeader for that.
dotnet/coreclr#20814 implemented an initial version of object stack allocation. I updated the items in the description of this issue to mark what's done and added more items to the road map.
dotnet/coreclr#21950 added several improvements:
Call it a "lock elision" optimization and highlight it as a feature 😉 I think that's what Java does, unless I misunderstand what they do. Could we equally drop interlocked operations on objects that don't escape (where they operate on the object)?
AFAIR the first Java collection classes were synchronized. Of course, they found out that this isn't such a great idea, and their implementation of escape analysis also tried to help by removing synchronization when the collection object didn't escape the stack.
I've got very promising results in my simple test code, but there is still a gap compared with structs (mainly because of the lack of work item
[Benchmark]
public int PointClassTest()
{
var p1 = new PointClass(4, 5);
var p2 = new PointClass(3, 7);
var result = AddClass(p1, p2);
return result.X + result.Y;
}
[Benchmark]
public int PointStructTest()
{
var p1 = new PointStruct(4, 5);
var p2 = new PointStruct(3, 7);
var result = AddStruct(p1, p2);
return result.X + result.Y;
}
[MethodImpl(MethodImplOptions.AggressiveInlining)]
PointClass AddClass(PointClass x, PointClass y)
{
return new PointClass(x.X + y.X, x.Y + y.Y);
}
[MethodImpl(MethodImplOptions.AggressiveInlining)]
PointStruct AddStruct(PointStruct x, PointStruct y)
{
return new PointStruct(x.X + y.X, x.Y + y.Y);
}
record class PointClass(int X, int Y);
record struct PointStruct(int X, int Y);
BenchmarkDotNet=v0.13.1, OS=Windows 10.0.22000
Intel Core i7-7660U CPU 2.50GHz (Kaby Lake), 1 CPU, 4 logical and 2 physical cores
.NET SDK=6.0.100-preview.7.21379.14
[Host] : .NET 6.0.0 (6.0.21.37719), X64 RyuJIT
PGO + EA : .NET 6.0.0 (6.0.21.37719), X64 RyuJIT, TieredPgo + TieredCompilation + TC_QuickJit + TC_QuickJitForLoops + JitObjectStackAllocation
PGO + No EA : .NET 6.0.0 (6.0.21.37719), X64 RyuJIT, TieredPgo + TieredCompilation + TC_QuickJit + TC_QuickJitForLoops
Runtime=.NET 6.0
Codegen: .NET 6.0.0 (6.0.21.37719), X64 RyuJIT, Config: PGO + EA
.NET 6.0.0 (6.0.21.37719), X64 RyuJIT, Config: PGO + No EA
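For anyone who wants a quick check without BenchmarkDotNet, here is a minimal sketch that counts bytes allocated on the current thread around the same kind of `PointClass` code. `AllocationProbe`, `Work` and `AllocatedBytes` are hypothetical names. `GC.GetAllocatedBytesForCurrentThread` is a real API. The way escape analysis gets switched on (the `JitObjectStackAllocation` setting from the config above, typically via an environment variable such as `DOTNET_JitObjectStackAllocation=1`) is an assumption about your runtime build.

```csharp
using System;
using System.Runtime.CompilerServices;

public static class AllocationProbe
{
    private sealed record PointClass(int X, int Y);

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    private static PointClass Add(PointClass a, PointClass b)
    {
        return new PointClass(a.X + b.X, a.Y + b.Y);
    }

    // Same shape as PointClassTest: three objects, none of which escape.
    public static int Work()
    {
        var p1 = new PointClass(4, 5);
        var p2 = new PointClass(3, 7);
        var result = Add(p1, p2);
        return result.X + result.Y;
    }

    // Returns the bytes allocated by this thread while running Work()
    // `iterations` times. The checksum keeps the calls from being dropped.
    public static long AllocatedBytes(int iterations, out int checksum)
    {
        checksum = 0;
        long before = GC.GetAllocatedBytesForCurrentThread();
        for (int i = 0; i < iterations; i++)
        {
            checksum += Work();
        }
        return GC.GetAllocatedBytesForCurrentThread() - before;
    }
}
```

Run it once to warm up tiered compilation, then compare the returned byte count with and without the setting. A value that stays near zero with the setting on would suggest the allocations were removed.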
Is there any roadmap to make further progress on this?
Is this actively being worked on?
@TonyValenti no, see #1661 (comment)
We will shortly be working on our post 7.0 plans and this is definitely in the mix. In particular I would like to see us working towards stack allocation of some boxes, since:
Enable object stack allocation for ref classes and extend the support to include boxed value classes.
Use a specialized unbox helper for stack allocated boxes, both to avoid apparent escape of the box by the helper, and to ensure all box field accesses are visible to the JIT. Update the local address visitor to rewrite trees involving address of stack allocated boxes in some cases to avoid address exposure.
Disable old promotion for stack allocated boxes (since we have no field handles) and allow physical promotion to enregister the box method table and/or payload as appropriate.
In OSR methods handle the fact that the stack allocation may actually have been a heap allocation by the Tier0 method.
The analysis TP cost is around 0.4-0.7% (notes below). Boxes are much less likely to escape than ref classes (roughly 90% of boxes escape, 99.8% of ref classes escape). Codegen impact is diminished somewhat because many of the boxes are dead and were already getting optimized away.
Fixes #4584, #9118, #10195, #11192, #53585, #58554, #85570
---------
Co-authored-by: Jakob Botsch Nielsen <jakob.botsch.nielsen@gmail.com>
Co-authored-by: Jan Kotas <jkotas@microsoft.com>
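To illustrate the kind of box this change is about, here is a small sketch with hypothetical method names. The first two methods create a box that is only used locally; the third returns the box, so it escapes and has to stay on the heap. Which of these the JIT actually stack-allocates is not claimed here.

```csharp
using System;
using System.Globalization;

public static class BoxSketch
{
    // Box is created, unboxed, and never seen outside the method.
    public static int BoxAndUnbox(int v)
    {
        object boxed = v;
        return (int)boxed + 1;
    }

    // Converting to an interface boxes the int. The box is only
    // used as the receiver of the interface call.
    public static string FormatViaInterface(int v)
    {
        IFormattable f = v;
        return f.ToString("D3", CultureInfo.InvariantCulture);
    }

    // The box is returned to the caller, so it escapes.
    public static object BoxEscapes(int v)
    {
        return v;
    }
}
```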
Fixed by #103361.
btw there're still some unfinished items like:
@hez2010 But if you guys could get the point in the list finished:
No. string is completely different and unlike any other types, allocations of
This issue will track work on supporting object stack allocation in the jit. See dotnet/coreclr#20251 for the document describing the work.
The initial goal is to be able to remove heap allocation in a simple example like this:
and then in a similar example where the class has gc fields.
Proposed initial steps are:
I will be modifying and extending this list as the work progresses.
cc @dotnet/jit-contrib
category:cq
theme:object-stack-allocation
skill-level:expert
cost:extra-large
impact:large