Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Report -1 for GC.GetGeneration(nongc_obj) and same for GetObjectGeneration #85017

Merged
merged 14 commits into from
Apr 23, 2023

Conversation

EgorBo
Copy link
Member

@EgorBo EgorBo commented Apr 18, 2023

First contribution to dotnet/diagnostics#4156

Report -1 as a generation for nongc objects (also referred as "frozen objects') via GC.GetGeneration API and same for profiler API ICorProfilerInfo2::GetObjectGeneration

@ghost
Copy link

ghost commented Apr 18, 2023

Tagging subscribers to this area: @tommcdon
See info in area-owners.md if you want to be subscribed.

Issue Details

First contribution to dotnet/diagnostics#4156

Report -1 as a generation for nongc objects (also referred as "frozen objects') via GC.GetGeneration API and same for profiler API ICorProfilerInfo2::GetObjectGeneration

Author: EgorBo
Assignees: -
Labels:

area-Diagnostics-coreclr

Milestone: -

range->rangeStart = 0;
range->rangeLength = 0;
range->rangeLengthReserved = 0;
return CORPROF_E_NOT_GC_OBJECT;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Returning error here makes this more breaking than necessary. Why can't we just return S_OK?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Returning -1 seemed like it was going to be a breaking change no matter what so I figured we may as well make it more obvious rather than try to minimize it and increase risk that tools break more subtly. However we don't yet have feedback from profiler authors one way or another and this behavior is just a best guess on my part. If you think we'd be better off defaulting to keeping S_OK @jkotas I don't have a strong opinion on it.

Copy link
Member

@VSadov VSadov Apr 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Traditionally (in textbook sense) GC generations imply that genX can be collected without collecting generations X+.

-1 makes impression that any level of GC will collect frozen, while in fact it is the opposite - any generation can be collected without touching frozen.

That seems to imply that frozen objects logically belong to generation > MAX_GEN.
Perhaps 3 or MAXINT could work better that -1 ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you think we'd be better off defaulting to keeping S_OK

It looks odd to me to return failure error code and still try to return meaningful values. I see that there are other profiler APIs that work like that, so it should be fine to return error code. You can ignore my feedback.

Copy link
Member Author

@EgorBo EgorBo Apr 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kind of agree with @VSadov that e.g. INT_MAX value would be safer, e.g. I already had to apply a hack in code because GC/VM usages of WhichGeneration participate in comparisons so -1 will trigger a different path than previously max_generation did. Although, this is a part of the plan to make this breaking change more visible so 🤷

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps 3 or MAXINT could work better that -1

I have no particular preference between MAXINT and -1. I don't anticipate its going to make a difference myself but I also have no concern changing it. 3 I would not use - in the right context 3 is used as a legitimate generation number and the goal is that nobody should treat this value as if it is a generation.

Copy link
Member

@VSadov VSadov Apr 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are not that many purposes for WhichGeneration other than to answer questions like “can this object be collected before that one?” or “will this object survive GC X?”.

Assigning immortal objects to gen -1 feels like not the most convenient/intuitive.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried INT_MAX and there was only one place that I had to change (some NativeAOT test expecting frozen objects to be in Gen2). Also, GC.Collect(GC.GetGeneration.. is not breaking now as Jan highlighted.
For -1 it becomes a little bit more complicated.

So any strong opinions for keeping -1 ?

Copy link
Member

@VSadov VSadov Apr 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think INT_MAX is the correct value. Even with unforeseen future changes, I do not see anything collecting less often than frozen.

@jkotas
Copy link
Member

jkotas commented Apr 18, 2023

We may need to file a breaking change notice for this. The code out there may not be expecting invalid generation ID to be returned from GetGeneration. For example, consider code like this https://github.com/HUKUTAXVIII/TankGame/blob/4366c4f99b2ed2667bab68c72d377c2e086f15e3/TankGame/Game.cs#L241

if (hp->IsInFrozenSegment((Object*)objectId))
{
range->generation = (COR_PRF_GC_GENERATION)-1;
range->rangeStart = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can return memory range for the given frozen segments here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can return memory range for the given frozen segments here.

I agree, but I thought that the idea was to not provide support for apis with explicit GC prefix/name/suffix for nongc objects (cc @noahfalk)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, the plan of record is to avoid treating Non-GC objects as if they are on the GC heap, and that also included not treating the non-GC segments as if they are some type of special GC generation. We could smuggle the range here at the cost of blurring the lines and potentially creating some confusion. If profiler vendors said it would be specifically helpful to them I don't think it bends the rules too badly though. From the limited profiler feedback I've heard so far folks wanted to enumerate the Non-GC region boundaries directly and resolve the pointers themsevles offline.

I'm expecting @davmason to broadcast the changes we are doing to profiler vendors and then depending on their feedback we can make adjustments.

@noahfalk
Copy link
Member

We may need to file a breaking change notice for this

Yeah, its certainly a breaking change.

@noahfalk
Copy link
Member

@davmason

@EgorBo EgorBo marked this pull request as ready for review April 19, 2023 18:47
@EgorBo
Copy link
Member Author

EgorBo commented Apr 19, 2023

/azp run runtime-coreclr outerloop

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@EgorBo
Copy link
Member Author

EgorBo commented Apr 19, 2023

/azp run runtime-coreclr outerloop

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@EgorBo
Copy link
Member Author

EgorBo commented Apr 19, 2023

It's ready for review I guess then if we're fine with INT_MAX

@EgorBo
Copy link
Member Author

EgorBo commented Apr 19, 2023

/azp run runtime-coreclr outerloop

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@EgorBo
Copy link
Member Author

EgorBo commented Apr 20, 2023

Failure is #85081

@EgorBo
Copy link
Member Author

EgorBo commented Apr 20, 2023

@davmason @noahfalk PTAL, I have more changes for the profiler but they rely on this (tests)

Copy link
Member

@noahfalk noahfalk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looked fine to me, but we should get @Maoni0 and @davmason to review the GC and profiler parts respectively. Thanks @EgorBo!

// For regions those are never in-range.
return INT32_MAX;
}
#endif
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Maoni0 - heads up, you probably want to look at any changes that touch the GC.

I'm fine with INT32_MAX as the value that the profiling API returns back to 3rd party code in GetGenerationBounds but I leave it to you to decide if that is also an acceptable behavior for this GC code to have.

if (gen != int.MaxValue)
throw new Exception("object is expected to be in a non-gc heaps");

GC.Collect(gen);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current docs for GC.Collect() and GC.MaxGeneration states:

Use the MaxGeneration property to determine the maximum value you can specify when calling the Collect(Int32) method that takes a generation parameter.

I don't think we need to promise anyone that GC.Collect(int.MaxValue) will work even if it happens to do so in the current implementation. Instead I think we would document something like:

When GC.GetGeneration() is called with an object that doesn't exist on the GC heap, these objects do not have any associated GC generation. GetGeneration() will return int.MaxValue in this case and this value should not be treated as a valid GC generation for any other GC API that needs a generation specified.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should only call GC.Collect(Int32.MaxValue) in the test if we consider it a contract and are willing to fix it if it breaks. If we decide to not view that as a contract we shouldn't have it in the test

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GC.Collect(Int32.MaxValue) always worked fine. Changing it would be a breaking change.

I do not think the test coverage for it belongs to this test. A better place to add a test coverage for it would be https://github.com/dotnet/runtime/blob/main/src/libraries/System.Runtime/tests/System/GCTests.cs .

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move to GCTests.cs, thanks!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing it would be a breaking change

In case there was any confusion I was not proposing we change GC.Collect() behavior. I wanted to avoid declaring in our docs that GC.Collect(int.MaxValue) was supported and well-defined.

Copy link
Member

@davmason davmason left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other than the one comment around GC.Collect it looks good to me

@EgorBo EgorBo added the breaking-change Issue or PR that represents a breaking API or functional change over a prerelease. label Apr 20, 2023
@ghost ghost added the needs-breaking-change-doc-created Breaking changes need an issue opened with https://github.com/dotnet/docs/issues/new?template=dotnet label Apr 20, 2023
@ghost
Copy link

ghost commented Apr 20, 2023

Added needs-breaking-change-doc-created label because this PR has the breaking-change label.

When you commit this breaking change:

  1. Create and link to this PR and the issue a matching issue in the dotnet/docs repo using the breaking change documentation template, then remove this needs-breaking-change-doc-created label.
  2. Ask a committer to mail the .NET Breaking Change Notification DL.

Tagging @dotnet/compat for awareness of the breaking change.

@EgorBo
Copy link
Member Author

EgorBo commented Apr 21, 2023

Leaving open for @dotnet/gc to sign off the gc changes

@Maoni0
Copy link
Member

Maoni0 commented Apr 21, 2023

I'll be looking at this this afternoon. thanks for your patience!

@Maoni0
Copy link
Member

Maoni0 commented Apr 21, 2023

I see there are quite a few comments related to whether to return -1 or INT32_MAX so I'll just respond here.

I would actually prefer to return -1 to indicate "this object is simply not managed by the GC". the rule of "if an object's generation is smaller than another object's, it will be collected before that other object" is only applicable for objects managed by the GC so you should not apply it to objects not managed by the GC. -1 is so distinct and serves this purpose well.

however, the GC interface between GC and VM uses unsigned int and changing that can change the behavior in VM. there's actually not that many places in the VM that call WhichGeneration - based on a cursory look (at the 8.0 src) it should not cause anything breaking but I'd rather not risk it (we have to worry about clrgc.dll working with previous version of coreclr. dll. it'd be trivial if we just needed it to work with 8.0+).

so return INT32_MAX is the safer choice and I think we should go with that. thanks for all the discussions!

Copy link
Member

@Maoni0 Maoni0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. thanks so much for making these changes!

[MethodImpl(MethodImplOptions.NoInlining)]
static void AllocateNonGcHeapObjects()
{
// When this method is invoked, JIT is expected to trigger allocations for these
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might not work for ReadyToRun?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, disabled the test for crossgen

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not think that this is correct. Did you see the test actually failing with crossgen?

If I am reading the code correctly, this test should work with crossgen just fine. The string literals referenced by R2R code must be allocated on frozen heap. Otherwise, the tiering from R2R to Tier1 would work poorly - the tiering would be stuck with non-frozen string literals.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, you're absolutely right

// non-GC objects (same for GC.GetGeneration() API) have generation = -1
if (gen.generation == (COR_PRF_GC_GENERATION)INT32_MAX)
{
if (!FAILED(hr))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about calling this API to verify that it is a frozen object as well?

Copy link
Member Author

@EgorBo EgorBo Apr 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From my understanding we sort of ignore these old Frozen* profiler APIs as we re-branded Frozen as NonGC. For the same reason we introduce a new API to traverse frozen objects and ignore existing EnumModuleFrozenObjects

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ignore is probably fine when we were still designing, but it is really lousy from a testing perspective. It should either supposed to work or it doesn't. If it is supposed to work, and we are not testing it, it would be a test hole. If it is not supposed to work, then we should update the code to make it fails. If we wanted to depreciate old APIs in favor of new ones, we should make that clear in the documentation. Either way, something is missing.

@EgorBo
Copy link
Member Author

EgorBo commented Apr 23, 2023

@noahfalk @jkotas do I still need to file the breaking change ticket since we no longer return -1? Also, do we use the same mechanism for breaking changes in the profiler or we just notify the consumers directly?

@jkotas
Copy link
Member

jkotas commented Apr 23, 2023

do I still need to file the breaking change ticket since we no longer return -1?

Yes, this falls under "Disallowed - Increasing the range of returned values for a property, field, return or out value" from https://github.com/dotnet/runtime/blob/main/docs/coding-guidelines/breaking-change-rules.md#behavioral-changes

The process is documented at https://github.com/dotnet/runtime/blob/main/docs/project/breaking-change-process.md#process

@EgorBo
Copy link
Member Author

EgorBo commented Apr 23, 2023

do I still need to file the breaking change ticket since we no longer return -1?

Yes, this falls under "Disallowed - Increasing the range of returned values for a property, field, return or out value" from https://github.com/dotnet/runtime/blob/main/docs/coding-guidelines/breaking-change-rules.md#behavioral-changes

The process is documented at https://github.com/dotnet/runtime/blob/main/docs/project/breaking-change-process.md#process

Thanks, filed dotnet/docs#35105
Build failure is unrelated: #84101

Merging to unblock #85100

@EgorBo EgorBo merged commit 2c9e483 into dotnet:main Apr 23, 2023
@EgorBo EgorBo deleted the nongc-generation branch April 23, 2023 15:00
@ghost ghost locked as resolved and limited conversation to collaborators May 25, 2023
@ericstj ericstj added this to the 8.0.0 milestone Jun 28, 2023
@EgorBo EgorBo removed the needs-breaking-change-doc-created Breaking changes need an issue opened with https://github.com/dotnet/docs/issues/new?template=dotnet label Oct 19, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-Diagnostics-coreclr breaking-change Issue or PR that represents a breaking API or functional change over a prerelease.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants