-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cDac work necessary to get the MethodTableName #104759
cDac work necessary to get the MethodTableName #104759
Conversation
Add implementations for changes to RuntimeTypeSystem contract Add SOSDacInterface call into cDac for GetMethodTableName
- Move pointer to Module class - Replace use of SBuffer abstraction with a simple counted byte memory block Add metadata details to Loader contract Add Metadata helper api for use by contracts within the cDAC (and possibly clients of cDAC too)
1. It works now!
There are a few remaining TODOs ... notably, I haven't implemented sorted lookups, and the ecma api is currently only low level token and constant access, and the fallback path for MethodTable to Name mapping but these could be fixed in a follow up PR, or if this sits for long enough, I'll fix it here. |
public EcmaMetadataSchema Schema { get; init; } | ||
|
||
private ReadOnlyMemory<byte>[] _tables; | ||
public ReadOnlySpan<ReadOnlyMemory<byte>> Tables => _tables; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we use ReadOnlySequence<byte>
here? Basically, instead of having each element of the span represent the memory of a table, we could represent the underlying memory of the image as a ReadOnlySequence<byte>
that stitches together the various components of the image.
You could use DNMD's implementation of the "save to memory" function to provide the remaining byte arrays to stitch together the separate tables and heaps into a valid ECMA-335 image represented as a ReadOnlySequence<byte>
.
If we were to do this, we could (performance-permitting), refactor System.Reflection.Metadata to support this use case and remove the actual metadata reader implementation from here (and only have the "stitch into a sequence that represents metadata" code).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we use ReadOnlySequence here? Basically, instead of having each element of the span represent the memory of a table, we could represent the underlying memory of the image as a ReadOnlySequence that stitches together the various components of the image.
My gut reaction here would be introducing this abstraction is a bit of magic I'm not sure is a win. Do we have an example I can see that shows a benefit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that's not a bad idea, or at least, it doesn't really look like one. I don't really want to pursue it yet though.
- This implementation should allow us to fill out the remaining metadata needs of the cDAC without all that much difficulty (other than the IMetadataImport export requirement, which System.Reflection.Metadata doesn't help with at all, in fact, I'm not sure the IMetadataImport api surface can be implemented on top of System.Reflection.Metadata).
- I want to explore the possibility in .NET 10 of changing how we produce triage dump metadata when it becomes CDAC driven. Currently we produce an parallel data stream that has precomputed strings, but an alternative approach that might be more scalable to our real needs could be to have triage dump metadata be fully real metadata but only the exact bytes that the reader needs (not entire tables). Having a custom metadata reader makes adding that sort of hook a fair bit easier.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine with deferring it to a future experiment.
I think it may be worth revisiting when we explore switching the runtimes to use DNMD as the metadata library, though the "only specific bytes that are needed" model is also quite compelling/interesting to me.
Maybe we can make an issue to track possible improvements to metadata handling in dumps with the various ideas after this is merged so we don't lose them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
System.Reflection.Metadata is the managed low-level metadata reader library. I do not think we want to be growing a different one. We should refactor System.Reflection.Metadata as needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jkotas, I don't see that as a practical near term effort. To make it able to handle the cases that come up here, would require replacing all of the memory handling of that library with something different, which I can't see happening as part of this effort until we've stabilized the capabilities of what this logic will require the metadata system to be able to process. In general, I view what I've built as a short term solution, that will either last until we replace it with dnmd, or we do a major refactor of S.R.M to make it able to operate on things that aren't a pointer to a contiguous memory blob.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we adding tests for this?
struct TypeHandle | ||
{ | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
} | ||
|
||
|
||
internal enum CorElementType |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's reference something here for this and make it all hexadecimal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of spelling this out here, please define the the type and then comment where the values come from.
internal enum CorElementType
{
// Values defined in ECMA-335 - II.23.1.16 Element types used in signatures
}
@@ -69,9 +134,15 @@ internal partial struct RuntimeTypeSystem_1 | |||
internal enum WFLAGS_HIGH : uint | |||
{ | |||
Category_Mask = 0x000F0000, | |||
Category_ElementType_Mask = 0x000E0000, | |||
Category_IfArrayThenSzArray = 0x00020000, | |||
Category_Array = 0x00080000, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be in order.
} | ||
} | ||
``` | ||
|
||
Internally the contract depends on a `TypeDescHandle` structure that represents the address of something that implements the TypeDesc data descriptor. | ||
|
||
Internally the contract depends on the `TypeHandle` structure being capable of holding a TypeDescHandle or a MethodTableHandle. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an implementation detail of CoreCLR. I don't think this what we want to express in the contract directly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's why this is documented as part of version 1 of the contract. By definition the specific versions are about the particular implementation details. You'll note that the version free portion of the contract documentation does not say this at all.
Also, notably, I think we should think about changing our cdacreader dll into being constructed from 2 different dlls. 1 which is the contract and targets and such., and the second which is the legacy api surface which is what we're actually exposing from cdacreader.dll.
src/coreclr/vm/ceeload.cpp
Outdated
@@ -3923,27 +3923,28 @@ void ReflectionModule::CaptureModuleMetaDataToMemory() | |||
IfFailThrow(hr); | |||
|
|||
// Operate on local data, and then persist it into the module once we know it's valid. | |||
NewHolder<SBuffer> pBuffer(new SBuffer()); | |||
int sizeInInts = (numBytes / sizeof(int32_t)) + 2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
int sizeInInts = (numBytes / sizeof(int32_t)) + 2; | |
int sizeInInts = (numBytes / sizeof(uint32_t)) + 2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the "+ 2" for?
// The first uint32_t is the number of bytes in the saved metadata | ||
// Starting at the address of the second uint32_t value is the saved metadata itself | ||
protected: | ||
TADDR m_pDynamicMetadata; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we type this as:
struct DynamicMetadata
{
uint32_t Length;
uint32_t Data[0];
};
The above will be documented on the native side and the native side can then document the layout?
if (address == 0) | ||
return default; | ||
|
||
if (((ulong)address & 2) == (ulong)2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (((ulong)address & 2) == (ulong)2) | |
if (((ulong)address & 2) != 0) |
} | ||
|
||
public TargetPointer MethodTable { get; init; } | ||
public ushort NumMethods { get; init; } | ||
public uint CorTypeAttr { get; init; } | ||
public byte InternalCorElementType { get; init; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please doc what makes this "Internal", that is generally an important distinction.
GenericParam = 0x2a, | ||
MethodSpec = 0x2b, | ||
GenericParamConstraint = 0x2c, | ||
Count = 0x2d |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Count = 0x2d | |
Count |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't look at the EcmaMetadataReader or the TypeNameBuilder yet. Initial comments to answer questions and raise some concerns about MethodTableArray and TypeHandleArray.
| Data Descriptor Name | | ||
| --- | | ||
| `EEClass` | | ||
| `ArraayClass` | | ||
| `TypeDesc` | | ||
| `ParamTypeDesc` | | ||
| `TypeVarTypeDesc` | | ||
| `FnPtrTypeDesc` | | ||
| `GenericsDictInfo` | | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
I'd like to update all the contracts to have tables like this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've been just doing a basic list - the table does look nicer.
runtime/docs/design/datacontracts/Exception.md
Lines 26 to 28 in 856a0a6
Data descriptors used: | |
- `ExceptionInfo` | |
- `Exception` |
We probably want to put globals in a table too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my second pass at this today, I went and did a bit where I put all the types, and all the fields, and comments for the meaning of the fields. It looks really nice, and you guys will see it next week.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The other thing we should probably do is put all the magic constants, like enum values in a similar table. Then we can add some build time validation that a various constants have the expected values for the particular contract version we're working with (This could probably even be _DEBUG only validation if its easier to do.). I don't think we need a data descriptor for those values... just enough validation that we know to update the contract version when magic numbers change. I'm not going to do that in this change, but it seems like a good idea for us to implement soonish.
{ | ||
// validate that address points to something that looks like a TypeHandle. | ||
|
||
if (address & 2 == 2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like us to use an enum for this kind of pointer bit-stealing. See EEClassOrCanonMTBits
int numberOfGenericArgs = _target.ProcessedData.GetOrAdd<GenericsDictInfo>(genericsDictInfo).NumTypeArgs; | ||
MethodTableArray instantiation = _target.ProcessedData.GetOrAdd<(TargetPointer, int), MethodTableArray> | ||
((dictionaryPointer, numberOfGenericArgs)); | ||
|
||
return instantiation.Types.AsSpan(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why does the key for the data include the number of generic args? can the same dictionaryPointer
be viewed with a different number of generic args?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because just the pointer to the start of the data doesn't quite encode enough to identify the number of generic args. I'm going to restructure this so that instead of this being a general purpose TypeHandleArray
... that it becomes a TypeInstantiation
type, that type is defined in the contract implementation, and the pointer that makes identifies it is the MethodTable
pointer. That should make this simpler, remove the extra key concept, remove the concern about Data contracts knowing about contracts, and make it so that repeated calls to GetInstantiation
api are just the cost of doing data lookup, which should help us put our new design on a path to actually be faster than the old model.
|
||
namespace Microsoft.Diagnostics.DataContractReader.Data; | ||
|
||
internal sealed class MethodTableArray : IData<MethodTableArray, (TargetPointer ptr, int size)> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this doesn't feel like it belongs in Data
. I've typically viewed Data
as straightforward data-descriptor extractor. Something we could potentially source-generate in the future. This MehtodTableArray
feels more like it belongs to the RuntimeTypeSystem contract proper
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need a general structure for post-processed data produced by the contracts. @lambdageek you have already done this with the _methodTables array in the RuntimeTypeSystem
contract, and I think that is a better place to put the processing, but I don't think its a particulaly good way to structure managing the memory of the data.
As I was writing up the DacStreams
contract for the pointer to name mapping, I think I found a better approach.
- Use the
ProcessedData
/IData
interfaces to hold the data. I dislike theRuntimeTypeSystem
approach of having some state in the actual contract implementation. I want ALL state for the contracts to be held inProcessedData
, or on other structures associated with the Target (such as theMetadata
). - Put the data structure into the contract implementation.
This both lets us have a single part of the system to do all memory invalidation, lets us get the caching behavior to avoid reprocessing the world all of the time, and places all of the logic that understands runtime data structures into the contract.
- Remove TypeHandleArray and MethodTableArray in favor of contract specific logic
…ime into cdac-methodtable-name
{ | ||
uint32_t Size; | ||
BYTE Data[0]; | ||
template<typename T> friend struct ::cdac_offsets; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: technically this is unnecessary (except, perhaps, as documentation), all members of a struct are public, so datadescriptor.h
can just use offsetof
directly
I expected to see updates to the tests due to the (VS Code's test extension seems to detect and run the testsuite correctly (at least if you build Debug libraries) if you want to run them locally if you open the test project) |
Co-Authored-By: David Wrighton <davidwr@microsoft.com>
@davidwrighton here's a commit that has the minimal set of changes to make the existing MethodTableTests tests pass again ( |
- Add test suite for the DacStreams contract
Add new details to the RuntimeTypeSystem and Loader contracts as needed to load metadata and examine and identify all of the kinds of type that the CoreCLR type system can represent
Add a type name generator based on a line by line copy from the CoreCLR codebase
Add an Ecma335 metadata parser which is pure safe managed code, and is capable of loading metadata that is not structured as a single array. This implementation is heavily based on the dnmd implementation.
Provide impelementations and documentation for all of the new contracts except for the RW metadata one. (Its rarely needed, so we can add it later)
Enhance the target infrastructure to better handle various forms of arrays, and contracts which do math based on target pointer sizes.
Contributes to #99302