[Proposal] ECMA-335 specification for "reflection-notation" serialized type names #4416
Comments
Also @AnthonyDGreen as it was mentioned he's looked into this before. |
In Roslyn we considered this as a readable persistable text format for SymbolId that could be used to specify any symbol. But the reflection/ILDASM format as I recall doesn't have a format for properties and events. Though I think it would be trivial to extend the format to support them. |
@davkean Re Q2: Reflection inconsistencies should be treated as quirks, imo. The spec should be based on writers, not readers. Writers are in this case the compilers. If there is something that the reader can read but it's never produced by a writer we care about (and thus invalid if we base the spec on the writers we care about) then that's just a quirk of the reader. |
+1 to @tmat re Q2. Some first thoughts...
Re (1) : I mean the grammar notation in Ecma 335 where it describes the syntax of the IL assembly language. |
Further clarification: I meant my (1), not Q1 in the proposal, i.e. use the same notation that is used to specify ILAsm syntax. Also, I believe 'SerString' in the spec refers to how any such string (including string attribute values, not just type attribute values) is encoded as bytes. The spec doesn't seem to give the type representation a name. It just has an underspecified description in the second paragraph. Finally, re: the "within generic type arguments" confusion, if this is just about the handling of '[' in assembly name components, I suggest we either ban it globally or require it to be escaped globally. |
@nguerrera All good feedback, have simplified the generic type arguments assembly name confusion; always now treat ']' as a delimiter. Clarified difference between SerString and the canonical form of the type name, misread the spec. Will look at doing the same notation as ILAsm. |
Some quick observations about the captured grammar rules:
This is invalid syntax (error C2059: syntax error : 'typeid'). But if we work around it like this:
The resulting IL is:
So the C++/CLI compiler does not preserve modifiers or C++-specific types when serializing attribute typeid arguments. It just stores the closest managed type that corresponds to the passed type. |
Thanks for checking it @MSDN-WhiteKnight! It corresponds to what I would expect to see. Modifiers pretty much don't matter outside signature matching. They e.g. don't impact the LDTOKEN instruction either. Here is one of our tests checking that LDTOKEN of a type and LDTOKEN of a type with a modifier in it produce the same handle: runtime/src/tests/reflection/ldtoken/modifiers.il Lines 12 to 19 in ead035b
Adding representation of modifiers to the SerString format wouldn't likely result in meaningful improvement (I would expect the reflection stack to return types without the modifiers to match what LDTOKEN does - because both of these essentially map to the C# |
Updates
Rationale
This is a proposal that attempts to fully specify reflection-notation serialized types for inclusion in ECMA-335 (referred to from here on as "the CLI").
In metadata, when a type is persisted as the value of a fixed or named argument, such as in the following code block, it is serialized in a SerString in its canonical form.
SerString and the canonical form are documented like so (see _II.23.3 Custom attributes_)
The last paragraph is underspecified and does not provide enough information for metadata readers or other inspectors to consume and interpret this canonical form.
The documentation for Type.GetType also attempts to document a similar format, but it falls short as well. And while nothing in the CLI or on MSDN indicates a relationship between the canonical name and the type name you pass to Reflection's Type.GetType, they are clearly related.
Based on this, I've attempted to write up the grammar behind these formats as a single format. Note that I've used a custom form of BNF (Backus-Naur Form); if that puts an unpleasant taste in your mouth, I'm sorry in advance. :)
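For readers unfamiliar with the byte-level part that the spec does pin down, here is a minimal Python sketch of SerString encoding as I understand it from II.23.3 and II.23.2: a compressed ("PackedLen") byte count followed by UTF-8 bytes, with a single 0xFF byte for a null string. The function names `encode_packed_len` and `encode_ser_string` are made up for illustration and are not part of any library. Note that this covers only how the string is stored, not what the type name inside it may look like.

```python
def encode_packed_len(n: int) -> bytes:
    """Encode n as a compressed unsigned integer (ECMA-335 II.23.2)."""
    if n < 0:
        raise ValueError("length must be non-negative")
    if n <= 0x7F:
        # One byte, top bit clear.
        return bytes([n])
    if n <= 0x3FFF:
        # Two bytes, top bits 10.
        return bytes([0x80 | (n >> 8), n & 0xFF])
    if n <= 0x1FFFFFFF:
        # Four bytes, top bits 110.
        return bytes([0xC0 | (n >> 24), (n >> 16) & 0xFF, (n >> 8) & 0xFF, n & 0xFF])
    raise ValueError("length too large for a compressed integer")


def encode_ser_string(s):
    """Encode a string (or None) as a SerString: PackedLen + UTF-8 bytes."""
    if s is None:
        # A null string is a single 0xFF byte.
        return b"\xff"
    data = s.encode("utf-8")
    return encode_packed_len(len(data)) + data
```

For example, `encode_ser_string("System.Int32")` yields a length byte of 0x0C followed by the twelve ASCII bytes of the name.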
My hope is first to work towards an agreement on the format, and then move onto figuring out how to actually represent and document this within the CLI itself (that's where I hope @CarolEidt comes in).
Proposed Format
Format of a full type name or assembly-qualified name in "reflection-notation"
Notes
I've written an implementation of a decoder of the above format for inclusion as part of System.Reflection.Metadata, 1.2.
Questions
Reflection has lots of corner-case issues and inconsistencies in how it handles certain things, such as trailing characters and unclosed quotes. What should we do about them? Should we mimic them in the spec? Or should we spec the format a little tighter and treat these inconsistencies as quirks of Type.GetType? We've decided not to mimic these quirks. Writers will be held to the above format; readers can choose to allow more.
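To make the "strict" stance concrete, here is a rough Python sketch of one small piece of a strict reader: splitting an assembly-qualified name into its type-name and assembly-name parts. It is not the proposed grammar. It assumes only what the Type.GetType documentation describes: a backslash escapes the next character, square brackets nest (generic arguments, array ranks), and the first unescaped comma outside any brackets separates the type name from the assembly name. The function name `split_assembly_qualified` is hypothetical, and the assembly-name part is returned without validation.

```python
def split_assembly_qualified(name: str):
    """Split 'TypeName, AssemblyName' at the first top-level, unescaped comma.

    Returns (type_name, assembly_name); assembly_name is None when absent.
    Raises ValueError on unbalanced brackets or a dangling escape.
    """
    depth = 0
    i = 0
    while i < len(name):
        ch = name[i]
        if ch == "\\":
            # A backslash escapes the following character.
            if i + 1 >= len(name):
                raise ValueError("dangling escape at end of name")
            i += 2
            continue
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
            if depth < 0:
                raise ValueError("unmatched ']'")
        elif ch == "," and depth == 0:
            # First comma outside brackets: the rest is the assembly name.
            return name[:i].strip(), name[i + 1:].strip()
        i += 1
    if depth != 0:
        raise ValueError("unclosed '['")
    return name, None
```

A lenient reader could catch the `ValueError` cases and fall back to Type.GetType-like behaviour; a writer held to the format would never produce such input in the first place.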