Tiny V2 spec #9

Open
sfPlayer1 opened this issue May 18, 2019 · 6 comments

sfPlayer1 (Collaborator) commented May 18, 2019

this is not final

Tiny V2 consists of a list of hierarchical sections. Every line starts a new section; whether it is nested inside an existing
section is determined by its indentation level. A section's parent is always the closest preceding section indented
exactly one level less than itself. Accordingly, a section ends just before the next line with the same or a lesser indentation level.

The child-to-parent relationships form the path that uniquely identifies any element globally. For example, all method and
field sections that are children of a class section represent members of that class.
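
A minimal Python sketch of how a reader might rebuild this hierarchy, assuming each line has already had its <eol> stripped: a stack holds the currently open sections, and every new line first closes all sections indented the same or deeper, then attaches itself to the one left on top. Section and build_tree are hypothetical names, not part of any Tiny library, and the sketch does not validate the grammar or the indentation limit.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Section:
    # Tab-separated columns of the line, without its leading tabs
    columns: List[str]
    # Number of leading tabs; the synthetic root uses -1
    depth: int
    parent: Optional["Section"] = field(default=None, repr=False, compare=False)
    children: List["Section"] = field(default_factory=list)

    def path(self) -> List[List[str]]:
        """Columns of every enclosing section, from the top level down to this one."""
        chain, node = [], self
        while node is not None and node.depth >= 0:
            chain.append(node.columns)
            node = node.parent
        return chain[::-1]


def build_tree(lines: List[str]) -> Section:
    root = Section(columns=[], depth=-1)
    stack = [root]  # open sections, outermost first
    for line in lines:
        depth = len(line) - len(line.lstrip("\t"))
        # Sections indented the same or deeper end just before this line
        while stack[-1].depth >= depth:
            stack.pop()
        parent = stack[-1]
        node = Section(columns=line[depth:].split("\t"), depth=depth, parent=parent)
        parent.children.append(node)
        stack.append(node)
    return root
```

The path of a parameter section then lists its class, its method and itself, which is the globally unique identification described above.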

Sections must be unique within their level. For example, a specific class may only be recorded once, a comment can't be redefined, and the same parameter can't be listed twice.

Example:

tiny	2	0	official	intermediary	named
	someProperty	someValue
	anotherProperty
c	a	class_123	pkg/SomeClass
	m	(III)V	a	method_456	someMethod
		p	1		param_0	x
		p	2		param_1	y
		p	3		param_2	z
		c	Just a method for demonstrating the format.
	f	[I	a	field_789	someField
c	b	class_234	pkg/xy/AnotherClass
	m	(Ljava/lang/String;)I	a	method_567	anotherMethod

Grammar:

<file> ::= <header> | <header> <sections>

<header> ::= 'tiny' <tab> <major-version> <tab> <minor-version> <tab> <namespace-a> <tab> <namespace-b> <extra-namespaces> <eol> <properties>
<major-version> ::= <non-negative-int>
<minor-version> ::= <non-negative-int>
<namespace-a> ::= <namespace>
<namespace-b> ::= <namespace>
<extra-namespaces> ::= '' | <tab> <namespace> <extra-namespaces>
<namespace> ::= <safe-string>

<properties> ::= '' | <tab> <property> <eol> <properties>
<property> ::= <property-key> | <property-key> <tab> <property-value>
<property-key> ::= <safe-string>
<property-value> ::= <escaped-string>

<sections> ::= '' | <class-section> <sections>

<class-section> ::= 'c' <tab> <class-name-a> <tab> <class-name-b> <extra-ns-class-names> <eol> <class-sub-sections>
<class-name-a> ::= <class-name>
<class-name-b> ::= <optional-class-name>
<optional-class-name> ::= '' | <class-name>
<extra-ns-class-names> ::= '' | <tab> <optional-class-name> <extra-ns-class-names>
<class-name> ::= <conf-safe-string>
<class-sub-sections> ::= '' | <method-section> <class-sub-sections> | <field-section> <class-sub-sections> | <class-comment-section> <class-sub-sections> 

<method-section> ::= <tab> 'm' <tab> <method-desc-a> <tab> <method-name-a> <tab> <method-name-b> <extra-ns-method-names> <eol> <method-sub-sections>
<method-name-a> ::= <method-name>
<method-name-b> ::= <optional-method-name>
<optional-method-name> ::= '' | <method-name>
<extra-ns-method-names> ::= '' | <tab> <optional-method-name> <extra-ns-method-names>
<method-name> ::= <conf-safe-string>
<method-desc-a> ::= <method-desc>
<method-desc> ::= <conf-safe-string>
<method-sub-sections> ::= '' | <method-parameter-section> <method-sub-sections> | <method-variable-section> <method-sub-sections> | <method-comment-section> <method-sub-sections>

<method-parameter-section> ::= <tab> <tab> 'p' <tab> <lv-index> <tab> <var-name-a> <tab> <var-name-b> <extra-ns-var-names> <eol> <method-parameter-sub-sections>
<var-name-a> ::= <optional-var-name>
<var-name-b> ::= <optional-var-name>
<optional-var-name> ::= '' | <var-name>
<extra-ns-var-names> ::= '' | <tab> <optional-var-name> <extra-ns-var-names>
<var-name> ::= <conf-safe-string>
<lv-index> ::= <non-negative-int>
<method-parameter-sub-sections> ::= '' | <var-comment-section> <method-parameter-sub-sections>

<method-variable-section> ::= <tab> <tab> 'v' <tab> <lv-index> <tab> <lv-start-offset> <tab> <optional-lvt-index> <tab> <var-name-a> <tab> <var-name-b> <extra-ns-var-names> <eol> <method-variable-sub-sections>
<lv-start-offset> ::= <non-negative-int>
<optional-lvt-index> ::= '-1' | <lvt-index>
<lvt-index> ::= <non-negative-int>
<method-variable-sub-sections> ::= '' | <var-comment-section> <method-variable-sub-sections>

<var-comment-section> ::= <tab> <tab> <tab> 'c' <tab> <comment> <eol>

<method-comment-section> ::= <tab> <tab> 'c' <tab> <comment> <eol>
<comment> ::= <escaped-string>

<field-section> ::= <tab> 'f' <tab> <field-desc-a> <tab> <field-name-a> <tab> <field-name-b> <extra-ns-field-names> <eol> <field-sub-sections>
<field-name-a> ::= <field-name>
<field-name-b> ::= <optional-field-name>
<optional-field-name> ::= '' | <field-name>
<extra-ns-field-names> ::= '' | <tab> <optional-field-name> <extra-ns-field-names>
<field-name> ::= <conf-safe-string>
<field-desc-a> ::= <field-desc>
<field-desc> ::= <conf-safe-string>
<field-sub-sections> ::= '' | <field-comment-section> <field-sub-sections>

<field-comment-section> ::= <tab> <tab> 'c' <tab> <comment> <eol>

<class-comment-section> ::= <tab> 'c' <tab> <comment> <eol>
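
As an illustration of the <header> and <properties> productions, here is a rough Python sketch of a header reader. It assumes the lines have already had their <eol> stripped and are well-formed; parse_header is a hypothetical helper, and it deliberately leaves property values escaped and does not validate the namespace strings.

```python
from typing import Dict, List, Optional, Tuple


def parse_header(lines: List[str]) -> Tuple[int, int, List[str], Dict[str, Optional[str]], int]:
    """Parse the header line and its property lines.

    Returns (major, minor, namespaces, properties, index of the first section line).
    """
    head = lines[0].split("\t")
    if len(head) < 5 or head[0] != "tiny":
        raise ValueError("not a tiny header")
    major, minor = int(head[1]), int(head[2])
    namespaces = head[3:]  # namespace-a, namespace-b, then any extra namespaces
    properties: Dict[str, Optional[str]] = {}
    i = 1
    # Property lines directly follow the header and are indented once
    while i < len(lines) and lines[i].startswith("\t"):
        key, sep, value = lines[i][1:].partition("\t")
        # A key-only property has no value; values are left escaped here
        properties[key] = value if sep else None
        i += 1
    return major, minor, namespaces, properties, i
```

Applied to the example file above, this yields version 2.0, the namespaces official, intermediary and named, and the two properties someProperty (with value someValue) and anotherProperty (without a value).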

Grammar notes:

<tab> is "\t"
<eol> is "\n" or "\r\n"
<safe-string> is a non-empty string that must not contain \, "\n", "\r", "\t" or "\0"
<conf-safe-string> is the same as <safe-string> if <properties> doesn't contain a <property> "escaped-names", otherwise it's a non-empty string further described by <escaped-string>
<escaped-string> is a string that must not contain <eol> and escapes \ to \\, "\n" to \n, "\r" to \r, "\t" to \t and "\0" to \0
<non-negative-int> is any integer from 0 to 2147483647 (2^31-1) inclusive, represented as per java.lang.Integer.toString()
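
The <escaped-string> and <non-negative-int> rules are easy to get subtly wrong, so here is a minimal Python sketch of both, assuming Python strings stand in for the decoded UTF-8 text. The function names are hypothetical. The integer check accepts only the canonical form that java.lang.Integer.toString() produces for values from 0 up: ASCII digits, no sign, no leading zeros.

```python
import re
from typing import Optional

_ESCAPES = {"\\": "\\\\", "\n": "\\n", "\r": "\\r", "\t": "\\t", "\0": "\\0"}
_UNESCAPES = {code[1]: char for char, code in _ESCAPES.items()}
# Integer.toString() of a value >= 0: ASCII digits, no sign, no leading zeros
_NON_NEGATIVE_INT = re.compile(r"0|[1-9][0-9]*")


def escape(text: str) -> str:
    return "".join(_ESCAPES.get(ch, ch) for ch in text)


def unescape(text: str) -> str:
    out, i = [], 0
    while i < len(text):
        if text[i] != "\\":
            out.append(text[i])
            i += 1
            continue
        code = text[i + 1] if i + 1 < len(text) else ""
        if code not in _UNESCAPES:
            raise ValueError(f"invalid escape sequence at position {i}")
        out.append(_UNESCAPES[code])
        i += 2
    return "".join(out)


def parse_non_negative_int(text: str) -> Optional[int]:
    """Return the value, or None if text isn't a valid <non-negative-int>."""
    if not _NON_NEGATIVE_INT.fullmatch(text):
        return None
    value = int(text)
    return value if value <= 2**31 - 1 else None
```

Rejecting unknown escape sequences instead of passing them through matches the note that escape sequences are limited to the listed types.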

<class-name>, once optionally unescaped, is the binary name of a class as specified in JVMS SE 8 §4.2.1; nested class identifiers are typically separated with $ (e.g. some/package/class$nested$subnested). Outer names must not be omitted for any namespace.
<method-name>/<field-name>/<var-name>, once optionally unescaped, is the unqualified name of a method/field/variable as specified in JVMS SE 8 §4.2.2
<method-desc>, once optionally unescaped, is a method descriptor as specified in JVMS SE 8 §4.3.3
<field-desc>, once optionally unescaped, is a field descriptor as specified in JVMS SE 8 §4.3.2

<lv-index> refers to the local variable array index of the frames having the variable, see "index" in JVMS SE 8 §4.7.13
<lv-start-offset> is at most the start of the range in which the variable has a value, but doesn't overlap with another variable with the same <lv-index>, see "start_pc" in JVMS SE 8 §4.7.13. The start offset/range for tiny is measured in instructions with a valid opcode, not bytes.
<lvt-index> is the index into the LocalVariableTable attribute's local_variable_table array, see "local_variable_table" in JVMS SE 8 §4.7.13; not to be confused with "index", which is referred to by <lv-index>
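
Since <lv-start-offset> counts instructions rather than bytes, a writer has to translate the byte-based start_pc from the class file. A minimal Python sketch, assuming the caller already has the sorted byte offsets of every real instruction in the method (for example collected with a bytecode library; pseudo-entries such as labels or line numbers, which the phrase "with a valid opcode" appears meant to exclude, are left out). instruction_index is a hypothetical helper.

```python
from bisect import bisect_left
from typing import List


def instruction_index(insn_byte_offsets: List[int], start_pc: int) -> int:
    """Convert a byte offset such as an LVT start_pc into an instruction index.

    insn_byte_offsets holds the sorted code-array offset of every real
    instruction in the method (hypothetical input supplied by the caller).
    """
    i = bisect_left(insn_byte_offsets, start_pc)
    if i == len(insn_byte_offsets) or insn_byte_offsets[i] != start_pc:
        raise ValueError(f"offset {start_pc} is not the start of an instruction")
    return i
```

For a code array that begins with iload_1 (1 byte), sipush 300 (3 bytes) and istore_2 (1 byte), the instruction offsets are 0, 1 and 4, so a start_pc of 4 becomes instruction index 2.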

Misc notes:

The encoding for the entire file is UTF-8. Escape sequences are limited to the types, locations and conditions mentioned above.

Indentation uses tab characters exclusively; one tab character equals one level. The number of leading tab characters is at most one more than in the preceding line.
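
A small Python sketch of a validator for this rule, assuming eol-stripped lines starting with the header (which itself must not be indented). first_bad_indent is a hypothetical name.

```python
from typing import List, Optional


def first_bad_indent(lines: List[str]) -> Optional[int]:
    """Return the 1-based number of the first line indented more than one level
    deeper than its predecessor, or None if every line follows the rule."""
    previous = -1  # the header line must not be indented at all
    for number, line in enumerate(lines, start=1):
        depth = len(line) - len(line.lstrip("\t"))
        if depth > previous + 1:
            return number
        previous = depth
    return None
```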

Sections or properties with unknown types/keys should be skipped without generating an error.
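
A minimal Python sketch of a reader that tolerates unknown sections, assuming eol-stripped section lines (header and properties already consumed) and assuming that an unknown section's sub-sections are skipped along with it, since they can't be interpreted without their parent. GRAMMAR and known_sections are hypothetical names; the table only mirrors the productions in the grammar above.

```python
from typing import Iterator, List, Tuple

# Section tags understood under each kind of parent, mapped to the kind they
# introduce (mirrors the grammar; 'c' means class at top level, comment elsewhere)
GRAMMAR = {
    "file": {"c": "class"},
    "class": {"m": "method", "f": "field", "c": "comment"},
    "method": {"p": "parameter", "v": "variable", "c": "comment"},
    "field": {"c": "comment"},
    "parameter": {"c": "comment"},
    "variable": {"c": "comment"},
    "comment": {},
}


def known_sections(lines: List[str]) -> Iterator[Tuple[int, str, List[str]]]:
    """Yield (depth, kind, columns) for every understood section, skipping each
    unknown section together with all of its sub-sections."""
    kinds = ["file"]  # kinds[d] is the kind of the enclosing section for depth d
    skip_depth = -1  # depth of the unknown section being skipped, or -1
    for line in lines:
        depth = len(line) - len(line.lstrip("\t"))
        if skip_depth >= 0:
            if depth > skip_depth:
                continue  # still inside the unknown section
            skip_depth = -1
        del kinds[depth + 1:]
        columns = line[depth:].split("\t")
        kind = GRAMMAR[kinds[depth]].get(columns[0])
        if kind is None:
            skip_depth = depth
            continue
        kinds.append(kind)
        yield depth, kind, columns
```

Keying the table by the parent's kind rather than by the tag alone is what lets the same letter 'c' mean a class at the top level and a comment everywhere else.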

The number of extra namespaces defined in the header and the number of names in every extra-ns-*-names definition must match. They are associated by their relative position, just as the mandatory namespaces a and b are associated by suffix; e.g. namespace-a covers class-name-a, method-name-a, method-desc-a, var-name-a, field-name-a and field-desc-a.
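
A short Python sketch of the positional association, assuming the caller has already sliced out a section's name columns (for a method line, everything after the tag and the descriptor). names_by_namespace is a hypothetical helper; it treats every empty column as an absent name and does not check which sections forbid an empty name-a.

```python
from typing import Dict, List, Optional


def names_by_namespace(namespaces: List[str], names: List[str]) -> Dict[str, Optional[str]]:
    """Pair a section's name columns with the header's namespaces by position.

    An empty column means the name is absent in that namespace (optional-*-name).
    """
    if len(names) != len(namespaces):
        raise ValueError(f"expected {len(namespaces)} names, got {len(names)}")
    return {ns: (name or None) for ns, name in zip(namespaces, names)}
```

With the example header, the line "m	(III)V	a	method_456	someMethod" maps official to a, intermediary to method_456 and named to someMethod.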

Sections representing the same element must not be repeated, e.g. there can be only one top-level section for a specific class or one class-level section for a specific member.

If any variable mapping doesn't specify an lvt index (i.e. uses -1 as its <optional-lvt-index>), e.g. due to a missing LocalVariableTable attribute in one of the methods, the property "missing-lvt-indices" has to be added to the header's properties.
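
A minimal Python sketch of how a writer could decide whether the property is required, assuming eol-stripped, well-formed lines whose 'v' sections follow the <method-variable-section> production (tag, lv-index, lv-start-offset, optional-lvt-index, names). needs_missing_lvt_property is a hypothetical name.

```python
from typing import List


def needs_missing_lvt_property(section_lines: List[str]) -> bool:
    """True if any method-variable section uses -1 as its lvt-index."""
    for line in section_lines:
        # Method-variable sections are the only depth-2 sections tagged 'v'
        if line.startswith("\t\tv\t"):
            columns = line[2:].split("\t")
            if columns[3] == "-1":
                return True
    return False
```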

Mappings without any useful names or sub-sections should be omitted.

Comments should be recorded without their enclosing syntax elements, indentation or decoration. For example, the comment "/**<eol> * A comment<eol> * on two lines.<eol> */" should be recorded as "A comment<eol>on two lines.".
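
A rough Python sketch of that normalization for Java-style block comments. strip_javadoc is a hypothetical helper built on a simple regex heuristic, not a full Javadoc parser; for instance, it would also drop a leading "*" that genuinely belongs to the comment text. The result still needs <escaped-string> escaping before it is written, so the line break ends up as \n in the file.

```python
import re


def strip_javadoc(raw: str) -> str:
    """Reduce a /* ... */ or /** ... */ comment to its bare text (simple cases only)."""
    body = raw.strip()
    if body.startswith("/**"):
        body = body[3:]
    elif body.startswith("/*"):
        body = body[2:]
    if body.endswith("*/"):
        body = body[:-2]
    lines = []
    for line in body.splitlines():
        # Drop indentation and the leading " * " decoration of each line
        lines.append(re.sub(r"^\s*\*? ?", "", line).rstrip())
    # Drop the empty lines left behind by the opening and closing delimiters
    while lines and not lines[0]:
        lines.pop(0)
    while lines and not lines[-1]:
        lines.pop()
    return "\n".join(lines)
```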

Standard properties:

  • escaped-names: deserialize values with unescaping
  • missing-lvt-indices: expect local variable mappings without a lvt-index value

Runemoro commented Jul 2, 2019

I have some comments about the method variable section:
<method-variable-section> ::= <tab> <tab> 'v' <tab> <lv-index> <tab> <lv-start-offset> <tab> <optional-lvt-index> <tab> <var-name-a> <tab> <var-name-b> <extra-ns-var-names> <eol> <method-variable-sub-sections>

  1. You can't make both lvt-index and lv-start-offset optional, since lv-index alone isn't enough to uniquely identify a local variable. It should be lv-index that's optional, since the lvt index on its own is enough to uniquely identify the variable (what we're mapping is the name field in the lvt entry at that index).
  2. Why specify lv-index and start-offset at all? All the remapper needs to know is the local variable table index. If some other tool like Matcher needs those, it can just get them from the lvt.
  3. There shouldn't be a var-name-a field at all. With Minecraft it would all be just ☃! And this would mean it's impossible to remap a JAR whose variable names have been changed by something like Stitch's snowman remover.
  4. Maybe there should be a way to add LVT entries (for other games where the LVT is removed during obfuscation).

Here's a suggestion for what the variable section could look like:

<method-variable-section> ::= <tab> <tab> 'v' <tab> <lvt-index> <tab> <var-name-b> <extra-ns-var-names> <eol> <method-variable-sub-sections>

liach (Contributor) commented Jul 2, 2019

@Runemoro Here I answer some of your questions, as I've already worked with tinyv2 a bit.

  1. Local variable start offset is not optional here. It must be at least 0.

  2. Note that local variable table is optional for the vm and can be removed by an obfuscator from classes. As a result, lvt index is optional.

  3. Var name a should be present, although in actual usage, we don't check the var name a when remapping parameters or local variables at all. Otherwise things may just break if we try to flip the default mappings (leftmost namespace) with some other namespace.

  4. The addition of an extra lvt is a concern for the remapper; it is not related to the tinyv2 format itself.

Runemoro commented Jul 2, 2019

> Local variable start offset is not optional here. It must be at least 0.

Sorry, misread it.

> Note that local variable table is optional for the vm and can be removed by an obfuscator from classes. As a result, lvt index is optional.

If that happens, it's impossible to map it. You also need the end offset and signature if you want to create a new LVT entry.

> Otherwise things may just break if we try to flip the default mappings (leftmost namespace) with some other namespace.

You're talking about inverting the mappings (for example, intermediary -> named to named -> intermediary)? It should at least be optional. If the LVT is absent, there is no variable name at all.

> The addition of an extra lvt is a concern for the remapper; it is not related to the tinyv2 format itself.

The format is missing the information necessary for a new LVT entry (end offset and signature).

liach (Contributor) commented Sep 27, 2019

Just a side note from Discord:

> for (end user) runtime i btw increasingly like the idea of adding an attribute to tiny v2 that guarantees that it's sorted, which would in turn allow a tree api to do sparse loading+binary search

This is actually a very good point
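
To make the idea concrete, here is a rough Python sketch of what sparse loading plus binary search could look like. Everything in it is hypothetical: the spec above defines no "sorted" attribute, and SparseClassIndex is an invented name. It assumes class sections are sorted by class-name-a in Python's code-point order and that the caller has recorded the byte offset of each class line; a real spec would have to pin the order down, since Java's String.compareTo compares UTF-16 code units and differs from code-point order for supplementary characters.

```python
from bisect import bisect_left
from typing import Iterable, Optional, Tuple


class SparseClassIndex:
    """Keeps only class names and file offsets in memory; a class section
    can then be read from disk on demand after a binary search."""

    def __init__(self, entries: Iterable[Tuple[str, int]]):
        # entries: (class-name-a, byte offset of its 'c' line), in file order
        pairs = list(entries)
        self.names = [name for name, _ in pairs]
        self.offsets = [offset for _, offset in pairs]
        if self.names != sorted(self.names):
            raise ValueError("class sections are not sorted")

    def offset_of(self, name: str) -> Optional[int]:
        i = bisect_left(self.names, name)
        if i < len(self.names) and self.names[i] == name:
            return self.offsets[i]
        return None
```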

Col-E commented Feb 24, 2024

> this is not final

  1. Is there any update on the state of the spec?
  2. If so, is the Mapping IO implementation fully compliant with this spec?

modmuss50 (Member) commented

> this is not final
>
>   1. Is there any update on the state of the spec?
>   2. If so, is the Mapping IO implementation fully compliant with this spec?

  1. I believe what is on this page is correct; however, https://fabricmc.net/wiki/documentation:tiny2 has recently been updated, so it is likely a better source of information.
  2. Yes, mapping-io is a great point of reference.
